# Summary of Test-Beam Data Conversion (Draft)

Last Update: Feb. 21, 2023, Yuzhi Che

Waiting for supplement ...

## Introduction

The CERN Test-Beam data format conversion task force aims to convert the raw test-beam data (from the CEPC Sc-ECAL and AHCAL prototypes, collected at CERN) into the CEPC framework, to benefit future data analysis. The work began on Nov. 30, 2022 [[Kickoff Meeting]](https://indico.ihep.ac.cn/event/18428/), and by February 2023 the first iteration was essentially complete.

The data conversion work is divided into two branches, as shown in the following plot: Fast conversion and Full conversion.

The first branch, Fast conversion, provides a reduced version of the results to support the start of high-level analysis that relies on the CEPC software. It includes three intermediate stages. First, the binary data is decoded into ROOT files. Next, a brief calibration process produces calibrated data in ROOT format. Finally, the "ROOT to LCIO" program creates the LCIO files from the hit information (x, y, z, E, id) in the calibrated ROOT files.

The second branch, Full conversion, provides integrated software that copies the content of the binary data directly and completely into ROOT/LCIO files.

All products of these processes are saved on the IHEP server.

![Map of Data Conversion](https://note.ihep.ac.cn/uploads/0e207af4-eca4-4dd0-9fee-d0f2c4f83c5a.png)

This draft records the locations of the relevant data files, documents, conventions and details of the data conversion work, as a reference for future iterations and analysis.
## Binary Raw Data

### Public Data Space:

Administrator: Gang Li

```bash
/cefs/data/TestBeam
```

### File List (organized by Yukun Shi):

```bash
/cefs/data/TestBeam/Calo_Oct.2022/FileList_2022BeamTest_more.pdf
```

## Data Decoding

### Binary Data Structure

https://mattermost.ihep.ac.cn/ihep/pl/1ts81a1xabyw9bofs7gx5ecmoo

### ROOT Data Structure

https://mattermost.ihep.ac.cn/ihep/pl/zrqb5tfeh3rxfe1jzjeeengmaw

### Decoder

| Decoder Name | Legacy/Original | RawdataDecoder |
| -------- | -------- | -------- |
| Author | Sc-ECAL: Jiaxuan Wang | François |
| | AHCAL: Yukun Shi | |
| Source Code | Sc-ECAL: lxslc7.ihep.ac.cn:/cefs/higgs/wangjx/ScECAL/Diagnose | [GitHub Link](https://github.com/flagarde/RawdataDecoder) |
| | AHCAL: /cefs/higgs/shiyk/Beam_2022/Decode/HBUAna_Cherenkov | |
| Input Format | Raw Binary Data | Raw Binary Data |
| Output Format | ROOT File | ROOT & LCIO |
| Output Path | Sc-ECAL: lxslc7.ihep.ac.cn:/cefs/higgs/wangjx/ScECAL/Result_Diagnose/decode/ | lxslc7.ihep.ac.cn:/scratchfs/cepc/lagarde/new_decoder |
| | AHCAL: lxslc7.ihep.ac.cn:/cefs/higgs/shiyk/Beam_2022/DataBase/RawRoot/ | |

### Conventions & Discussion

#### Treatment of abnormal packages

- **Empty packages**: removed in both decoders.
- ***Split* packages with the same TriggerID as the preceding package**: removed in the **Legacy/Original** decoder, re-saved in **RawdataDecoder**.
- **Packages including abnormal chipID**: ...
#### Cell-ID encoding

In the **Legacy/Original** decoder, `CellID` is encoded as

$$
\mathrm{ID_{cell}} = \mathrm{ID_{layer}}\cdot 10^{5} + \mathrm{ID_{chip}}\cdot 10^{4} + \mathrm{ID_{memo}} \cdot 10^{2} + \mathrm{ID_{channel}}
$$

Accordingly, $\mathrm{ID_{layer}}$, $\mathrm{ID_{chip}}$ and $\mathrm{ID_{channel}}$ can be decoded by:

```c-like=
layer   = int(hit->getCellID0() / 100000);
chip    = int(hit->getCellID0() % 100000 / 1e4);
channel = int(hit->getCellID0() % 100);
```

**It has been discovered that chip_id can reach 10 or more, so the CellID needs more bits to store it.**

In **RawdataDecoder**, a dedicated class (`CellID.h`, `CellID.cc`), used for both decoders (**ROOT/LCIO**), has been developed. Users can copy this class into their own analysis to simplify the decoding.

In **RawdataDecoder**, the cell ID in **ROOT files** is a 32-bit word split into 4 × 8-bit fields:

```c-like=
// RawdataDecoder/libs/interfaces/ROOTWriter/src/ROOTWriter.cc
m_cellID = (layer << 24) + (chip << 16) + (memory << 8) + channel;
```

It is easier to use the `CellID` class:

```c-like=
CellID myCellID;
myCellID(layer, chip, memory, channel);
std::cout << "CellID is : " << myCellID.getCellID() << std::endl;
std::cout << "I don't need to worry about decoding by myself :"
          << " layer " << myCellID.getLayerID()
          << " chip_id " << myCellID.getChipID()
          << " memory : " << myCellID.getMemory()
          << " channel " << myCellID.getChannel() << std::endl;
```

For the **LCIO** output, the `UTIL::CellIDEncoder` of LCIO is used. The string needed to set up this `CellIDEncoder` can be obtained by calling `CellID::getCellIDEncoderString()`. This encoding string should be saved in the file, so that the values are easy to access:

```c-like=
// RawdataDecoder/libs/interfaces/LCIOWriter/src/LCIOWriter.cc
void LCIOWriter::processCell(const Data& d, const std::uint32_t& chip, const std::uint32_t& channel)
{
  UTIL::CellIDEncoder<IMPL::RawCalorimeterHitImpl> cd(CellID::getCellIDEncoderString(), m_CollectionVec);
  m_LCEvent->setTimeStamp(m_LCEvent->getTimeStamp() + d.getTriggerID());
  IMPL::RawCalorimeterHitImpl* hit = new IMPL::RawCalorimeterHitImpl;
  for(std::size_t memory = 0; memory != d.getChip(chip).getNumberColumns(); ++memory)
  {
    if(d.getChip(chip).getID() >= 10)
      log()->error("Chip_id >=10 ({}) : Layer {} Chip {} memory {} channel {}",
                   d.getChip(chip).getID(), d.getLayer(), d.getChip(chip).getID(), memory, channel);
    cd["BCID"]    = d.getChip(chip).getBCIDs(memory);
    cd["gain"]    = d.getChip(chip).getCharge(memory, channel).gain();
    cd["hit"]     = d.getChip(chip).getCharge(memory, channel).hit();
    cd["layer"]   = d.getLayer();
    cd["chip"]    = d.getChip(chip).getID();
    cd["channel"] = channel;
    cd["memory"]  = memory;
    cd.setCellID(hit);
    hit->setAmplitude(d.getChip(chip).getCharge(memory, channel).charge());
    hit->setTimeStamp(d.getChip(chip).getTime(memory, channel).timestamp());
    if(static_cast<DetectorID>(d.getDetectorID()) == DetectorID::ECAL)
      m_LCEvent->parameters().setValue("Cherenkov", -1);
  }
  m_CollectionVec->addElement(hit);
}
```

To read the layer, chip and channel indices, the `CellIDDecoder` class should be used:

```c-like=
CellIDDecoder<RawCalorimeterHit> idDecoder(col);
layer = idDecoder(hit)["layer"];
channel = idDecoder(hit)["chip"] - 1; // to keep consistent with legacy decoder
chip = idDecoder(hit)["channel"];
```

***NOTE***: As of 2023-02-21, one must be subtracted from the channel ID in the LCIO output of **RawdataDecoder** to keep it consistent with the **Legacy/Original** decoder.

#### ID-Position Map

The ID-Position Maps for the Sc-ECAL and AHCAL are listed here in the form of two functions (**which should be validated by the experts**):

```c-like=
//------------------ For ScECAL ------------------
// Copy from Jiaxuan Wang @ USTC
// scintillator strips wrt. 6 SPIROC2E chips * 36 channels
#include "TVector3.h"

TVector3 GetScEcalHitPos(int LayerIDs, int ChipIDs, int ChannelIDs)
{
  const int chipNu = 6;
  const int chnNu = 36;
  int decodeID[chipNu][chnNu] = {
      // chip 0
      0, 42, 1, 43, 2, 44, 3, 4, 5, 6, 7, 8,
      9, 10, 11, 12, 54, 13, 55, 14, 56, 15, 57, 16,
      58, 17, 59, 18, 60, 19, 61, 20, 62, 21, 22, 23,
      // chip 1
      24, 66, 25, 67, 26, 68, 27, 69, 28, 70, 29, 71,
      30, 72, 31, 73, 32, 74, 33, 75, 34, 76, 35, 77,
      36, 78, 37, 79, 38, 80, 39, 81, 40, 82, 41, 83,
      // chip 2
      149, 148, 147, 96, 97, 98, 99, 100, 101, 102, 103, 104,
      105, 106, 107, 63, 64, 65, 108, 109, 110, 111, 112, 113,
      114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125,
      // chip 3
      150, 192, 151, 193, 152, 194, 153, 195, 154, 196, 155, 197,
      156, 198, 157, 199, 158, 200, 159, 201, 160, 202, 161, 203,
      162, 204, 163, 205, 164, 206, 165, 207, 166, 208, 167, 209,
      // chip 4
      191, 190, 189, 188, 146, 187, 145, 186, 144, 185, 143, 184,
      142, 183, 141, 182, 140, 181, 139, 180, 138, 179, 178, 177,
      176, 175, 174, 173, 172, 171, 170, 128, 169, 127, 168, 126,
      // chip 5
      137, 136, 135, 134, 133, 132, 131, 130, 129, 84, 85, 86,
      87, 88, 89, 90, 91, 92, 93, 94, 95, 45, 46, 47,
      48, 49, 50, 51, 52, 53, 210, 210, 210, 210, 210, 210};
  int ScintillatorIDs = decodeID[ChipIDs][ChannelIDs];
  double layerZ;
  const double _xInterval = 5.3;  // 300 um gap in width direction
  const double _yInterval = 45.4; // 400 um gap in length direction
  const int rowNu = 42;
  const int columnNu = 5;
  int _yID = ScintillatorIDs / rowNu;
  int _xID = ScintillatorIDs % rowNu;
  TVector3 _position;
  double x0 = _xInterval * _xID - _xInterval * (rowNu - 1) / 2.;
  double y0 = _yInterval * _yID - _yInterval * (columnNu - 1) / 2.;
  // for prototype test
  if (LayerIDs % 2 == 0)
  {
    _position[0] = -y0;
    _position[1] = -x0;
  }
  if (LayerIDs % 2 == 1)
  {
    _position[0] = -x0;
    _position[1] = -y0;
  }
  if (LayerIDs % 2 == 0) layerZ = 1 + LayerIDs / 2 * 19.9;
  // else layerZ = 12.95+(LayerIDs-1)/2*19.9;
  else layerZ = 12.2 + (LayerIDs - 1) / 2 * 19.9;
  _position[2] = layerZ;
  return _position;
}
```

```c-like=
//------------------ For AHCAL ------------------
// Copy from Yukun @ USTC
// scintillator strips wrt. 6 SPIROC2E chips * 36 channels
#include "TVector3.h"

TVector3 GetAhcalHitPos(int layer_ID, int chip_ID, int channel_ID)
{
  const int cell_SP = 16;
  const int chip_No = 9;
  const int channel_No = 36;
  const int Layer_No = 40;
  const double _Pos_X[channel_No] = {
      100.2411, 100.2411, 100.2411, 59.94146, 59.94146, 59.94146,
      19.64182, 19.64182, 19.64182, 19.64182, 59.94146, 100.2411,
      100.2411, 59.94146, 19.64182, 100.2411, 59.94146, 19.64182,
      -20.65782, -60.95746, -101.2571, -20.65782, -60.95746, -101.2571,
      -101.2571, -60.95746, -20.65782, -20.65782, -20.65782, -20.65782,
      -60.95746, -60.95746, -60.95746, -101.2571, -101.2571, -101.2571};
  const double _Pos_Y[channel_No] = {
      141.04874, 181.34838, 221.64802, 141.04874, 181.34838, 221.64802,
      141.04874, 181.34838, 221.64802, 261.94766, 261.94766, 261.94766,
      302.2473, 302.2473, 302.2473, 342.54694, 342.54694, 342.54694,
      342.54694, 342.54694, 342.54694, 302.2473, 302.2473, 302.2473,
      261.94766, 261.94766, 261.94766, 221.64802, 181.34838, 141.04874,
      221.64802, 181.34838, 141.04874, 221.64802, 181.34838, 141.04874};
  const double chip_dis_X = 239.3;
  const double chip_dis_Y = 241.8;
  const double HBU_X = 239.3;
  const double HBU_Y = 725.4;
  const double HBU_Z = 26;
  TVector3 pos;
  int HBU_ID = chip_ID / 3;
  chip_ID = chip_ID % 3;
  pos.SetX(_Pos_Y[channel_ID] - chip_ID * chip_dis_Y);
  pos.SetY(-(-_Pos_X[channel_ID] + (HBU_ID - 1) * HBU_X));
  pos.SetZ(layer_ID * HBU_Z);
  return pos;
}
```

#### Cross-check between the two decoders

The outputs of the two decoders above were compared in terms of the hit ADC spectrum and the hit layer/chip/channel distributions, see [[Indico Page]](https://indico.ihep.ac.cn/event/18956/contributions/128810/attachments/66896/79188/report_0216_full_conv.pdf)

The comparison scripts were integrated into `RawdataDecoder`; the executable is named `Comparaison`.
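When hits from the two decoder outputs are matched one to one, their cell IDs first have to be brought to a common convention. The following is a minimal sketch of the two packings described in the Cell-ID encoding section and of a translation between them. It is not taken from the `Comparaison` source: the names `CellFields`, `packLegacy`, `unpackLegacy`, `packBits`, `unpackBits` and `legacyToBits` are hypothetical and exist only for this illustration, and the sketch uses `|` where the ROOT writer uses `+` (equivalent as long as every field is below 256).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this sketch; not part of either decoder.
struct CellFields {
  std::uint32_t layer;
  std::uint32_t chip;
  std::uint32_t memory;
  std::uint32_t channel;
};

// Legacy/Original convention: layer*1e5 + chip*1e4 + memory*1e2 + channel.
// Only unambiguous while chip < 10 and memory, channel < 100.
inline std::uint32_t packLegacy(const CellFields& f) {
  assert(f.chip < 10 && f.memory < 100 && f.channel < 100);
  return f.layer * 100000u + f.chip * 10000u + f.memory * 100u + f.channel;
}

inline CellFields unpackLegacy(std::uint32_t id) {
  return {id / 100000u, id % 100000u / 10000u, id % 10000u / 100u, id % 100u};
}

// RawdataDecoder ROOT convention: four 8-bit fields in one 32-bit word.
inline std::uint32_t packBits(const CellFields& f) {
  assert(f.layer < 256 && f.chip < 256 && f.memory < 256 && f.channel < 256);
  return (f.layer << 24) | (f.chip << 16) | (f.memory << 8) | f.channel;
}

inline CellFields unpackBits(std::uint32_t id) {
  return {id >> 24, (id >> 16) & 0xFFu, (id >> 8) & 0xFFu, id & 0xFFu};
}

// Translate a legacy ID into the bit-packed convention, so that hits from
// the two outputs can be matched on the same key.
inline std::uint32_t legacyToBits(std::uint32_t legacyId) {
  return packBits(unpackLegacy(legacyId));
}
```

For example, layer 12, chip 3, memory 5, channel 20 gives the legacy ID 1230520 and the bit-packed ID 0x0C030514. With a chip ID of 11 in layer 12, the layer and chip terms of the legacy formula sum to 1310000, the same as for layer 13, chip 1, which is why the decimal packing breaks down once chip IDs reach 10. The channel offset mentioned in the NOTE above concerns the LCIO output and is not handled here.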
The command-line options of `Comparaison` are:

```bash
-bash-4.2$ ./Comparaison --help
Compare between rawdata programs
Usage: ./Comparaison [OPTIONS]

Options:
  -h,--help                     Print this help message and exit
  --version                     Display program version information and exit
  --original TEXT:FILE REQUIRED original file
  --new TEXT:FILE REQUIRED      new file
  --output_path TEXT [.]  (Env:STREAMOUT_OUTPUT_PATH)
                                Output path
  --pdf_name TEXT [result]      Name of the pdf generated without .pdf
```

## Data Calibration & Fast Conversion

### List of tools

| Tool Name | Description | Authors/Maintainers | Input Alias/Tag | Output Alias/Tag |
| -------- | -------- | -------- | ------ | ------ |
| **Legacy/Original** decoder | The same **Legacy/Original** decoder as described above, which converts the binary data files to ROOT files. | Yukun Shi, Jiaxuan Wang | Binary raw data | Raw ROOT |
| **Calibration** | A set of calibration programs, currently preliminary and continuously updated. | Yukun Shi, Jiaxuan Wang | Raw ROOT | Calib ROOT |
| **TbDataConvert** | A program that reads calibrated ROOT files and builds LCIO files containing basic hit information such as event number, hit energy, position and cell ID. | Hengyu Wang, Yuzhi Che | Calib ROOT | Fast LCIO |
| **Druid (for TB)** | A branch of the LCIO display software, configured to display test-beam data in LCIO format. | Yuzhi Che | Fast/Full LCIO | |
| | | Zhen Wang | ROOT | |
| **PyShow** | | Siyuan Song | ROOT | |

> *Fast LCIO* denotes the LCIO files converted from calibrated ROOT files, which include only basic hit information. *Full LCIO* denotes the LCIO files converted directly from the binary data files, which contain all of the raw information.

### Software Access

- **Legacy/Original**: See above.
- Calibration:
    - Sc-ECAL: /cefs/higgs/wangjx/ScECAL/Diagnose
    - AHCAL: /cefs/higgs/shiyk/Beam_2022/Decode/HBUAna_Cherenkov
- TbDataConvert: [GitLab Link: https://code.ihep.ac.cn/wanghengyu/tbdataconvert](https://code.ihep.ac.cn/wanghengyu/tbdataconvert)
- Druid (for TB):
    - [Public page of original version: http://cepcsoft.ihep.ac.cn/guides/EventDisplay/docs/druid/](http://cepcsoft.ihep.ac.cn/guides/EventDisplay/docs/druid/)
    - [GitLab Link: https://code.ihep.ac.cn/cheyuzhi/druid](https://code.ihep.ac.cn/cheyuzhi/druid) (The version dedicated to TB data is on branch `test-beam`)
- Zhen's Display: ...
- PyShow: ...

### Data Sets

- ROOT Raw data:
    - Sc-ECAL
        - ```lxslc7.ihep.ac.cn:/cefs/higgs/wangjx/ScECAL/Result_Diagnose/decode/```
    - AHCAL (Yukun's decoder)
        - ```lxslc7.ihep.ac.cn:/cefs/higgs/shiyk/Beam_2022/DataBase/RawRoot/```
    - AHCAL (Francois' decoder)
        - ```/scratchfs/cepc/lagarde/new_decoder```
- ROOT Calibrated data:
    - Sc-ECAL: /cefs/higgs/wangjx/ScECAL/Result_Diagnose/calib
    - AHCAL: /cefs/higgs/shiyk/Beam_2022/DataBase/Calib
- LCIO Fast Conversion data:
    - Sc-ECAL: /cefs/higgs/wanghengyu/cepc/Root2SLCIO/work/data/fast_lcio/ecal
    - AHCAL: To be converted ...

### Validation of Fast LCIO files

The Fast LCIO files, the output of `TbDataConvert`, were compared with the Calib ROOT files using data from a few runs. The comparison of the hit energy spectra, number of hits per event, hit layer distribution and hit map showed good consistency between the two file formats; see [[Indico Page on Dec. 22, 2022]](https://indico.ihep.ac.cn/event/18542/contributions/122782/attachments/65877/77868/1222_cross_check.pdf) and [[Indico Page on Feb. 2, 2023]](https://indico.ihep.ac.cn/event/18762/contributions/127070/attachments/66578/78713/report_0202.pdf).