Transcript of Offline Discussion
M. Moulson, 22 October 2004

• Datarec status
• Reprocessing plans
• MC status
• MC development plans
• Linux
• Operational issues
• Priorities
• AFS/disk space
Datarec DBV-20

DC geometry updated:
• Global shift: y = 550 μm, z = 1080 μm
• Implemented in datarec for Run > 28000
• Thickness of DC wall not changed (75 μm)

Modifications to DC timing calibrations:
• Independence from EmC timing calibrations

Modifications to event classification (EvCl):
• New KSTAG algorithm (KS tagged by vertex in DC)
• Bunch spacing by run number in T0_FIND step 1 for ksl: 2.715 ns for 2004 data (also for MC, some 2000 runs)

Boost values:
• Runs not reconstructed without BMOM v.3 in HepDB
• px values from BMOM(3) now used in all EvCl routines (Run > 31690)
Datarec operations

Runs 28479 (29 Apr) to 32380 (21 Oct, 00:00):
• 413 pb⁻¹ to disk with tag OK
• 394 pb⁻¹ with tag = 100 (no problems)
• 388 pb⁻¹ with full calibrations
• 371 pb⁻¹ reconstructed (96%)
• 247 pb⁻¹ DSTs (except KK)

fsun03-fsun10 decommissioned 11 Oct (necessary for installation of new tape library):
• datarec submission moved from fsun03 to fibm35
• DST submission moved from fsun04 to fibm36

150 keV offset in √s discovered!
150 keV offset in √s

Discovered while investigating ~100 keV discrepancies between physmon and datarec.

+150 keV adjustment to fit value of √s not implemented:
• in physmon
• in datarec
• when final BVLAB √s values written to HepDB

Plan of action:
1. New Bhabha histogram for physmon fit, taken from data
2. Sync datarec fit with physmon
3. Fix BVLAB fit before final 2004 values computed
4. Update 2001-2002 values in DB records

histogram_history and HepDB BMOM 2001-2002 currently from BVLAB scan; need to add 150 keV.

Update of HepDB technically difficult; need a solution.
Reprocessing plans

Issues of compatibility with MC:
• DC geometry, T0_FIND modifications by run number
• DC timing modifications do not impact MC chain
• Additions to event classification would require new MCDSTs only

In principle possible to use run number range to fix px values for backwards compatibility.

Use batch queues? Main advantage: increased stability.
Further datarec modifications

Modification of inner DC wall thickness (75 μm):
• Implement by run number

Cut DC hits with drift times > 2.5 μs:
• Suggested by P. de Simone in May to reduce fraction of split tracks

Others?
MC production status

Program             Events (10⁶)   LSF    Time (B80 days)   Size (TB)
ee                  36             6      120               0.8
e+e (ISR only)      36             6      120               0.8
rad                 114            5      480               1.7
ee ee               38             0.15   220               0.6
all                 252            0.2    1100              6.9
all (21 pb⁻¹ scan)  29             1      130               0.7
KSKL                411            1      2100              11.0
KK                  611            1      2620              18.0
Total               1527           -      6890              40.5
KSKL rare           62             20*    320 (est.)        1.7 (est.)
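As a quick sanity check, the quoted totals can be reproduced by summing the per-program rows of the table above (a sketch in Python; the numbers are copied from the table, the dict layout and function name are mine):

```python
# Per-program rows from the MC production table:
# (events in 10^6, LSF scale factor, time in B80 days, size in TB)
productions = {
    "ee":                 (36,  6,    120,  0.8),
    "e+e (ISR only)":     (36,  6,    120,  0.8),
    "rad":                (114, 5,    480,  1.7),
    "ee ee":              (38,  0.15, 220,  0.6),
    "all":                (252, 0.2,  1100, 6.9),
    "all (21 pb-1 scan)": (29,  1,    130,  0.7),
    "KSKL":               (411, 1,    2100, 11.0),
    "KK":                 (611, 1,    2620, 18.0),
}

def totals(prods):
    """Column-wise sums (the open-ended 'KSKL rare' row is excluded, as in the table)."""
    events = sum(row[0] for row in prods.values())   # 10^6 events
    days = sum(row[2] for row in prods.values())     # B80 days
    size = sum(row[3] for row in prods.values())     # TB
    return events, days, size
```

Summing gives 1527×10⁶ events, 6890 B80 days, and 40.5 TB, matching the Total row.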
Generation of rare KSKL events

[Diagram: rare KS and KL decay channels]

Peak cross section: 7.5 nb (approx 2× the sum of BRs for rare KL channels)

• In each event, either KS or KL decays to a rare mode (random selection)
• Scale factor of 20 applies to KL; for KS, the scale factor is ~100
MC development plans

• Beam pipe geometry for 2004 data (Bloise)
• LSB insertion code (Moulson)
• Fix generator (Nguyen, Bini)
• Improve MC-data consistency on tracking resolution (Spadaro, others):
  • MC has better core resolution and smaller tails than data in the Emiss-pmiss distribution in the background for the KS → πeν analysis
  • Improving agreement would greatly help for precision studies involving signal fits, spectra, etc.
  • Need to look systematically at other topologies/variables
  • Need more people involved
Linux software for KLOE analysis

P. Valente had completed an earlier port based on free software:
• VAST F90-to-C preprocessor
• Clunky to build and maintain

M. Matsyuk has completed a KLOE port based on the Intel Fortran compiler for Linux:
• Individual, non-commercial license is free
• libkcp code compiles with zero difficulty

Reconsider issues related to maintenance of KLOE software for Linux.
Linux usage in KLOE analysis

Most users currently process YBOS DSTs into Ntuples on farm machines and transfer the Ntuples to PCs:
• AFS does not handle random-access data well (i.e., writing CWNs as analysis output)
• Multiple jobs on a single farm node stress the AFS cache
• Farm CPU (somewhat) limited
• AFS disk space perennially at a premium

KLOE software needs are minimal for most analysis jobs (YBOS to Ntuple: no DC reconstruction, etc.).

Analysis jobs on user PCs accessing DSTs via KID and writing Ntuples locally should be quite fast.

Continuing interest on the part of remote users.
KLOE software on Linux: Issues

1. Linux machines at LNF for hosting/compilation: 3 of 4 Linux machines in the Computer Center are down, including klinux (mounts /kloe/soft, used by P. Valente for the VAST build)
2. KLOE code distribution: user PCs do not mount /kloe/soft. Move /kloe/soft to network-accessible storage? Use CVS for distribution? Elegant solution, but users must periodically update…
3. Individual users must install the Intel compiler
4. KID: has been built for Linux in the past
5. Priority/manpower
Operational issues

Offline expert training:
• 1-2 day training course for all experts
• General update

PC backup system:
• Commercial tape backup system available to users to back up individual PCs
Priorities and deadlines
In order of priority, for discussion:
1. Complete MC production: KSKL rare
2. Reprocessing
3. MC diagnostic work
4. Other MC development work for 2004
5. Linux
Deadlines?
Disk resources

Current recalled areas:
• Production: 0.7 TB
• User recalls: 2.1 TB

DST cache: 12.9 TB (10.2 TB added in April)

2001-2002:
• Total DSTs: 7.4 TB
• Total MCDSTs: 7.0 TB

2004:
• DST volume scales with L
• 3.2 TB added to AFS cell, not yet assigned to analysis groups
• 2.0 TB available but not yet installed; reserved for testing new network-accessible storage solutions
Limitations of AFS

Initial problems with random-access files blocking AFS on farm machines resolved.

Nevertheless, AFS has some intrinsic limitations:
• Volume sizes at most 100 GB (already pushed past the limit: max spec is 8 GB!)
• Cache must be much larger than the AFS-directed data volume for all jobs on a farm machine:
  • Problem characteristic of random-access files (CWNs)
  • Current cache size is 3.5 GB on each farm machine: more than sufficient for a single job, but possible problems with 4 big jobs/machine
• Enlarging cache sizes requires purchase of more local disk for farm machines
Network storage: Future solutions

Possible alternatives to AFS:
1. NFS v.4
   • Kerberos authentication: use klog as with AFS
   • Smaller data transfers, so expect fewer problems with random-access files
2. Storage Area Network (SAN) filesystem
   • Currently under consideration as a Grid solution
   • Works only with Fibre Channel (FC) interfaces
   • FC-SCSI/IP interface implemented in hardware/software
   • Availability expected in 2005

Migration away from AFS probable within ~6 months. 2 TB allocated to tests of new network storage solutions; the current AFS system will remain as an interim solution.
Current AFS allocations

Volume    Space (GB)   Working group   WG total (GB)
cpwrk     195          Neutral K       365
kaon      170          Neutral K
kwrk      200          Charged K       200
phidec    400          Radiative       400
ecl       149
mc        90
recwrk    30
trg       100
trk       90
A fair proposal?

Each of the 3 physics WGs gets 1400 GB total: total disk space (incl. already installed) divided equally.

• Physics WGs similar in size and diversity of analyses
• WGs can make intelligent use of space (e.g., some degree of Ntuple sharing already present)
• Substantial increases for everyone anyway
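For what it's worth, the 1400 GB figure is close to what one gets by pooling the current working-group volumes (365 + 200 + 400 GB from the allocation table) with the 3.2 TB newly added to the AFS cell and splitting three ways; a rough consistency check, with the choice of pool being my assumption:

```python
# Assumption (mine): the pool divided equally is the existing WG volumes
# plus the 3.2 TB added to the AFS cell in 2004.
existing_wg_gb = 365 + 200 + 400   # Neutral K + Charged K + Radiative totals
added_gb = 3200                    # 3.2 TB added to the AFS cell
per_wg_gb = (existing_wg_gb + added_gb) / 3
# per_wg_gb is ~1390 GB, i.e. roughly the proposed 1400 GB per working group
```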
Additional information
Offline CPU/disk resources for 2003

Available hardware:
• 23 IBM B80 servers: 92 CPUs
• 10 Sun E450 servers: 18 B80 CPU-equivalents
• 6.5 TB NFS-mounted recall disk cache
• Easy to reallocate between production and analysis

Allocation of resources in 2003:
• 64 to 76 CPUs on IBM B80 servers for production
• 800 GB of disk cache for I/O staging
• Remainder of resources open to users for analysis
Analysis environment for 2003

Production of histograms/Ntuples on analysis farm:
• 4 to 7 IBM B80 servers + 2 Sun E450 servers
• DSTs latent on 5.7 TB recall disk cache
• Output to 2.3 TB AFS cell accessed by user PCs

Analysis example:
• 440M KSKL events, 1.4 TB of DSTs
• 6 days elapsed for 6 simultaneous batch processes
• Output on order of 10-100 GB
• Final-stage analysis on user PC/Linux systems
CPU power requirements for 2004

[Plots: input rate (kHz), average luminosity (10³⁰ cm⁻² s⁻¹), and B80 CPUs needed to follow acquisition (recon, DST, MC) for 2001, 2002, and 2004, compared with the 76-CPU offline farm.]
CPU/disk upgrades for 2004

Additional servers for offline farm:
• 10 IBM p630 servers: 10×4 POWER4+ 1.45 GHz
• Adds more than 80 B80 CPU equivalents to offline farm

Additional 20 TB disk space:
• To be added to DST cache and AFS cell

Ordered, expected to be on-line by January.

More resources already allocated to users:
• 8 IBM B80 servers now available for analysis
• Can maintain this allocation during 2004 data taking
Installed tape storage capacity

IBM 3494 tape library:
• 12 Magstar 3590 drives, 14 MB/s read/write
• 60 GB/cartridge (upgraded from 40 GB this year)
• 5200 cartridges (5400 slots)
• Dual active accessors
• Managed by Tivoli Storage Manager

Maximum capacity: 312 TB (5200 cartridges)
Currently in use: 185 TB
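The capacity figures follow directly from the cartridge counts; a small arithmetic check (function name mine, decimal units assumed):

```python
def capacity_tb(cartridges, gb_per_cartridge=60):
    """Total library capacity in TB for a given number of 60 GB cartridges."""
    return cartridges * gb_per_cartridge / 1000

loaded_tb = capacity_tb(5200)       # cartridges currently loaded -> 312 TB
full_slots_tb = capacity_tb(5400)   # all 5400 slots filled -> 324 TB
free_tb = loaded_tb - 185           # 185 TB in use leaves ~127 TB free
```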
Tape storage requirements for 2004

[Plots: stored volume by type (GB/pb⁻¹) for raw, recon, DST, and MC in 2002 and 2004 est. (incl. streaming mods); tape library usage (TB) today and after +780, +1210, and +2000 pb⁻¹, broken down into free, raw, recon, DST, and MC.]
Tape storage for 2004

Additional IBM 3494 tape library:
• 6 Magstar 3592 drives: 300 GB/cartridge, 40 MB/s
• Initially 1000 cartridges (300 TB)
• Slots for 3600 cartridges (1080 TB)
• Remotely accessed via FC/SAN interface
• Definitive solution for KLOE storage needs

Call for tender (bando di gara) submitted to the Gazzetta Ufficiale; reasonably expect 6 months to delivery.

Current space sufficient for a few months of new data.
Machine background filter for 2004

Background filter (FILFO) last tuned on 1999-2000 data.

5% inefficiency for φ events, varies with background level:
• Mainly traceable to the cut to eliminate degraded Bhabhas
• Removing this cut reduces the inefficiency to 1%, increases stream volume 5-10%, and increases CPU time 10-15%

New downscale policy for bias-study sample: a fraction of events not subject to the veto is written to the streams.

Need to produce a bias-study sample for 2001-2002 data:
• To be implemented as reprocessing of a data subset with the new downscale policy
• Will allow additional studies of FILFO efficiency and cuts
Other offline modifications for 2004

Modifications to physics streaming:
• Bhabha stream: keep only a subset of radiative events. Reduces Bhabha stream volume by a factor of 4; reduces overall stream volume by >40%
• KSKL stream: clean up choice of tags to retain. Reduces KSKL stream volume by 35%
• KK stream: new tag using dE/dx. Fully incorporate dE/dx code into reconstruction; eliminate older tags, which will reduce stream volume

Random trigger as source of MC background for 2004: 20 Hz of random triggers synched with the beam crossing allows background simulation for L up to 2×10³² cm⁻² s⁻¹.
KLOE computing resources

• Tape library: IBM 3494, 5400 60-GB slots, 2 robots, TSM; 324 TB; 12 Magstar E1A drives, 14 MB/s each
• Managed disk space: 0.8 TB SSA (offline staging); 6.5 TB (2.2 TB SSA + 3.5 TB FC) latent disk cache
• Offline farm: 19 IBM B80 (4×POWER3 375 MHz); 8 Sun E450 (4×UltraSPARC-II 400 MHz)
• AFS cell: 2 IBM H70 (4×RS64-III 340 MHz); 1.7 TB SSA + 0.5 TB FC disk
• Online farm: 7 IBM H50 (4×PPC604e 332 MHz); 1.4 TB SSA disk
• Analysis farm: 4 IBM B80 (4×POWER3 375 MHz); 2 Sun E450 (4×UltraSPARC-II 400 MHz)
• File servers: 2 IBM H80 (6×RS64-III 500 MHz)
• DB2 server: IBM F50 (4×PPC604e 166 MHz)
• Network: Cisco Catalyst 6000; NFS and AFS served over 100 Mbps and 1 Gbps links
2004 CPU estimate: details

Extrapolated from 2002 data with some MC input.

2002: L = 36 μb⁻¹/s
T3 = 1560 Hz: 345 Hz φ + Bhabha, 680 Hz unvetoed CR, 535 Hz bkg

2004: L = 100 μb⁻¹/s (assumed)
T3 = 2175 Hz: 960 Hz φ + Bhabha, 680 Hz unvetoed CR, 535 Hz bkg (assumed constant)

From MC: σ(φ) = 3.1 μb (assumed)
φ + Bhabha trigger: σ = 9.6 μb; φ + Bhabha FILFO: σ = 8.9 μb
CPU(φ + Bhabha) = 61 ms avg.

CPU time calculation:
• 4.25 ms to process any event
• + 13.6 ms for 60% of bkg evts
• + 61 ms for 93% of φ + Bhabha evts

2002: 19.6 ms/evt overall – OK
2004: 31.3 ms/evt overall (10%)
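The two per-event averages can be reproduced by weighting each cost by its rate fraction; a sketch of that arithmetic, assuming (as the numbers suggest) that the 60% applies only to the 535 Hz machine-background component, not to the unvetoed cosmic rays:

```python
def avg_cpu_ms(t3_hz, phi_bha_hz, bkg_hz):
    """Average reconstruction CPU time per event (ms):
    4.25 ms for every event, plus 13.6 ms for 60% of bkg events,
    plus 61 ms for the 93% of phi + Bhabha events passing FILFO."""
    return (4.25
            + 13.6 * 0.60 * bkg_hz / t3_hz
            + 61.0 * 0.93 * phi_bha_hz / t3_hz)

cpu_2002 = avg_cpu_ms(1560, 345, 535)   # ~19.6 ms/evt
cpu_2004 = avg_cpu_ms(2175, 960, 535)   # ~31.3 ms/evt
```

With these inputs the function reproduces both quoted averages to better than 0.1 ms.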
2004 tape space estimate: details

2001: 274 GB/pb⁻¹
2002: 118 GB/pb⁻¹
Highly dependent on luminosity.

2004: estimate a priori.
Assume: 2175 Hz @ 2.6 KB/evt (raw event size assumed same for all events; has varied very little with background over KLOE history)
Assume: L = 100 μb⁻¹/s

1 pb⁻¹ = 10⁴ s:
• 25.0 GB for 9.6M physics evts
• 31.7 GB for 12.2M bkg evts (1215 Hz of bkg for 10⁴ s)
• 56.7 GB/pb⁻¹ total
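The per-pb⁻¹ volumes above are just rate × time × event size; a sketch of the arithmetic (2.6 KB/evt and 10⁴ s per pb⁻¹ from the slide; function name mine):

```python
def raw_gb_per_pb(rate_hz, kb_per_evt=2.6, seconds_per_pb=1e4):
    """Raw-data volume (GB, decimal) written per pb^-1 at the given trigger rate."""
    n_events = rate_hz * seconds_per_pb
    return n_events * kb_per_evt * 1e3 / 1e9   # KB -> bytes -> GB

physics_gb = raw_gb_per_pb(960)    # 9.6M physics evts -> ~25.0 GB
bkg_gb = raw_gb_per_pb(1215)       # ~12.2M bkg evts -> ~31.6 GB
total_gb = physics_gb + bkg_gb     # ~56.6 GB/pb^-1 (slide rounds the bkg count up, giving 56.7)
```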
Stream       2001-2002 (GB/pb⁻¹)   2004 (GB/pb⁻¹)
KK           11.6                  11.6
KSKL         19.7                  12.8
–            3.3                   3.3
radiative    6.4                   6.4
Bhabha       56.0                  14.0
other        0.8                   0.8
Total        98                    49
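A consistency check of this table against the reductions quoted on the streaming slide ("unnamed" is purely a placeholder for the stream whose label did not survive extraction; variable names are mine):

```python
# GB/pb^-1 per stream, before and after the 2004 streaming modifications.
vol_old = {"KK": 11.6, "KSKL": 19.7, "unnamed": 3.3,
           "radiative": 6.4, "Bhabha": 56.0, "other": 0.8}
vol_new = dict(vol_old, KSKL=12.8, Bhabha=14.0)

old_total = sum(vol_old.values())                   # ~98 GB/pb^-1
new_total = sum(vol_new.values())                   # ~49 GB/pb^-1
bhabha_factor = vol_old["Bhabha"] / vol_new["Bhabha"]               # factor-4 reduction
bhabha_share = (vol_old["Bhabha"] - vol_new["Bhabha"]) / old_total  # >40% of total
ksl_reduction = 1 - vol_new["KSKL"] / vol_old["KSKL"]               # ~35% KSKL reduction
```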
raw, recon: include effects of streaming changes.
MC: assumes 1.7M evt/pb⁻¹ produced, φ → all (1:5) and KSKL (1:1).