dCache Tier I meeting November 8, 2011
[part of a series of meetings]
Present
dCache.org (Patrick, Tigran, Paul, Antje), IN2P3 (Nicolas), PIC (Gerard), GridKa (Doris)
Agenda
(see box on the other side)
Site reports
PIC
Gerard reported two issues: one with PIC's PNFS-to-Chimera migration, the other concerning their LHCb instance.
PNFS-to-Chimera Migration
Gerard reported that the instructions for migrating from PNFS to Chimera seem out of date. PIC were unable to follow the instructions in the wiki, partly because some of the files mentioned were not where the documentation said they would be.
Paul will update the page, moving the instructions into the dCache Book.
Gerard also asked whether multiple PNFS databases are supported.
Paul replied that pnfsDump handles multiple databases without any problem. The simplest approach is to run pnfsDump over the complete namespace, spanning all the PNFS databases.
Gerard also asked whether Chimera supports multiple databases: it does not.
LHCb
Gerard reported a problem with their LHCb instance. The problem has been mitigated, so it is no longer affecting their production service.
P2P in a partition: the pools are full and bunches of requests are being sent; many transfers get stuck trying to read from tape.
Result: 60% failures, so P2P was disabled for them.
Problem for the last 2
P2P also had to be disabled for LHCb at GridKa.
In February, LHCb are going to read all data currently on tape; they are trying this out now.
Pilot jobs then issue bring-online requests. Guess: files are brought online several times.
LHCb have been complaining that user jobs are disturbing production; they had not previously run so many user jobs.
Question to Gerard: could he check whether LHCb issued a large number of bringOnline requests concurrently?
P2P was triggered by percentile.
3,000 srmBringOnline requests per hour.
Gerard checked: the bringOnline requests he looked at had 20 files each, but things are working OK now.
Number of movers: p2p-server: 50, p2p-client: 0, other transfers: 300.
Currently seeing 4–6 files per bringOnline request.
We will send an email describing how to enable debug-level logging.
The problem occurs with dCache 1.9.12-10.
Doris (GridKa) is running 1.9.5-26.
GridKa
1. LHCb request for a change of behaviour: a file may not be available, yet dCache keeps trying to get it. LHCb want the SRM to fail quickly if a file cannot be read.
The ATLAS model is:
o bringOnline 24 hours
o job submitted
GGUS ticket.
2. Xavier's ticket to dCache.org (Ticket #6773) about a frequently occurring problem: a single misbehaving pool can have a knock-on effect on the pool-manager.
3. Announcement: the ATLAS instance will be upgraded to 1.9.12 next Tuesday.
IN2P3
Everything is OK.
Support tickets for discussion
DTNM (date of next meeting)
Proposed: same time, next week.
