
dCache Tier I meeting November 8, 2011

[part of a series of meetings]

Present

dCache.org (Patrick, Tigran, Paul, Antje), IN2P3 (Nicolas), PIC (Gerard), GridKa (Doris)

Agenda

(see box on the other side)

Site reports

PIC

Gerard reported two issues: one concerns PIC's PNFS-to-Chimera migration; the other was triggered by LHCb activity.

PNFS-to-Chimera Migration

Gerard reported that the instructions for migrating from PNFS to Chimera seem out of date. PIC was unable to follow the instructions in the wiki, partly because some of the files mentioned were not where the instructions said they would be.

Paul will update the page, moving the instructions into the book.

Gerard also asked about support for multiple PNFS databases.

Paul replied that pnfsDump handles multiple databases without any problem. The simplest approach is to run pnfsDump over the complete namespace, spanning all of the PNFS databases.

Gerard also asked whether Chimera supports multiple databases. It does not.

LHCb

Gerard reported a problem with PIC's LHCb instance. The problem has been mitigated, so it is not affecting their production service.

P2p in a partition

The pools are full and LHCb is sending bunches of requests. Many transfers get stuck trying to read from tape.

The result was 60% failures, so PIC disabled p2p for LHCb.

Problem for the last 2

GridKa also had to disable p2p for LHCb.

In February, LHCb plans to read all data currently on tape. They are trying this out now.

Pilot jobs then issue bringOnline requests. Guess: files are being brought online several times.

LHCb have been complaining that user jobs are disturbing production. They had not previously run so many user jobs.

Question: could PIC check whether LHCb issued a large number of bringOnline requests concurrently? They could have.

The p2p transfers were triggered by the percentile setting.

Observed rate: 3,000 srmBringOnline requests per hour.

Gerard checked the bringOnline requests, which had 20 files each, but things are working OK now.

Number of movers: p2p-server: 50, p2p-client: 0, other transfers: 300.

PIC is currently seeing 4 to 6 files per bringOnline request.

We will send an email with details on how to enable the debug-level logging.

The problem was seen with dCache 1.9.12-10.

GridKa (Doris) is running 1.9.5-26.

GridKa

1.

LHCb request a change of behaviour. When a file may not be available, dCache keeps trying to get it. LHCb want the SRM to fail quickly if a file cannot be read.

The ATLAS model is:

o bringOnline (24 hours)
o job submitted

There is a GGUS ticket for this.

2.

Xavier opened a ticket with dcache.org (Ticket #6773) about a frequently occurring problem: a single misbehaving pool can have a knock-on effect on the pool manager.

3.

Announcement: on Tuesday next week, GridKa will upgrade its ATLAS instance to 1.9.12.

IN2P3

Everything is OK.

Support tickets for discussion


DTNM

Proposed: same time, next week.