dCache Tier I meeting November 4, 2009
[part of a series of meetings]
Present
dCache.org(Timur,Tigran,Patrick,Owen,Paul, Irina), IN2P3(), Sara(), Triumf(Simon), BNL(Pedro), NDGF(), PIC(), GridKa(Silke,Doris), Fermi(Jon), CERN()
Agenda
(see box on the other side)
Site reports
FZK
Doris reported that they have upgraded their second-oldest dCache instance to version 1.9.5-5, with the node running PnfsManager using the pre-release version of 1.9.5-6. The update went fine.
Doris reported a black hole in the ATLAS instance: one pool was continually logging a NullPointerException, without any stack-trace. All pool-to-pool transfers were going to that particular pool. It isn't clear why this was happening.
The problem coincided with a fix to the pool's local file-system. This work should have been unrelated.
Restarting the pool fixed the problem.
Paul asked which info-provider was being used. The answer: the latest, a pre-release of the version available with 1.9.5-6.
Fermi
Jon reported that things are fine and are running smoothly.
Timur asked about Catalan's report indicating a problem with third-party transfers. Jon reported that FTS Copy was broken, so they switched FTS to use GridFTP third-party copies. This works fine.
NDGF
Gerd reported that, last week, NDGF attempted to switch to the "new" (1.9.5-5 or later) info-provider. This was unsuccessful, but the second attempt, this afternoon, succeeded. The new info-provider is now running in production.
NDGF also updated to dCache v1.9.5-6. This resulted in weird errors from FTS about SRM-ping failing. Restarting the SRM fixed the problem.
Owen reported that he, too, has seen this problem.
The dCache developers will look into this.
Triumf
Simon reported that things are running smoothly.
They are currently running 1.9.5-4 and Chimera in their test system. When these tests complete, they will decide whether to migrate from PNFS to Chimera.
BNL
- java.lang.ArrayIndexOutOfBoundsException on SRM 1.9.4-3 (RT#5202): http://www.dcache.org/rt/Ticket/Display.html?id=5202
- The pin manager breaks when trying to remove pins that it cannot find in its database; the problem was seen when using the migration module (RT#5196): http://www.dcache.org/rt/Ticket/Display.html?id=5196
- The -allow-relay option is missing from gridftpdoorSetup, so on each upgrade one needs to remember to edit the gridftpdoor.batch file by hand (RT#5173): http://www.dcache.org/rt/Ticket/Display.html?id=5173
- BNL needs to use vorole-mapping, since gPlazma with any other option makes SRM fail internally with a role mismatch. This seems to have been fixed; BNL needs to know in which release the fix will be available.
Incomplete listings
People have been reporting problems with listings being incomplete. This problem seems to be limited to the 1.9.5 series of dCache releases: earlier versions do not seem affected.
The issue is that, in some cases, the directory listing is incomplete. This affects all listings generated by dCache (GridFTP, SRM-ls, dcap, etc). Clients that list files using a mounted file-system are unaffected.
The dCache team is investigating this.
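Since listings taken from a mounted file-system are unaffected, a site could use one as a reference when checking a listing obtained through a door. The following is only a minimal sketch of that comparison: the function name and the sample file names are hypothetical, and obtaining the two listings (for example, from the mount and from an SRM-ls client) is left to the site.

```python
def missing_entries(reference, candidate):
    """Return the names in the reference listing that the candidate listing lacks."""
    return sorted(set(reference) - set(candidate))


# Hypothetical sample data: names seen through the mounted namespace
# versus names returned by a door for the same directory.
mounted_listing = ["file1", "file2", "file3"]
door_listing = ["file1", "file3"]

print(missing_entries(mounted_listing, door_listing))
```

An empty result means the door's listing contained every entry seen through the mount.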
Database connections
A site reported running out of database connections. This site had all dCache components using the same PostgreSQL database. Such a deployment requires PostgreSQL to be configured to allow a high number of database connections.
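As a rough illustration of the sizing involved, the sketch below adds up the connections each component may open and compares the total with the server limit. The component names and pool sizes are hypothetical, not dCache defaults; the limit of 100 and the 3 reserved connections correspond to PostgreSQL's default max_connections and superuser_reserved_connections settings.

```python
def connections_needed(pool_sizes):
    """Total connections that all components sharing one server may open."""
    return sum(pool_sizes.values())


def has_headroom(pool_sizes, max_connections=100, reserved=3):
    """True if the combined demand fits within the non-reserved connection slots."""
    return connections_needed(pool_sizes) <= max_connections - reserved


# Hypothetical per-component connection limits (assumptions for illustration only).
pool_sizes = {"SRM": 50, "PinManager": 20, "SpaceManager": 30, "PnfsManager": 20}

print(connections_needed(pool_sizes))
print(has_headroom(pool_sizes))
```

If the check fails, either the PostgreSQL connection limit needs raising or the components need spreading over more than one database server.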
DTNM
Same time, next week.
