dCache Tier I meeting July 27, 2010
[part of a series of meetings]
Present
dCache.org(Patrick, Paul, Antje, Gerd), IN2P3(), Sara(), Triumf(Simon), BNL(Pedro), NDGF(Gerd), PIC(Gerard), GridKa(), Fermi(), CERN(Andrea)
Agenda
(see box on the other side)
Site reports
NDGF
Quite fine. We have upgraded our head-nodes to 1.9.9. This went successfully.
Gerard asked whether this was 1.9.9-1. No, NDGF is running 1.9.9 with some additional patches.
There are some known issues with 1.9.9-1. This version is not recommended for production use, but is OK for testing.
dCache.org is planning to start the release process for 1.9.9-2 "soon" (likely Wednesday). If all goes well, the new release should be available at the end of the week.
CERN
Andrea had nothing to report.
PIC
Everything alright at the moment.
One issue with a set of link groups: there was no fail-over to a different link in the same link group.
Triumf
OK last week.
One small issue: a hot file ... accessed very often. P2P happened a couple of times. This P2P failed, which caused problems. Will open a ticket if the problem comes back.
BNL
We had a few problems with PostgreSQL databases. The free-space map was too small (PostgreSQL didn't have the free-space map it needed), so vacuuming wasn't working correctly. PNFS was going down very often. Fix: change the parameter and dump-restore all databases. Not a dCache problem.
The free-space map is now big enough.
Please close ticket 5591.
We have now updated the dcap client and this has solved ~99% of the problems.
Do have a problem with the gsidcap doors. When we have PNFS timeouts, the transfers from the doors to the pools would take
After PNFS issues, the gsidcap doors don't seem to recover.
Many concurrent transfers. 6 ports running, so we don't believe the door is being overloaded.
If we have too many, the transfers would just hang there for a few minutes before
Is the pool connected to the client, or is the request queued on the pool?
Requests aren't being queued on the pools because
After we see this problem we start to see PNFS-timeouts.
Do you know if there's a bug in 1.9.4 that has been fixed in later releases?
We managed to overcome these problems because there is a wrapper around
Do you know if the number of doors, or the number of open file descriptors, is growing indefinitely?
Please provide a log of the door.
Not clear why requests would hang;
Tomorrow, start gathering logs for these hanging
Our problem with performance is not with PNFS, but
Update of metadata takes 400 ms
Discovered in the PostgreSQL log files .. reporting all queries that take more than 200 ms.
After dump-restore, the database shrank from 300 GiB to 200 GiB.
Currently using PostgreSQL 8.3.7 64-bit.
Do you get a long queue (>200 or 300) in PnfsManager? Yes, it affects everything.
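The slow-query check mentioned above (finding statements over 200 ms in the PostgreSQL log) can be sketched in a few lines of Python. This is a minimal illustration, assuming the log lines have the "duration: ... ms  statement: ..." shape that PostgreSQL writes when log_min_duration_statement is set. The function name slow_statements and the sample lines are made up for illustration and are not taken from BNL's logs.

```python
import re

# Matches the duration entries PostgreSQL writes when
# log_min_duration_statement is enabled, e.g.
#   LOG:  duration: 412.345 ms  statement: UPDATE ...
DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms\s+statement: (.*)")


def slow_statements(lines, threshold_ms=200.0):
    """Return (duration_ms, statement) pairs slower than threshold_ms."""
    hits = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match is None:
            continue  # not a duration entry
        duration = float(match.group(1))
        if duration > threshold_ms:
            hits.append((duration, match.group(2).strip()))
    # Slowest first, so the worst offenders are at the top.
    return sorted(hits, reverse=True)


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    "2010-07-26 10:00:01 EDT LOG:  duration: 412.345 ms  statement: UPDATE example_table SET x = 1",
    "2010-07-26 10:00:02 EDT LOG:  duration: 150.000 ms  statement: SELECT 1",
    "2010-07-26 10:00:03 EDT LOG:  checkpoint starting: time",
]

if __name__ == "__main__":
    for duration, statement in slow_statements(SAMPLE_LOG):
        print(f"{duration:8.1f} ms  {statement}")
```

Raising threshold_ms to 400 would isolate statements in the range reported for the metadata update.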
Support tickets for discussion
RT 5753: dCache 1.9.5-21@PIC: PoolManager cost issue (ATLAS impacted)
RT 5754: Transfer a fixed amount of data using migration copy
The question was about how to tell a pool to migrate only a certain amount of data.
This isn't supported currently.
With 1.9.10, the migration module will allow one to specify conditions. Conditions can apply to target pool selection (e.g., select only those pools that are more than 50% empty) and can have termination conditions (e.g., finish transferring if target pool has less than 10% free space).
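To make the two kinds of condition concrete, here is a small Python sketch of the logic only. It is not the migration module's code or its command syntax: Pool, eligible_targets, should_stop and migrate are hypothetical names, and the 50% and 10% thresholds are simply the examples given in the discussion.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical stand-in for a pool's space accounting."""
    name: str
    total_bytes: int
    free_bytes: int

    @property
    def free_fraction(self) -> float:
        return self.free_bytes / self.total_bytes


def eligible_targets(pools, min_free_fraction=0.5):
    """Target selection: keep pools that are more than 50% empty."""
    return [p for p in pools if p.free_fraction > min_free_fraction]


def should_stop(target, min_free_fraction=0.1):
    """Termination: stop once the target has less than 10% free space."""
    return target.free_fraction < min_free_fraction


def migrate(file_sizes, target):
    """Copy files to target until the termination condition holds."""
    moved = []
    for size in file_sizes:
        if should_stop(target) or size > target.free_bytes:
            break
        target.free_bytes -= size  # simulate the copy
        moved.append(size)
    return moved
```

In this sketch the termination condition is checked before each copy, so the transfer can end with the target slightly below the threshold rather than exactly at it.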
DTNM
Same time, next week.
