Last modified on 07/27/10 16:50:03

dCache Tier I meeting July 27, 2010

[part of a series of meetings]

Present

dCache.org(Patrick, Paul, Antje, Gerd), IN2P3(), Sara(), Triumf(Simon), BNL(Pedro), NDGF(Gerd), PIC(Gerard), GridKa(), Fermi(), CERN(Andrea)

Agenda

(see box on the other side)

Site reports

NDGF

Quite fine. We have upgraded our head-nodes to 1.9.9. This went successfully.

Gerard asked whether this was 1.9.9-1. No, NDGF is running 1.9.9 with some additional patches.

There are some known issues with 1.9.9-1. This version is not recommended for production use, but is OK for testing.

dCache.org is planning to start the release process for 1.9.9-2 "soon" (likely Wednesday). If all goes well, the new release should be available at the end of the week.

CERN

Andrea had nothing to report.

PIC

Everything alright at the moment.

One issue involving a set of link groups: requests didn't fail over to a different link in the same link group.

Triumf

OK last week.

One small issue, ticket about. A hot file ... accessed very often. P2P happened a couple of times. This P2P failed, which caused a problem. Will open a ticket if the problem comes back.

BNL

We had a few problems with PostgreSQL databases. The free-space map was too small. PostgreSQL didn't have the free-space map. As a result, vacuuming wasn't working correctly and PNFS was going down very often. Fix: change the parameter and dump-restore all databases. Not a dCache problem.

The free-space map is now big enough.

Please close ticket 5591.

We have now updated the dcap client and this has solved ~99% of the problems.

Do have a problem with the gsidcap doors. When we have PNFS timeouts, the transfers from the doors to the pools would take

After PNFS issues, the gsidcap doors don't seem to recover.

Many concurrent transfers. 6 ports running, so we don't believe the door is being overloaded.

If we have too many, the transfers would just hang there for a few minutes before

Is the pool connected to the client? or queued on the pool?

Requests aren't being queued on the pools because

After we see this problem we start to see PNFS-timeouts.

Do you know if there's a bug in 1.9.4 that has been fixed in later releases?

We managed to overcome these problems because there is a wrapper around

Do you know if the number of doors, or the number of open file descriptors, is growing indefinitely?

Please provide a log of the door.

Not clear why requests would hang;

Tomorrow, start gathering logs for these hanging

Our problem with performance is not with PNFS, but

Update of metadata takes 400 ms

Discovered in the PostgreSQL log files .. reporting all queries that take more than 200 ms.

After dump-restore, the database shrank from 300 GiB to 200 GiB.

Currently using PostgreSQL 8.3.7 64-bit.

Do you get a long queue (>200 or 300) in PnfsManager? Yes, it affects everything.
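As an illustration of the slow-query logging mentioned above: PostgreSQL writes a "duration: ... ms  statement: ..." entry for every statement that exceeds the log_min_duration_statement setting (200 ms would match what BNL described). The sketch below summarizes such entries. It is a minimal example, not BNL's actual tooling; the sample log lines, the table name t_example and the function names are made up for illustration, and the line prefix depends on the site's log_line_prefix.

```python
import re
from collections import defaultdict

# Matches the "duration: <ms> ms  statement: <sql>" part of a log entry.
# Whatever prefix (timestamp, pid, ...) precedes it is ignored.
DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms\s+statement: (.*)")


def slow_statements(lines, threshold_ms=200.0):
    """Yield (duration_ms, statement) for entries at or above threshold_ms."""
    for line in lines:
        match = DURATION_RE.search(line)
        if match and float(match.group(1)) >= threshold_ms:
            yield float(match.group(1)), match.group(2).strip()


def summarize(lines, threshold_ms=200.0):
    """Group slow statements by first SQL keyword: {keyword: (count, worst_ms)}."""
    summary = defaultdict(lambda: [0, 0.0])
    for duration, statement in slow_statements(lines, threshold_ms):
        keyword = statement.split(None, 1)[0].upper() if statement else "?"
        summary[keyword][0] += 1
        summary[keyword][1] = max(summary[keyword][1], duration)
    return {keyword: tuple(stats) for keyword, stats in summary.items()}


# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    "2010-07-27 10:00:01 EDT LOG:  duration: 412.120 ms  statement: UPDATE t_example SET x = 1",
    "2010-07-27 10:00:02 EDT LOG:  duration: 250.000 ms  statement: SELECT 1",
    "2010-07-27 10:00:03 EDT LOG:  duration: 120.500 ms  statement: UPDATE t_example SET x = 2",
    "2010-07-27 10:00:04 EDT LOG:  checkpoint starting: time",
]

if __name__ == "__main__":
    for keyword, (count, worst) in sorted(summarize(SAMPLE_LOG).items()):
        print(f"{keyword}: {count} slow statement(s), worst {worst:.1f} ms")
```

Grouping by the first keyword is only a coarse way to see whether the slow entries are dominated by updates, as the 400 ms metadata update suggests; a real analysis would group by the full normalized statement.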

Support tickets for discussion

[Items are added here automagically]

RT 5753: dCache 1.9.5-21@PIC: PoolManager cost issue (ATLAS impacted)

RT 5754: Transfer a fixed amount of data using migration copy

The question was about how to tell a pool to migrate only a certain amount of data.

This isn't supported currently.

With 1.9.10, the migration module will allow one to specify conditions. Conditions can apply to target pool selection (e.g., select only those pools that are more than 50% empty) and can have termination conditions (e.g., finish transferring if target pool has less than 10% free space).
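The two kinds of condition described above can be modelled roughly as follows. This is a conceptual sketch only: it is not dCache code and does not show the migration module's command syntax. The Pool class, the function names and the numbers are all hypothetical; only the 50% and 10% thresholds come from the examples in these notes.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical stand-in for a pool; sizes are in arbitrary units."""
    name: str
    total: int
    free: int

    @property
    def free_fraction(self):
        return self.free / self.total


def select_targets(pools, min_free_fraction=0.5):
    """Target selection condition: keep only pools that are more than 50% empty."""
    return [pool for pool in pools if pool.free_fraction > min_free_fraction]


def migrate(file_sizes, target, stop_below_free_fraction=0.1):
    """Copy files to target until the termination condition holds.

    Stops when the target has less than 10% free space (or the next
    file no longer fits) and returns the number of files copied.
    """
    copied = 0
    for size in file_sizes:
        if target.free_fraction < stop_below_free_fraction:
            break  # termination condition reached
        if size > target.free:
            break  # next file does not fit
        target.free -= size
        copied += 1
    return copied


if __name__ == "__main__":
    pools = [Pool("pool_a", total=100, free=60), Pool("pool_b", total=100, free=40)]
    targets = select_targets(pools)
    copied = migrate([15] * 10, targets[0])
    print(f"copied {copied} files to {targets[0].name}, {targets[0].free} units free")
```

A fixed-amount transfer, as asked about in RT 5754, would be one more termination condition of the same shape: stop once the sum of copied sizes reaches a given budget.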

DTNM

Same time, next week.