Last modified on 08/24/10 16:48:21

dCache Tier I meeting August 24, 2010

[part of a series of meetings]

Present

dCache.org(Patric,Owen,Antje,Paul), IN2P3(), Sara(), Triumf(Simon), BNL(), NDGF(), PIC(Gerard), GridKa(Doris,Silke), Fermi(), CERN()

Agenda

(see box on the other side)

Site reports

PIC

Everything is OK.

FZK

Everything fine.

Gerd helped with one issue: we discovered that the tape write pools didn't write to tape. A newer version of dCache was checked.

Just now (a few minutes ago) the ATLAS dcap door showed 4,700 active dcap movers, whereas on the pools we have only 2,000 active dcap movers.

How long has this

Had some transfers failing. Have increased the number of Max log-in

Is the number of stage requests

Recommendation: pick a door and use the command-line interface to check what the door is doing. Type "info" and copy the output into a ticket (support@…).

Log into a door that has no mover. How do I know which door that is? Check the web interface for transfers showing "no mover found".

We sometimes see a door closed due to end-of-input.

Please check first whether this entry is among the stage requests. No, we currently don't have any stage requests for ATLAS.

Triumf

Mostly OK.

Two things to report.

Problem 1

This happened in production, for the second time, with a hot file. In the billing system I noticed roughly five more. No successful access.

Didn't get any useful information on whether this is correlated with hot-file replication or not; however, 360 file transfers failed. That file had been accessed 20k times in total. The problem seems to be correlated with pool-to-pool transfers.

Over a 2-hour period, only the hot files failed. There were lots of requests for that file. dCache made five more copies of that file.

Requests failed at the door level, with an error something like: PoolManager didn't give the door's requests to the proper pool. Each door has a limit of 1,500, so we don't believe this is a door limit.

Failed transfers happened on different doors.

Whilst these file transfers were failing, other transfers (at the same time) succeeded.

Have seen this with the hot-file replication threshold set to 90%; may reduce this threshold.

Door didn't reject the requests. It accepts the request then fails. No contact to the pool.

Billing says the problem was a Java IO error.

Please grep billing for the particular PNFS-ID.
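For sites that prefer a script over a shell grep, the lookup could be sketched as below. This is only a minimal illustration: it does a plain substring match and assumes nothing about the billing line layout. The function names and the sample lines and IDs in the usage note are hypothetical, not taken from a real billing file.

```python
from typing import Iterable, Iterator, List


def grep_pnfsid(lines: Iterable[str], pnfsid: str) -> Iterator[str]:
    """Yield every line that mentions the given PNFS-ID (plain substring match)."""
    for line in lines:
        if pnfsid in line:
            yield line.rstrip("\n")


def grep_billing_file(path: str, pnfsid: str) -> List[str]:
    """Return all matching lines from one billing file.

    The path is supplied by the caller; where billing files live is
    site-specific and not assumed here.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(grep_pnfsid(handle, pnfsid))
```

Usage, with made-up lines: `list(grep_pnfsid(["a 0000AAAA ok\n", "b 0000BBBB failed\n"], "0000AAAA"))` returns only the first line, stripped of its newline. The matching lines can then be pasted into the support ticket.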

Problem 2

On a Solaris system, testing 64-bit Java. For the pool service itself, this doubles the memory usage. This is Solaris 10 with Java 1.6, 32- and 64-bit versions.

If I use a 1 GB memory size then I can only handle 600,000 files.
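As a rough back-of-the-envelope check of these figures, the arithmetic can be sketched as follows. The sketch assumes that "1 GB" means 2^30 bytes and that memory use scales linearly with the number of files, and it takes the reported "doubles the memory usage" as a factor of 2. The minutes do not say whether the 600,000 figure was measured with the 32-bit or the 64-bit JVM, so the function names and defaults here are purely illustrative.

```python
GIB = 1024 ** 3  # assumption: "1 GB" in the report means 2**30 bytes


def bytes_per_file(heap_bytes: int, files: int) -> float:
    """Average memory per file implied by a reported heap size and file count."""
    return heap_bytes / files


def estimated_files(heap_bytes: int,
                    ref_heap_bytes: int = GIB,
                    ref_files: int = 600_000,
                    memory_factor: int = 1) -> int:
    """Scale the reported reference point linearly to another heap size.

    memory_factor models a JVM that needs that many times more memory per
    file (for example 2 for the reported doubling). Integer arithmetic
    keeps the result exact.
    """
    return heap_bytes * ref_files // (ref_heap_bytes * memory_factor)


if __name__ == "__main__":
    print(round(bytes_per_file(GIB, 600_000)))        # bytes per file at the reported point
    print(estimated_files(GIB, memory_factor=2))      # same heap, doubled per-file cost
    print(estimated_files(2 * GIB, memory_factor=2))  # heap doubled to compensate
```

Under these assumptions the reported point works out to roughly 1.8 KB of memory per file, and a JVM with doubled per-file cost would need about twice the heap to track the same number of files.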

dcap hanging

Bug report about ATLAS jobs getting stuck in a wait-loop with ROOT and dcap. They block on files that are meanwhile accessible via dccp. There is a "protocol violation" message in the dcap door.

HSM URLs

Support tickets for discussion

[Items are added here automagically]

DTNM

Same time, next week.