dCache Tier I meeting February 1, 2011

[part of a series of meetings]

Present: Antje, Tanja, Tigran, Paul; Triumf (Simon), PIC (Gerard), GridKa (Doris)


Site reports


Last week was good, nothing to report.


So far, so good.

No serious problems.

Question about old SRM (on old instance, just upgraded to new hardware).

JDBC task failed, JDBC queue

Max number of tasks in queue

This is the maximum number of outstanding database requests from SRM to the database. If the queue is full, state is not stored in the database.

If you increase this number. No.

Running 1.9.5? Yes.

Before the upgrade, the dCache instance was also running 1.9.5, but an earlier release of it.

Some status information is not stored in the database. Restarting SRM may result in

Running requests are always taken from memory.

DB is only used for history plots or

Too many SRM threads or too many Tomcat processes allowed.

Number of JDBC

Doris will investigate problem and open a ticket if the problem isn't understood.
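
The queue behaviour discussed above (running requests are served from memory, and database writes are skipped once the task queue is full) can be illustrated with a minimal sketch. This is not dCache's SRM code: the class BoundedStateWriter, its max_tasks parameter and the dictionary standing in for the database are all hypothetical names chosen for illustration.

```python
import queue


class BoundedStateWriter:
    """Hypothetical stand-in for a state writer with a bounded task queue."""

    def __init__(self, max_tasks):
        # Pending database writes; corresponds to "max number of tasks in queue".
        self.pending = queue.Queue(maxsize=max_tasks)
        # In-memory state, always updated; running requests are read from here.
        self.memory = {}
        # Count of writes that never reached the queue because it was full.
        self.dropped = 0

    def update(self, request_id, state):
        self.memory[request_id] = state
        try:
            self.pending.put_nowait((request_id, state))
        except queue.Full:
            # Queue is full: the state stays in memory only.
            self.dropped += 1

    def flush(self, database):
        # Drain the queued writes into the (assumed) database mapping.
        while True:
            try:
                request_id, state = self.pending.get_nowait()
            except queue.Empty:
                return
            database[request_id] = state


writer = BoundedStateWriter(max_tasks=2)
for i in range(5):
    writer.update(i, "RUNNING")

db = {}
writer.flush(db)
```

In this sketch all five requests remain visible in memory, while only the writes that fit into the queue reach the database. That matches the point made in the discussion that a restart can lose state that was never written out.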


We're still recovering data from misplaced DDN operations (just a guess). One of the pools is almost computer


Pool starts it

Serving 25% of the files. The others have to be copied to another pool. This will take us about 9 days.

200 TiB loose, we guess about 5% will be corrupt in the end. ATLAS says a lot of this is junk.

1,000 files.

Apart from this corruption issue, everything else is running fine.

DDN are requesting various log files.

From the controllers there's not much information during the corruption period.

Similar at DESY

DESY has observed effects that are consistent with DDN confusing LUNs between controllers.

Patrick to get Gerard and Martin G. in touch.


The engineer working on the problem says that there is no way for ZFS to detect this problem: the data

Do you run RAID-0 on it? Striped pools (RAID-Z)? Yes.

This explains why ZFS is not able to recover.

IOError or checksumMismatch.

Support tickets for discussion

Same time, next week.