dCache Tier I meeting February 1, 2011
[part of a series of meetings]
Present
dCache.org (Patrick, Antje, Tanja, Tigran, Paul), Triumf (Simon), PIC (Gerard), GridKa (Doris)
Agenda
(see box on the other side)
Site reports
Triumf
Last week was good, nothing to report.
GridKa
So far, so good.
No serious problems.
Question about old SRM (on old instance, just upgraded to new hardware).
JDBC task failed, JDBC queue
Max number of tasks in queue
This is the maximum number of outstanding database requests from SRM to the database. If the queue is full, request state is not stored in the database.
If you increase this number. No.
Running 1.9.5? Yes.
Before the upgrade, the dCache instance was also running 1.9.5, but an earlier version.
Some status information is not stored in the database. Restarting SRM may result in
Running requests are always taken from memory.
DB is only used for history plots or
Too many SRM threads or too many Tomcat processes allowed.
Number of JDBC
Doris will investigate problem and open a ticket if the problem isn't understood.
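Illustration (not dCache code): the queue behaviour discussed above, where state updates are dropped rather than stored once the maximum number of queued tasks is reached, can be sketched roughly as follows. The class name StateWriter and the parameter max_queued_tasks are hypothetical and do not correspond to actual SRM settings.

```python
import queue


class StateWriter:
    """Hypothetical stand-in for a bounded queue of database write tasks."""

    def __init__(self, max_queued_tasks):
        # Bounded queue: at most max_queued_tasks outstanding writes.
        self.tasks = queue.Queue(maxsize=max_queued_tasks)
        self.dropped = 0

    def submit(self, request_id, state):
        """Queue a state update; return False if the queue is full."""
        try:
            self.tasks.put_nowait((request_id, state))
            return True
        except queue.Full:
            # The update is not persisted; the request itself continues
            # to be served from memory.
            self.dropped += 1
            return False


writer = StateWriter(max_queued_tasks=2)
results = [writer.submit(i, "PENDING") for i in range(3)]
print(results, writer.dropped)
```

With a limit of two and nothing draining the queue, the third update is rejected and counted as dropped.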
PIC
We're still recovering data from misplaced DDN operations (just a guess). One of the pools is almost computer
The
Pool starts it
Serving 25% of the files. The others have to be copied to another pool. This will take us about 9 days.
200 TiB loose, we guess about 5% will be corrupt in the end. ATLAS says a lot of this is junk.
1,000 files.
Apart from this corruption issue, everything else is running fine.
DDN are requesting various log files.
From the controllers there's not much information during the corruption period.
Similar at DESY
DESY has observed effects that are consistent with DDN confusing LUNs between controllers.
Patrick to get Gerard and Martin G. in touch.
ZFS
The engineer working on the problem says that there is no way for ZFS to detect this problem: the data
Do you run RAID-0 on it? Striped pools (RAID-z). Yes.
This explains why ZFS is not able to recover.
IOError or checksumMismatch.
Support tickets for discussion
[Items are added here automagically]
DTNM
Same time, next week.