Last modified on 06/08/10 17:35:33

dCache Tier I meeting June 8, 2010

[part of a series of meetings]

Present: IN2P3(), Sara(), Triumf(Simon), BNL(), NDGF(), PIC(Gerard), GridKa(Doris), Fermi(), CERN()


(see box on the other side)

Site reports


We have no new issues.

The last remaining issue is in tape production. We are running 1.9.5-20 RC-1, which has an already-acknowledged bug where the dcap mover dies when there is too much activity.

The issue arose one week later.

Restarting the door solved the issue.

Tigran will send a link to a new RPM that should verify that the problem is the bug in re-reading a file; this should only happen when the file is updated. PIC reported that the file may have been updated.

What is the file's last modification date? stage.conf was last modified 2 June at 11:57.


Not so bad.

Last week we promised to open a bug report about the SRM.

During a downtime we changed the ATLAS instance from the kpwd auth method to the gridmap method. We forgot one group-and-role in the linkgroup-authorisation file. Do we need to restart the SRM after modifying this file?

Adding a new VO must be done in the VO-role map, which is reloaded automatically.

We will look into making this automatic for the SRM too.
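For reference, the linkgroup-authorisation file maps each link group to the FQANs allowed to use it; a minimal sketch of the format (the link-group name and FQANs below are made-up examples, not the actual ATLAS configuration):

```
LinkGroup atlas-link-group
/atlas/Role=production
/atlas
```

The forgotten group-and-role mentioned above would be one such missing FQAN line under the relevant LinkGroup block.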


80--90% .. no

Empty directories in the Chimera database.

They are empty but belong to the admin directory.


Everything is stable. We have a few open tickets.

GridFTP doors: has the fix already been included in the latest releases?

Tigran: ConcurrentModificationException. It looks like we really have a problem there. We would like to fix this in 1.9.5 and later.
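A ConcurrentModificationException is the generic Java failure mode when a collection is structurally modified while being iterated. A minimal illustration of the hazard and the iterator-based fix (this is a generic sketch, not the actual dCache code path):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CmeDemo {
    // Removing elements via the Iterator is safe; calling list.remove()
    // directly inside the loop would throw ConcurrentModificationException
    // on the next call to it.next().
    static List<Integer> removeEvens(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeEvens(List.of(1, 2, 3, 4, 5))); // [1, 3, 5]
    }
}
```

In multi-threaded code like a door, the usual fixes are synchronizing access or using a concurrent collection rather than the iterator trick alone.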


Pool rebalancing

Balancing between more expensive and cheaper pools, using the migration tools concurrently.

We have new pools: Thumpers, which have a tendency to hang.

Replicating cached copies to the thumpers.

If we ask the DDNs to migrate data to these thumpers, the migrations will fail.


Using the hopping manager to replicate files into a tape-read pool, so users can access them.

The hopping manager is a service triggered, e.g., when files are written.

Can you define a poolgroup as destination? Yes.

What happens if the pool is full? The hop is lost: the transfer fails and isn't retried.



Go for migration module.


Set of pools into which data is written.

Move data into other pools

30 pools of 10 TB each.

Issue one migration job a week to slowly free up space on the write pools.

Different target-pool-selection criteria are available.
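A sketch of what one such weekly job could look like from a write pool's admin cell, based on the 1.9-era migration module (the pool-group name `readPools` is a made-up example; check `help migration move` on your release for the exact options):

```
(writePool_01) admin > migration move -target=pgroup readPools
(writePool_01) admin > migration ls
```

`migration move` copies each file to the target and then removes the source replica, which is what frees space on the write pools; `migration copy` would leave the source copy in place.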


We are using DDN here at DESY. Quite happy with them: better than the Thor machines, but we sometimes see very high latency with the DDN.


1.6 GiB/s (reads) per server; we have 4 servers. IO waits are due to checksumming (better with md than with LVM). Linux with XFS. Writes go to the Thors (30 of them), which can do 800 MB/s to the local disks and NAS.

What kinds of filesystems are in use?

PIC: 125 TB partitions with the ZFS filesystem. The thumpers sometimes hang under high load. Good for reads, but ...

BNL: using thumpers only for staging.

Support tickets for discussion

[Items are added here automagically]


Same time, next week.