Last modified on 07/28/09 10:47:46

dCache Tier I meeting July 14, 2009

[part of a series of meetings]

Present

dCache.org(), IN2P3(), SARA(), Triumf(), BNL(Pedro Salgado), NDGF(), PIC(Gerard), GridKa(Doris, Silke), Fermi(Jon), CERN(Andrea)

Apologies

SARA (Onno)

Technical issues

Site reports

BNL

  • trash directory and slow removal of files
    • ATLAS Computing Operations eLog 4646
    • ATLAS Computing Operations eLog 4656
    • WLCG report:
      Wednesday, July 14-16 2009
      Problem: some pnfs timeouts
      Cause: limit on the number of inodes was reached on the partition holding the pnfs trash information
      Solution: added more space (and therefore more inodes) from a free volume to the pnfs trash partition to gain some time.
      Created a new 100G partition with 1K blocks so that it has plenty of inodes and we don't reach this limit again.
      Switched the pnfs configuration to a new, empty trash directory on the new partition, then shut down dCache to apply the pnfs configuration change and to restart all pools without any interference.
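
The inode exhaustion described in the report could be caught earlier by monitoring inode usage on the trash partition. The following is only a minimal sketch, not something BNL reported using: the function names and the 90% threshold are illustrative assumptions, and some filesystems report zero total inodes, in which case no fraction is returned.

```python
import os


def fraction_used(total_inodes, free_inodes):
    """Return the fraction of inodes in use, or None if the total is unknown."""
    if total_inodes <= 0:
        # Some filesystems do not report an inode total.
        return None
    return (total_inodes - free_inodes) / total_inodes


def inode_fraction_used(path):
    """Return the inode usage fraction of the filesystem that holds `path`."""
    st = os.statvfs(path)  # POSIX only; f_files = total inodes, f_ffree = free inodes
    return fraction_used(st.f_files, st.f_ffree)


def inodes_nearly_exhausted(path, threshold=0.9):
    """True if inode usage is at or above `threshold` (an assumed default)."""
    used = inode_fraction_used(path)
    return used is not None and used >= threshold
```

A site could call `inodes_nearly_exhausted()` on its own trash partition path from a periodic job and raise an alert when it returns true.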
      

Cleaning up trash files

Pedro asked about the PNFS trash directories. He sees two subdirectories within the "trash" directory: "1" and "2". The subdirectory "2" is mostly empty but occasionally has one or two files. The directory "1" contains an ever-increasing number of files. He asked how to get rid of these files.

These directories receive an entry whenever a file is deleted from the namespace. If the deleted file has a non-empty level-2 then the "2" directory is populated; similarly for level-1 and the "1" directory. All files have a non-empty level-2, but a dCache cleaner notices files appearing in the "2" directory, removes the cache copies from the pools and then deletes the corresponding file from the "2" directory. This explains why the "2" directory is mostly empty but occasionally has a few files.

The entries in the "1" directory are those files that have been deleted from the namespace and that have a non-empty level-1. These are files that have been stored on tape.

Traditionally, sites have had to clean up these files themselves: they periodically scan the "1" directory for files that are to be removed from their tape system, delete the files from the tape system, then delete the files from the "1" directory.
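
The manual procedure above can be sketched as a small loop. This is an illustration under stated assumptions, not a dCache tool: `clean_trash` and the `remove_from_tape` callback are hypothetical names, and the callback stands in for whatever site-specific command deletes a file from the tape system. A trash entry is deleted only after the tape removal reports success, so failed entries are retried on the next scan.

```python
import os


def clean_trash(trash_dir, remove_from_tape):
    """Process every regular file in `trash_dir` (e.g. the "1" directory).

    `remove_from_tape(name, path)` is a hypothetical, site-specific callable
    that returns True once the file has been removed from the tape system.
    Returns (removed, failed) lists of entry names.
    """
    removed, failed = [], []
    for name in sorted(os.listdir(trash_dir)):
        path = os.path.join(trash_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-file entries
        try:
            ok = bool(remove_from_tape(name, path))
        except Exception:
            ok = False  # treat any error as "not removed"; keep the entry
        if ok:
            os.remove(path)  # tape copy is gone, so drop the trash entry
            removed.append(name)
        else:
            failed.append(name)
    return removed, failed
```

Run periodically (for example from cron), this keeps the "1" directory from growing without bound while never discarding an entry whose tape copy still exists.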

There is now an HSM cleaner service that will monitor the "1" directory and call out to the normal HSM interface script with the "remove" command. However, this only works for OSM.

PIC

Gerard reported that PIC had been running well for a while, but SRM has now suddenly become unresponsive. This also affected the SRM admin interface, which was unresponsive too. They found that restarting dCache was enough to get SRM working again.

A possibly related observation is that the Tomcat log file was 3.3 GiB.

Gerard will send an email to support@… with the last bit of log file (~last 20 minutes or so).
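
With a log file of several GiB, extracting only the end of the file avoids reading the whole thing. The sketch below is one possible way to do that, not the method PIC used; `tail_bytes` and its default size are illustrative assumptions, and it selects by byte count rather than by timestamp.

```python
import os


def tail_bytes(path, max_bytes=1024 * 1024):
    """Return roughly the last `max_bytes` bytes of `path` as whole lines."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - max_bytes))  # jump close to the end of the file
        data = f.read()
    if size > max_bytes:
        # The read probably started mid-line, so drop everything up to and
        # including the first newline.
        _, _, data = data.partition(b"\n")
    return data
```

The returned bytes can be written to a small file and attached to the support email.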

FZK

Everything is OK with production systems.

Fermi

Loading a new SRM jar. Working as expected.

Today Jon will be tuning dCache parameters to optimise performance.

CERN

Is there any progress with making the dCache setup for Tier-1s available on a webpage?

dCache to look at making this page available.

SARA

(via email)

We have no issues.

DTNM

Proposed: 16:15 EST, Tuesday, 28th July 2009