wiki:developers-meeting-20090707
Last modified on 07/28/09 10:48:32

dCache Tier I meeting July 7, 2009

[part of a series of meetings]

Present

dCache.org (Tigran, Paul, Gerd), IN2P3 (), Sara (), Triumf (Simon), BNL (Pedro), NDGF (), PIC (), GridKa (Silke, Doris), Fermi (Jon), CERN (Andrea)

Apologies

Technical issues

Site reports

BNL

  • On 2 July, BNL had a problem similar to the one seen during STEP'09: transfers queued in the MCTAPE write pools and timed out, causing many problems. They have changed their setup to have 59 write pools (x2: MCTAPE and DATATAPE) on 59 different machines (200 GB and 350 GB respectively).
  • They have set up some pools with resilient dCache for conditions data (they will be tuning this in the coming weeks).
  • They will be putting 59 read pools (x2) for MCTAPE and DATATAPE on the same 59 machines, in order to concentrate all tape migration and pool-to-pool copies within those machines.
  • They are trying, as far as possible, to move disk-only areas off the machines that interact with tape.

CERN

Nothing special.

Andrea mentioned a wiki page that summarises STEP'09. In addition to the site-specific comments, there is an overall summary for dCache. Comments are welcome.

Fermi

Things are fine. Jon reported that he has submitted some bug reports, which are being worked on.

NDGF

Nothing to report.

Triumf

No issues to report this week.

FZK

Mostly fine. They have started seeing sporadic transfer failures. Pools have files in an error state, and these have to be cleaned up manually, which is annoying. These files are the result of failed SRM + GridFTP transfers.

Gerd reported that he's seen this behaviour when the user (e.g., FTS) deletes a file whilst a transfer is underway.

Currently these files must be cleaned up outside of dCache. FZK has automated this process with a script that removes these files.

Gerd noted that the problem should be solved by migrating to Chimera or using the (not yet released) new version of PNFS.

Doris also reported seeing lots of GridFTP transfers queued.

Pedro: we were seeing the same kind of problem, with the MCTAPE pools being hit hard. Changing the dCache configuration to allow writes into many pools, with the pools selected at random, spread the load and seemed to fix the problem. (BNL are confident that their pools have a sufficiently fast HSM interconnect that files will not remain on the pools for long.)

HSM integration

Pedro asked how to achieve a more lightweight integration between dCache and their HSM system.

BNL currently use Perl scripts that interact with a batch system for scheduling requests. These scripts require PNFS to be mounted in order to discover additional information.

Pedro asked whether they could write Java code that interacts with PnfsManager, removing the need for operations on a mounted PNFS. The aim is to avoid forking processes to discover additional information.

Gerd asked whether BNL had investigated the URI support, which may be sufficient to satisfy their requirements.

There was some discussion about the URI support and the arguments that dCache supplies to the HSM script. It is possible that some of the information the Perl scripts need is already supplied.

Pedro is going to investigate further and get in touch with the dCache team if he has further questions.

DTNM

Same time next week: Tuesday 2009-07-14 16:15 CEST.