Last modified on 07/28/09 10:46:39

dCache Tier I meeting July 14, 2009

[part of a series of meetings]

Present

dCache.org(Patrick, Paul, Gerd), IN2P3(), Sara(), Triumf(Simon), BNL(Pedro, Salgado), NDGF(Gerd), PIC(Gerard), GridKa(Silke, Doris), Fermi(Jon), CERN(Andrea)

Apologies

Technical issues

Site reports

NDGF

Various problems, almost all of which are not directly dCache issues.

Pool-related problems:

At ?? we lost 25 TiB of storage due to a RAID controller failing; because NDGF have redundancy, no data was lost.

Operator intervention was needed to clean up half the pools in Slovenia. Again, no data was lost as a result of this issue.

Since NDGF deployed the new xrootd door, Gerd has noticed that the "xrootd pools" are consuming too much CPU. It is currently unclear why; the current belief is that the pools may be running out of memory, triggering excessive garbage collection (GC).

Another operator intervention was needed when a filesystem ran out of space.

It was also noticed that xrootd pools mark themselves offline when a file open fails. The problem is localised to an individual site, so it is unclear whether this is a problem with dCache or with the SAN. When the file is checked afterwards, it is always present.

Owen asked which version of dCache NDGF are running.

The head-nodes are running v1.9.3, as are the pools in Umeå.

The pools at the other sites are running some release of the v1.9.2 branch.

Triumf

OK, no issues to report.

FZK

Everything runs fine currently.

CERN

Nothing to report.

PIC

Everything is OK.

Gerard had a question about releasing a new version of PNFS with trash-table support. Is there any news about this?

Patrick replied that there's been no change: we would like to do this, but making the dCache v1.9.4 release is keeping everyone very busy.

BNL

  • dcap client build for Solaris (Ofer is already working with Owen)

Fermi

Nothing to report; things are working well.

Configuration engines

Gerard asked whether anyone had experience with puppet. PIC are looking at puppet, but are currently only at the deployment stage.

BNL

BNL have started using cfengine for hardware configuration: they are moving away from farm-nodes and moving everything to cfengine. The main motivation for choosing cfengine over the alternatives was that it is what other groups within the site are using for configuration. Subversion is used to provide version control of the configuration.

Planning on providing some mechanism to allow rapid changes to dCache configuration via a GUI, but this won't be ready before the end of the year.

FZK

FZK are using cfengine for machines, but currently not for dCacheSetup. They are in the process of setting up cfengine to distribute dCacheSetup files, so they can ensure the files have the same contents on the different machines. This is not yet in production.

Fermi

Fermilab uses ROCKS.

DTNM

Due to a clash at DESY, the next meeting will not take place on the usual day. Instead, the next meeting will be Thursday 23rd of July.