dCache Tier I meeting January 12, 2010

Present, Paul), IN2P3(), Sara(Onno), Triumf(Simon), BNL(), NDGF(), PIC(), GridKa(Doris, Silke), Fermi(), CERN()


Site reports

Support tickets for discussion

Yesterday's upgrade was to 1.9.6-2, which includes a number of changes back-ported from trunk.

As of today, NDGF is running the SRM without Tomcat; instead, it is running within Jetty. So far, there have been no issues.

  1. How frequently did NDGF see the problem of CAs no longer being supported in the SRM? It happened approximately once a week.


Onno reported that Sara are migrating to Chimera now. Presently, the md5sum check is running. So far it's going well.

The plan is to complete the migration by Thursday. We hope to finish in time.

When we tested the StorageInfo check, it took a few hours.

At Sara, the doors and pools are already 1.9.5-11. We are going to upgrade the head-nodes to the same version; they are currently running 1.9.5-10.

Onno was unsure whether Sara were still on the LHCb black-list.


Simon reported that, last week, they saw a large number of SRM errors. These errors did not actually originate on our side: the DDM people were staging many files from tape and checking progress by issuing an SRM-ls on each file every second. They have now changed this to check every 600 s whether the file is on disk. The underlying problem was that Tomcat had reached its maximum number of open files. Since this limit was increased, things have looked good.
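To illustrate the client-side change described above, here is a minimal sketch of polling at a fixed, long interval rather than once per second. It is not the DDM code: the function name `wait_until_on_disk`, the `is_on_disk` callback (standing in for whatever SRM-ls check the client performs), and the 24-hour default timeout are all assumptions made for illustration; only the 600 s interval comes from the report.

```python
import time
from typing import Callable


def wait_until_on_disk(
    is_on_disk: Callable[[], bool],      # hypothetical check, e.g. one SRM-ls on one file
    interval_s: float = 600.0,           # polling interval reported in the minutes
    timeout_s: float = 24 * 3600.0,      # assumed overall limit, not from the minutes
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the file is reported on disk or the timeout expires.

    Returns True if the check succeeded, False if we gave up.
    """
    deadline = clock() + timeout_s
    while True:
        if is_on_disk():
            return True
        # Stop if the next poll would fall after the deadline.
        if clock() + interval_s > deadline:
            return False
        sleep(interval_s)
```

With a 600 s interval, each file generates 600 times fewer status queries than with a 1 s interval. The `sleep` and `clock` parameters are injectable only so that the loop can be exercised without actually waiting.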

Simon also reported that Triumf anticipate upgrading to 1.9.5-11 tomorrow.

The rate of the SRM-ls queries was 500,000 srmls in half an hour (~278 Hz), which blocked the SRM for other users.
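As a quick check of the quoted figure, the average rate is simply the number of requests divided by the length of the window in seconds. The helper name `request_rate_hz` below is introduced only for this illustration.

```python
def request_rate_hz(requests: int, window_seconds: float) -> float:
    """Average request rate in requests per second (Hz)."""
    return requests / window_seconds


# 500,000 SRM-ls requests in half an hour (1800 s)
rate = request_rate_hz(500_000, 30 * 60)
print(f"{rate:.0f} Hz")  # prints "278 Hz"
```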

Simon also reported that Triumf is planning to migrate to Chimera at the end of January.


Doris reported that currently everything seems to be OK. Over Christmas, FZK had some problems with SRM but, at the moment, the details are vague.

Doris also reported that they are currently preparing for the Chimera migration. They have found that they have some 40,000 files with some kind of problem (storageinfo-related). As the bulk of these files are test-related, this doesn't necessarily indicate an overall problem.


Same time, next week.