dCache Tier I meeting March 2, 2010

Present, Owen, Paul), Triumf(Simon), PIC(Gerard), GridKa(Doris, Silke)


Site reports


Doris reported a problem with some CMS sites, which is not yet completely understood. What appears to be happening is that, due to a poor network connection, transfers to a particular site are taking a long time. A client is acquiring many TURLs for transfers to this site. These requests exhaust the limited supply that dCache's SRM will give out. Since the transfers hold their TURLs for a long time, the effect is that of a denial-of-service attack on dCache.
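The suspected mechanism can be illustrated with a toy model. This is only a sketch and not dCache code: the function `simulate`, the slot limit of 4 and the transfer durations are all hypothetical, chosen to show how a few slow transfers can hold every available slot and cause unrelated requests to be refused.

```python
# Toy model only: not dCache code. All names and numbers are hypothetical.

def simulate(max_slots, requests):
    """Return the clients whose requests were refused.

    requests is a list of (arrival_time, duration, client) tuples,
    sorted by arrival time. Each accepted request holds one slot
    from its arrival until arrival + duration.
    """
    active = []   # end times of transfers currently holding a slot
    refused = []
    for arrival, duration, client in requests:
        # Release the slots of transfers that finished before this arrival.
        active = [end for end in active if end > arrival]
        if len(active) < max_slots:
            active.append(arrival + duration)
        else:
            refused.append(client)
    return refused


# One client starts four very slow transfers to a poorly connected site.
slow = [(t, 1000, "slow-site") for t in range(4)]
# Other clients then ask for short transfers.
fast = [(10 + t, 5, "other") for t in range(3)]

refused = simulate(max_slots=4, requests=slow + fast)
print(refused)
```

With a healthy connection (for example a duration of 5 instead of 1000) the slots are released before the other requests arrive and nothing is refused.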

The issue with ATLAS files being written into dCache but not appearing in the corresponding SRM space-token is still being looked at. FZK are still investigating the problem and will update the (dCache) ticket when they have more information.


Gerard reported that they have no issues at PIC right now.

They have seen behaviour similar to the gsidcap problem that IN2P3 have reported: it looks like the GSI credential is not accepted, perhaps due to the CA Certificate no longer being accepted. Gerard asked to be added to the IN2P3 dCache ticket (No. 5313).

Gerard also reported that LHCb want to test using the dCache xrootd door. They have enabled this but were unable to see it in the SRM "info" command. They are still investigating.

The issue with the SRM space manager continues. Due to an unanticipated PostgreSQL upgrade, one of the databases at PIC became corrupt and they needed to restore from a backup. This worked, but resulted in some files being stored in dCache but not being accounted for in the appropriate SRM space-token. Paul reported that Dmitry is working on a procedure to recover from this problem.


Simon reported that they have recovered from their database corruption issue. The bad rows in the t_dirs table have been deleted, allowing Chimera to function correctly again. They also conducted a consistency check and found some 83 files that were under that directory.

There are some t_inodes entries that have no corresponding t_dirs entries. Only one directory seems affected by this issue. Triumf are conducting more tests and will send an email when they have more information.
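A check of this kind can be sketched as a left join that lists inodes lacking a directory entry. The example below is a minimal sketch, not the procedure Triumf used: it runs against an in-memory SQLite database with a much-simplified, assumed schema (the column names `ipnfsid`, `iparent` and `iname` and the sample IDs are assumptions), whereas a real Chimera database runs on PostgreSQL and has more columns.

```python
import sqlite3

# Simplified stand-in for the Chimera tables; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_inodes (ipnfsid TEXT PRIMARY KEY);
    CREATE TABLE t_dirs (iparent TEXT, iname TEXT, ipnfsid TEXT);
    INSERT INTO t_inodes VALUES ('ROOT'), ('A'), ('B'), ('C');
    -- 'C' deliberately has no directory entry.
    INSERT INTO t_dirs VALUES ('ROOT', 'a', 'A'), ('ROOT', 'b', 'B');
""")


def orphaned_inodes(conn, root_id):
    """List inode IDs that no t_dirs row points to, ignoring the root."""
    rows = conn.execute(
        """
        SELECT i.ipnfsid
        FROM t_inodes i
        LEFT JOIN t_dirs d ON d.ipnfsid = i.ipnfsid
        WHERE d.ipnfsid IS NULL AND i.ipnfsid <> ?
        ORDER BY i.ipnfsid
        """,
        (root_id,),
    ).fetchall()
    return [row[0] for row in rows]


orphans = orphaned_inodes(conn, "ROOT")
print(orphans)
```

The root is excluded explicitly because, in this toy schema, it has no parent directory entry of its own.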

Simon reported another issue: if, in a test, one executes an SRM-rm, the namespace entry is deleted but the file remains on the pool. This affects pools that are not connected to tape and have LFS-mode = precious. The problem is present with both old and new files.

In addition, the space-token records for these files remain. This prevents copying a fresh copy of the file into dCache.

While investigating during the meeting, we found that the t_inodes table had the expected trigger that should populate the t_location_trash table when files are deleted. The chimeraDomain log file should contain information from the cleaner explaining why it was unable to delete the files.

Support tickets for discussion

