Last modified on 02/16/16 15:34:04

dCache Tier I meeting February 16, 2016

[part of a series of meetings]

Present: IN2P3(), Sara(), Triumf(), BNL(), NDGF(Ulf), PIC(Marc), KIT(Samuel), Fermi(), CERN()



Site reports


Ulf reported that they have no problems with dCache.

They have had some hardware-related problems. One pool had continuous problems that they believe were due to a broken controller. They have drained the pool and it is now empty. They lost some 33 files as a result.

Part of the problem lies with XFS, too. With 40 TiB pools and the ALICE usage pattern (lots of small files), one cannot run a filesystem check, as the XFS check crashes. This is a known limitation in XFS.

NDGF are planning a full upgrade cycle: head nodes, Linux kernels and the PostgreSQL version. Ulf believes that all the pools will also be upgraded to the same version, at the same time. They are interested in whether there will be a new dCache version in time -- their deadline is 2016-02-17 12:00 CET.

Beginners tutorial

Ulf mentioned there would be a Finnish newbie joining the workshop in Barcelona. He believes there will be an NDGF newbie, too. Is dCache planning to run an introduction course there?

Paul will investigate.


Marc reported that PIC is running smoothly now.

Tape problems

They had two problems, which are both resolved now.

First, some new ATLAS pools, running CentOS 7, had slow network cards when placed in production. When Marc tested them before deployment, the network seemed fine. As a result, these pools became a bottleneck for getting data off tape, with some files taking more than three days to fetch.

Installing an updated kernel driver solved this problem: the default Intel ixgbe kmod driver that comes with the kernel (version 4.0.1) was upgraded to version 4.0.3. Although nothing in the release notes describes a performance-related fix, the upgrade solved their problems.
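A quick way to catch this class of problem before production is to compare the loaded driver version against the known-good release. A minimal sketch, assuming GNU `sort -V` is available; the driver name and version numbers are taken from the report above, and on a real node the loaded version would come from `modinfo -F version ixgbe`:

```shell
# ver_lt A B: succeeds when version A sorts strictly before version B.
# Uses GNU sort -V for natural version ordering.
ver_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Versions from the report: the kernel shipped ixgbe 4.0.1; 4.0.3 fixed
# the performance problem. On a real node, obtain the loaded version with:
#   loaded="$(modinfo -F version ixgbe)"
loaded="4.0.1"
fixed="4.0.3"

if ver_lt "$loaded" "$fixed"; then
    echo "ixgbe $loaded predates $fixed -- consider upgrading"
fi
```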

The other problem was that enstore (real_encp) does not handle dCache timing out a request to stage a file. PIC solved this problem by increasing the timeout to 10 days.
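If the timeout PIC raised is the pool's restore-handler timeout, the change would look roughly like this in the dCache admin shell. This is a sketch: the pool name is hypothetical, and the connect syntax and command names vary between dCache versions, so check the documentation for your release.

```
(local) admin > \c pool-name                # connect to the pool cell (older shells use "cd pool-name")
(pool-name) admin > rh set timeout 864000   # 10 days, expressed in seconds
(pool-name) admin > save                    # persist the pool setup
```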

The situation that led to these problems was unusual: ATLAS wanted to reprocess all data from 2012 to 2015. Now that all the data has been fetched, they have been unable to verify that their fix for real_encp really works.

WebDAV and pinning

Marc asked whether it is possible to pin and unpin files using WebDAV in a non-blocking fashion.

The problem is that members of a VO are opening files sequentially through WebDAV, which is causing enstore to load and unload tapes without any optimisation.

Paul: currently, no. Is SRM an option?
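With SRM, a non-blocking stage request is the srmBringOnline operation; using the dCache SRM client it would look roughly like the following. This is a sketch: the endpoint and path are hypothetical, and the lifetime option name is an assumption that should be checked against the client's help output.

```
$ srm-bring-online -request_lifetime=86400 \
      srm://srm.example.org:8443/pnfs/example.org/data/vo/file1
```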

It's unclear: some of the user community members have X.509 certificates, others do not. Access by users is mainly via username+password.

PIC has enabled dCache for LDAP users (username + password), so any pinning solution would need to support username+password-based authentication.

A member of this VO will be attending the dCache workshop to talk with people. The VO is "PAUS" (like MICE).


Marc noted that the documentation is somewhat out-of-date. For example, it mentions the database as "srmdcache", which has changed.

Marc offered to help; Paul pointed him at our github repo.

Splitting the SRM

Marc plans to split the SRM into VO-specific SRM endpoints. He wasn't completely clear on the procedure for handling the database.

Paul recommended he open a support ticket.

Replicas with 000 perms

Marc noticed that some pools contained a large number of files, owned by root, with 000 permissions. It is unclear whether these are created by dCache or by enstore.

The files are owned by root because PIC runs the pools as root.

Paul recommended opening a support ticket.


No issues related to dCache

Feature request

Samuel reminded us of RT ticket 8529:

This is a feature request to enhance the rep ls command so it can filter based on storage info.

This is not urgent but it would be useful.
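For context, `rep ls` on a pool currently lists replicas without any storage-class filtering; the request is for something like the following, where the `-storage=` option is hypothetical and merely illustrates the requested filter (the storage class shown is an assumption).

```
(pool-name) admin > rep ls                           # today: lists every replica on the pool
(pool-name) admin > rep ls -storage=atlas:tape@osm   # requested: only replicas matching a storage class
```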

Support tickets for discussion

[Items are added here automagically]


2016-02-23 14:30 CET