Last modified on 08/01/17 15:46:51

dCache Tier I meeting August 1st, 2017

[part of a series of meetings]

Present: IN2P3(), Sara(), Triumf(), BNL(), NDGF(Ulf), PIC(Marc), KIT(Xavier), Fermi(), CERN()



Site reports


Things are running just fine for dCache.

space-manager namespace consistencies

Notification: several months ago Gerd sent a guide on how to check the space-manager table, looking for inconsistencies.

Samuel ran this check on a test instance, looking for inconsistencies -- and found some!

Easy things can be fixed

Files have the wrong size in space-manager table.

There is an incoming ticket on how to update the database to match.

File sizes of 1 byte.

Normal file size in space-manager, but zero-length in Chimera. This could be a temporary issue -- still checking.

50,000 files in space-manager that are not in the namespace.

4,000 files were flushed to tape, but space-manager did not know about them.

(compared to more than 3,000,000 files; these numbers are fairly negligible)
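The classes of inconsistency listed above can be sketched as a toy check; the dictionaries below are hypothetical stand-ins for the space-manager table and the Chimera namespace (the real check runs SQL against the dCache databases):

```python
# Toy consistency check between space-manager records and the namespace.
# Both inputs are hypothetical {pnfsid: size} maps for illustration.

def find_inconsistencies(space_manager, namespace):
    """Compare {pnfsid: size} maps and classify mismatches."""
    orphaned = []       # in space-manager, but not in the namespace
    size_mismatch = []  # sizes disagree
    zero_length = []    # normal size in space-manager, zero-length in Chimera
    for pnfsid, sm_size in space_manager.items():
        if pnfsid not in namespace:
            orphaned.append(pnfsid)
        elif namespace[pnfsid] == 0 and sm_size > 0:
            zero_length.append(pnfsid)
        elif namespace[pnfsid] != sm_size:
            size_mismatch.append(pnfsid)
    return orphaned, size_mismatch, zero_length

space_manager = {"0000A": 1024, "0000B": 2048, "0000C": 512, "0000D": 4096}
namespace     = {"0000A": 1024, "0000B": 1,    "0000C": 0}

orphaned, size_mismatch, zero_length = find_inconsistencies(space_manager, namespace)
print(orphaned)       # like the 50,000 files not in the namespace
print(size_mismatch)  # wrong size in the space-manager table
print(zero_length)    # zero-length in Chimera
```
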


Installing updates. Nothing bigger happening

Email from last week about GSI-dcap.

Jolic Pinjac is asking about dCache v3.0.10 onwards, as this version is affected by a security vulnerability.


Some issues this past week -- two main problems.

Overloaded CMS pools

First, overload of the pools -- caused by CMS running many activities.

Pools get overloaded -- too much IO.

Some tests are failing.

Reduced number of movers per pool.

Tests fail due to being queued.

Some pools have ~200 TiB.

Once every three months, try to rebalance the pools.

Sometimes pools become "hot". Have limited the number of xrootd movers to 1,500.

Have p2p replication enabled -- "soft mode".

Purchase new hardware in 2 months, bringing PIC up to pledges.

CMS is reading files directly, whereas ATLAS is copying the complete file to the worker-node.

KIT is running GPFS (840 TiB pool, 7 pools on 7 servers; peak 220 GiB/s); PIC is running NAS servers

Problem is seen with old hardware, which is currently being decommissioned.

Suggest protocol-specific links to avoid random I/O hitting the old-hardware pools.

Chimera problems

Problems with Chimera connections.

Several NFS connections through different NFS doors; reached the maximum number of connections that PostgreSQL allows -- this was set to 10,000.

90-nproc.conf has a soft limit of 1,000; increased this to 10,000.

Have 7 NFS doors -- 6 doors for NFS 4.1; 1 NFS door for Enstore on pools.

Default is 30 per door; have increased it to 300.

Not clear why this was increased.

There was a VO that was running ~1,000 requests concurrently through NFS, and there were problems.
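The connection budget implied by these numbers is a quick sanity check; the door count (7) and the per-door settings (30 default, 300 after the change) come from the discussion above, while mapping each per-door setting 1:1 to PostgreSQL connections is an assumption:

```python
# Back-of-the-envelope PostgreSQL connection budget for the NFS doors.
# Assumes each door's connection pool maps 1:1 to PostgreSQL backends.

NFS_DOORS = 7
DEFAULT_PER_DOOR = 30    # default connections per door
RAISED_PER_DOOR = 300    # after the increase mentioned above
PG_MAX_CONNECTIONS = 10_000  # PostgreSQL limit mentioned above

default_budget = NFS_DOORS * DEFAULT_PER_DOOR
raised_budget = NFS_DOORS * RAISED_PER_DOOR

print(default_budget)  # connections with defaults
print(raised_budget)   # connections after raising to 300
# Even the raised budget is well under the 10,000 PostgreSQL limit, so
# the NFS doors alone do not account for exhausting it -- other dCache
# components share the same database server.
```
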

Support tickets for discussion

[Items are added here automagically]


Same time, next week.