Last modified on 06/14/16 16:13:19

dCache Tier I meeting June 14, 2016

[part of a series of meetings]

Present

dCache.org(Paul), NDGF(Ulf), KIT(Xavier)

Agenda

(see box on the other side)

Site reports

KIT

Xavier reported that dCache production was fine for the past week.

He also had a number of questions.

Tag deletion

First, he followed up on a long-standing request: a script to clean up tags.

After a few iterations, Tigran provided a working script. Unfortunately, it is much too slow.

Xavier has some 21,000,000 tags to delete. The current mechanism checks, after each individual tag deletion, whether the corresponding tag inode may also be deleted, and at this scale the procedure takes too long to complete.
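The cost difference between checking after every deletion and doing the cleanup in one pass can be illustrated with a small sqlite3 model. This is a minimal sketch assuming a simplified, hypothetical two-table schema (`tags` pointing at possibly shared `tag_inodes`). It is not Tigran's script and not the real Chimera schema. The per-row version issues an extra lookup for every deleted tag. The bulk version deletes all tag references first and then removes orphaned tag inodes with a single statement.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified, hypothetical tag schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE tag_inodes (tagid TEXT PRIMARY KEY, value TEXT);
        CREATE TABLE tags (
            inumber INTEGER,   -- directory the tag is attached to
            tagid   TEXT,      -- tag inode holding the value (may be shared)
            tagname TEXT
        );
        CREATE INDEX tags_tagid ON tags (tagid);
    """)
    return db


def populate_example(db):
    """Directories 1 and 2 share tag inode 'A'; directory 3 has its own inode 'C'."""
    with db:
        db.executemany("INSERT INTO tag_inodes VALUES (?, ?)",
                       [("A", "StoreName atlas"), ("B", "default"), ("C", "StoreName cms")])
        db.executemany("INSERT INTO tags VALUES (?, ?, ?)",
                       [(1, "A", "OSMTemplate"), (2, "A", "OSMTemplate"),
                        (2, "B", "sGroup"), (3, "C", "OSMTemplate")])


def delete_tags_per_row(db, inumbers):
    """Slow pattern: after each tag deletion, check whether its tag inode is orphaned."""
    removed = 0
    with db:
        for inumber in inumbers:
            rows = db.execute("SELECT tagid, tagname FROM tags WHERE inumber = ?",
                              (inumber,)).fetchall()
            for tagid, tagname in rows:
                db.execute("DELETE FROM tags WHERE inumber = ? AND tagname = ?",
                           (inumber, tagname))
                still_used = db.execute("SELECT 1 FROM tags WHERE tagid = ? LIMIT 1",
                                        (tagid,)).fetchone()
                if still_used is None:
                    db.execute("DELETE FROM tag_inodes WHERE tagid = ?", (tagid,))
                    removed += 1
    return removed


def delete_tags_bulk(db, inumbers):
    """Batched pattern: drop all tag references, then sweep orphaned tag inodes once.

    Note: the sweep removes every unreferenced tag inode, not only those
    that belonged to the given directories.
    """
    with db:
        db.executemany("DELETE FROM tags WHERE inumber = ?", [(i,) for i in inumbers])
        cur = db.execute("""
            DELETE FROM tag_inodes
            WHERE NOT EXISTS (SELECT 1 FROM tags WHERE tags.tagid = tag_inodes.tagid)
        """)
    return cur.rowcount
```

Both functions remove tag inodes 'A' and 'B' when directories 1 and 2 are cleared, leaving 'C'. Whether one set-based sweep is acceptable on a live database with 21 million rows (locking, transaction size) is an open question. In practice the work would likely need to be split into chunks.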

Commands in the pool

RT ticket 8987

Xavier also asked about the 'rh' pool command.

Paul explained that the "st set max active", "rm set max active" and "rh set max active" commands are deprecated. These used to control the concurrency of the HSM scripts, i.e. the number of concurrently running processes. The preferred way is to configure this through the arguments to the "hsm create" command, as the provider is responsible for concurrency.

However, since a pool setup file may still contain "<st|rm|rh> set max active" commands, support cannot simply be dropped. Instead, these commands are converted to the equivalent flags for the "hsm create" command, so that saving the pool setup stores the updated command.

The "flush set max active" command is still supported.
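The conversion Paul described can be pictured as a small text transformation over a pool setup file. This is a minimal sketch in Python, not dCache's actual conversion code. It assumes that `st`, `rh` and `rm` correspond to the script provider's `-c:puts`, `-c:gets` and `-c:removes` options. Those option names are an assumption and should be checked against the dCache Book for the release in use. The script path in the example is hypothetical.

```python
import re

# ASSUMPTION: mapping from legacy queue prefix to script-provider concurrency option.
LEGACY_TO_OPTION = {"st": "c:puts", "rh": "c:gets", "rm": "c:removes"}

# Matches the deprecated "<st|rh|rm> set max active N" lines only;
# "flush set max active N" is deliberately left alone.
LEGACY_RE = re.compile(r"^\s*(st|rh|rm)\s+set\s+max\s+active\s+(\d+)\s*$")


def convert_setup(lines):
    """Fold legacy 'set max active' lines into the 'hsm create' line(s) of a setup."""
    limits = {}
    kept = []
    for line in lines:
        m = LEGACY_RE.match(line)
        if m:
            limits[LEGACY_TO_OPTION[m.group(1)]] = int(m.group(2))
        else:
            kept.append(line)

    out = []
    for line in kept:
        if line.startswith("hsm create ") and limits:
            # Append the equivalent flags; a real implementation would also
            # replace any flags already present rather than duplicating them.
            extra = " ".join(f"-{opt}={n}" for opt, n in sorted(limits.items()))
            line = f"{line} {extra}"
        out.append(line)
    return out
```

For example, a setup containing `st set max active 10` and `rh set max active 5` ends up with `-c:gets=5 -c:puts=10` appended to its `hsm create` line, while `flush set max active 100` is kept unchanged.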

PowerPC

Xavier has finished testing dCache on PowerPC. He asked whether there was any further testing he should do before returning the hardware.

Paul suggested he hold off for another week, if possible. We may be able to update dCache so that it works out of the box on the PowerPC architecture in the next release cycle. If so, Xavier could test the RPMs once the next dCache release comes out on Tuesday.

COPY fix

Xavier thanked the dCache team for fixing the COPY problem. He should be able to test the updated dCache this week.

SRM database

Xavier asked for advice on SRM databases. He was considering placing both the Chimera and SRM databases on a single host, with a hot-spare machine to use if there was a problem. He wondered whether dCache would switch to another database host without requiring a restart if the primary database host went down and the configured DNS name were updated to point to the backup node.

Paul believed it would, but suggested trying this out on a test instance. He also mentioned the option of having two database nodes, each acting as the primary for one database and as the backup for the other.
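The cross-backup layout and the DNS-based switch Xavier asked about can be modelled in a few lines. This is a minimal sketch in Python with hypothetical host names and DNS aliases. The in-memory `DnsZone` stands in for real DNS, and no database connections are made. The property that matters is that the client resolves the alias on every new connection instead of caching the address at startup. That is what the suggested test on a test instance would need to confirm. For a Java service such as dCache, the JVM's own DNS cache (controlled by the `networkaddress.cache.ttl` security property) is one place where a stale address could be held.

```python
from dataclasses import dataclass, field


@dataclass
class DnsZone:
    """Toy stand-in for a DNS zone: alias name -> current target host."""
    records: dict = field(default_factory=dict)

    def resolve(self, name):
        return self.records[name]


# HYPOTHETICAL layout: each node is primary for one database and backup for the other.
LAYOUT = {
    "chimera-db": {"primary": "db1.example.org", "backup": "db2.example.org"},
    "srm-db": {"primary": "db2.example.org", "backup": "db1.example.org"},
}


def initial_zone(layout):
    """Point every alias at its primary node."""
    return DnsZone({alias: roles["primary"] for alias, roles in layout.items()})


def fail_over(zone, layout, failed_host):
    """Repoint every alias currently served by failed_host to its backup node."""
    moved = []
    for alias, roles in layout.items():
        if zone.records[alias] == failed_host:
            zone.records[alias] = roles["backup"]
            moved.append(alias)
    return moved


class Client:
    """Resolves the alias on every connection attempt instead of caching the host."""

    def __init__(self, zone, alias):
        self.zone = zone
        self.alias = alias

    def connect(self):
        # A real client would open a database connection to this host.
        return self.zone.resolve(self.alias)
```

After `fail_over(zone, LAYOUT, "db1.example.org")`, a `Client` for `chimera-db` reaches `db2.example.org` on its next `connect()`, while `srm-db` was already served by `db2.example.org`.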

AP/ Paul to find out more about NDGF's planned redundant PostgreSQL configuration for Xavier.

Redundancy with SRM

Xavier asked how to deploy a redundant SRM instance.

Paul described the DNS round-robin approach and the hot-spare IP address approach (a spare machine takes over an IP address if it detects a problem). Xavier described a hybrid solution: machines share a DNS round-robin alias and, as a short-term measure, one machine takes over another's IP address if it detects a problem. The problematic machine's IP address is then also removed from the DNS round-robin alias.
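Xavier's hybrid scheme combines two steps that act on different time scales. This toy model in Python tracks which machine answers each published address. It uses hypothetical machine names and documentation-range IP addresses, and it does not implement the actual IP takeover or DNS update mechanism.

```python
import itertools


class RoundRobinAlias:
    """Toy model of a DNS round-robin alias whose addresses can be taken over."""

    def __init__(self, hosts):
        # hosts: IP address -> machine that normally answers on it
        self.published = list(hosts)      # addresses returned for the alias
        self.answered_by = dict(hosts)    # which machine currently answers each IP

    def take_over(self, failed_ip, healthy_machine):
        """Short-term step: a healthy machine starts answering the failed machine's IP."""
        self.answered_by[failed_ip] = healthy_machine

    def remove_from_alias(self, failed_ip):
        """Follow-up step: stop publishing the failed IP in the round-robin alias."""
        self.published.remove(failed_ip)

    def lookups(self, n):
        """Simulate n client lookups rotating through the published addresses."""
        rotation = itertools.cycle(self.published)
        return [next(rotation) for _ in range(n)]

    def machine_for(self, ip):
        """Return the machine that currently answers on the given address."""
        return self.answered_by[ip]
```

With `RoundRobinAlias({"192.0.2.1": "srm1", "192.0.2.2": "srm2"})`, calling `take_over("192.0.2.2", "srm1")` makes every lookup land on `srm1` immediately. After `remove_from_alias("192.0.2.2")`, new lookups only return `192.0.2.1`. The takeover stays in place, which covers clients that still hold the old DNS answer in their cache.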

NDGF

Ulf described how NDGF had upgraded to PostgreSQL 9.5. They had scheduled a one-hour downtime; the upgrade was very simple, with some restarts needed only to tweak a few properties.

Gerd also wanted to recalculate the indexes before starting dCache.

dCache is now running fine.

Paul asked about NDGF's use of NFS. They have a single machine that acts as both the NFS door and an NFS client; this machine also runs the SLAC xrootd implementation, providing ATLAS FAX access, with the data read through NFS.

This machine needs a reboot every so often; the problem seems to be triggered by poor network connections over the WAN.

There are rumours that ATLAS no longer wants the xrootd-generated statistics. If so, NDGF may be able to drop the SLAC xrootd implementation and use dCache's native implementation instead. The problem for NDGF is that providing these statistics required installing a plugin on the pools.

Support tickets for discussion

[Items are added here automagically]

DTNM

Same time, next week.