Last modified on 03/15/16 15:51:06

dCache Tier I meeting March 15, 2016

[part of a series of meetings]

Present

dCache.org(Paul), PIC(Marc), KIT(Xavier)

Agenda


Site reports

PIC

Marc reported that things are OK at PIC.

Two weeks ago, PIC took a downtime to split their SRM endpoint. Their production instance now has five independent SRM endpoints.

This was mostly smooth, but they did have issues with their info-provider. They are currently running five info-providers. Marc and Paul will investigate how to reduce this to a single info-provider.

PIC has a scheduled downtime next week. The plan is to move space-manager and ?? to a separate server.

AP/ Paul and Marc to iterate on deploying an improved info-provider, removing any limitation that prevents a single info-provider from publishing dCache correctly.

VO specific doors

Marc asked about making doors VO-specific.

Paul described how to adjust the srm.protocol.tags and *.loginbroker.tags properties to achieve this for the doors that SRM selects.

However, he noted that this would not adjust the info-provider. GLUE allows publishing which VOs may use a door, but the info-provider currently assumes that all VOs may use any door.
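As a rough illustration of the tag mechanism Paul described, here is a conceptual model of door selection. It is not dCache code: the Door class, the eligible_doors function, the door names and the VO tags are all hypothetical, and the sketch assumes SRM accepts a door when the door's login-broker tags share at least one entry with the tags configured on that SRM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Door:
    """A door as advertised to SRM via the login broker (simplified, hypothetical model)."""
    name: str
    protocol: str
    tags: frozenset


def eligible_doors(doors, srm_tags, protocol):
    """Doors SRM may select for a transfer using `protocol`.

    Assumption for this sketch: a door is eligible when it speaks the
    requested protocol and shares at least one tag with the SRM's tags.
    """
    return [d for d in doors if d.protocol == protocol and d.tags & srm_tags]


# Hypothetical layout: one door per VO plus a door shared by both VOs.
doors = [
    Door("gridftp-cms", "gsiftp", frozenset({"cms"})),
    Door("gridftp-atlas", "gsiftp", frozenset({"atlas"})),
    Door("gridftp-shared", "gsiftp", frozenset({"cms", "atlas"})),
]

# An SRM endpoint tagged for CMS only hands out CMS-tagged doors.
cms_srm_tags = frozenset({"cms"})
print([d.name for d in eligible_doors(doors, cms_srm_tags, "gsiftp")])
```

Nothing in this model changes what the info-provider publishes, which is the gap noted above: the doors are restricted in practice, but the published GLUE information would still list every door as usable by every VO.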

dot commands

Marc asked whether there is any documentation describing dCache dot commands. The use case is PIC users who do not use X.509 or SRM but wish to interact with dCache; they currently pin files using a dot command, and it would be useful to know what else can be scripted. Paul said there is no documentation at the moment, but that it is something we should address.

AP/ Paul to raise the need for dot-command documentation.

KIT

Xavier reported that there are no issues with dCache.

CMS deleting files

CMS reports that files are being lost. So far, every detailed investigation has shown that CMS itself deleted the missing files shortly after uploading the data.

Xavier described this as an "old" issue, one that has become critical over the last month. The problem now occurs every day.

The deletions do NOT come from FTS but from the CMS framework. They happen when the job uploads the results of the transfer.

Paul described a similar problem that LHCb had at PIC, where files were deleted. There, the cause was LHCb upgrading their version of the srm_ifce library, which changed the built-in default request lifetime to 5 minutes. Details are in GGUS ticket 112413:

https://ggus.eu/index.php?mode=ticket_info&ticket_id=112413
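To make the LHCb failure mode concrete, a small worked calculation helps. It assumes that an upload still in progress when its SRM request lifetime elapses is treated as failed (with the partially written file then removed); under that assumption any transfer taking longer than the new 5-minute default is at risk. The file size and transfer rate are made-up illustrative values, and the helper functions are hypothetical.

```python
# Request lifetime after the srm_ifce upgrade, as stated in the minutes: 5 minutes.
REQUEST_LIFETIME_S = 5 * 60


def upload_seconds(size_bytes, rate_bytes_per_s):
    """Time needed to write a file at a sustained transfer rate."""
    return size_bytes / rate_bytes_per_s


def outlives_request(size_bytes, rate_bytes_per_s, lifetime_s=REQUEST_LIFETIME_S):
    """True if the upload would still be running when the request expires (assumed failure)."""
    return upload_seconds(size_bytes, rate_bytes_per_s) > lifetime_s


# Illustrative numbers only: a 2 GB output file written at 5 MB/s.
size = 2 * 10**9
rate = 5 * 10**6
print(upload_seconds(size, rate), outlives_request(size, rate))  # 400.0 True
```

At 400 seconds the upload exceeds the 300-second lifetime, whereas the same file at 10 MB/s (200 seconds) would finish in time; this is why such a problem can look intermittent, depending on file size and load.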

Xavier believes he has provided CMS with evidence to show that the deletions are coming from the CMS client; the ball is in their court to identify what is going wrong.

Removing tags

Xavier reminded us that Tigran promised to look at his SQL function for tidying up tags: RT ticket #8890.

The underlying problem is the Chimera CLI's buggy implementation of the rmtag command, which can leave remnants in the table. rmtag also does not support recursive tag deletion.

AP/ Paul to investigate what happened with the tag inheritance semantic change.

Support tickets for discussion

[Items are added here automagically]

DTNM

The next meeting will be on 2016-03-22 at 14:30 CET.