Last modified on 04/29/14 14:41:10

dCache Tier I meeting April 29, 2014

[part of a series of meetings]

Present

dCache.org(Paul), IN2P3(Yvan), NDGF(Ulf), PIC(Marc), KIT(Xavier)

Agenda

(see box on the other side)

Site reports

NDGF

Ulf reported that dCache is working fine for them. NDGF has had some hardware problems, but dCache itself kept working. They have been moving data between pools to do updates, and this has gone smoothly.

NDGF currently has no issues.

Gerd and ATLAS were hunting an exotic SRM listing problem where directories with more than 100 entries sometimes returned a partial listing.

Paul mentioned that the problem has been found and fixed. pnfs-manager may send the reply to a directory listing as multiple messages and, with the forthcoming dCache 2.9, SRM can process these messages in different threads; this case wasn't handled correctly. This is now fixed.
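As an illustration only (this is not dCache code), the following Python sketch shows the general pattern: a listing delivered in several messages, each handled by a different thread, must be assembled under a lock and only treated as complete once every part has arrived. The class name, the `total_chunks` field and the chunk size of 100 are assumptions made for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ListingCollector:
    """Hypothetical collector for one directory listing that arrives in parts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []
        self._received = 0       # number of messages handled so far
        self._expected = None    # total number of messages, once known
        self._done = threading.Event()

    def on_message(self, entries, total_chunks=None):
        """Handle one reply message; may be called from any thread.

        Assumption: only the final message carries total_chunks.
        """
        with self._lock:
            self._entries.extend(entries)
            self._received += 1
            if total_chunks is not None:
                self._expected = total_chunks
            # The final message may be processed before earlier ones,
            # so completion depends on the count, not on arrival order.
            if self._expected is not None and self._received == self._expected:
                self._done.set()

    def result(self, timeout=None):
        """Block until every message has been handled, then return the listing."""
        if not self._done.wait(timeout):
            raise TimeoutError("directory listing is incomplete")
        with self._lock:
            return sorted(self._entries)


# A directory with 250 entries, delivered as three messages of up to 100 entries.
names = [f"file{i:03d}" for i in range(250)]
chunks = [names[i:i + 100] for i in range(0, len(names), 100)]

collector = ListingCollector()
with ThreadPoolExecutor(max_workers=3) as pool:
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        pool.submit(collector.on_message, chunk, len(chunks) if is_last else None)

listing = collector.result(timeout=5)
```

A handler that returned the listing as soon as it saw the final message, without checking that the earlier messages had been processed, would produce the kind of partial listing described above.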

KIT

Xavier reported that dCache was running smoothly, with no big issues.

He opened two new tickets:

RT#8303 reports that the 'dcache dump heap' command isn't working correctly for Java v7.

and

RT#8305 reports that the daily statistics file is identical to the hourly file, which is wrong.

He has also updated RT#8284 (statistics taking a very long time), confirming Gerd's hunch that the problematic ATLAS pools contain more files. In fact, they contain orders of magnitude more files than the CMS pools. The problem therefore seems to be simply a lack of scalability when gathering this information.

Paul will bring this up at the next dev. meeting.

Xavier also mentioned that he is currently tidying up the monitoring and is interested in getting the info service's information as a JSON object. He will open a feature request in RT for this.

PIC

Marc reported that PIC had no problems this week.

He reported one ongoing issue: occasionally the admin service (ssh2) stops listening for incoming connections. They can work around this problem by restarting the service's domain. Since the admin service runs in its own domain, restarting the domain has no direct impact on the running system. The problem appears roughly once per month and nothing is logged at the same time. Marc is investigating, but will open a ticket if he draws a blank.
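Since nothing is logged when the service stops listening, one way to record when it happens is an external probe that periodically tries to open a TCP connection to the admin port. The following Python sketch is an assumption about how a site might do this, not something PIC reported using; the function name is hypothetical and the host and port must be taken from the site's own configuration.

```python
import socket


def admin_port_is_listening(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened.

    A refused connection, a timeout or a name-resolution failure all
    count as "not listening".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from cron, for example as `admin_port_is_listening("admin.example.org", 22224)` (hypothetical host; 22224 is assumed here to be the configured ssh2 admin port), this would give a timestamp to correlate with other logs. It only detects a port that no longer accepts connections, not a service that accepts them and then hangs.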

Marc also reported that they are continuing their tests for dCache v2.9 and Enstore. They hit a problem that required reinstalling the test system with dCache v2.2 and upgrading to dCache v2.9 from there. This upgrade is currently updating the database schema and Marc estimated it would take another few hours.

IN2P3

Yvan reported that IN2P3 is running smoothly and there's nothing critical to report.

He also said that they haven't yet got dCache to accept kerberised NFS access. Adrian added information to the support ticket and is waiting on Tigran's reply.

Paul said that Tigran is away today, but he would ask him to look at the ticket when he's back.

Support tickets for discussion

[Items are added here automagically]

DTNM

The next meeting is Tuesday 6th May.