Last modified on 11/20/18 14:52:51

dCache Tier I meeting November 20, 2018

[part of a series of meetings]

Present

dCache.org(), IN2P3(), Sara(), Triumf(), BNL(), NDGF(Jens), PIC(), KIT(Xavier), Fermi(), CERN()

Agenda

(see box on the other side)

Site reports

KIT

Looks rather fine.

Final downtime for updating dCache for LHCb to dCache v3.2.

New problem:

Reported on Wednesday via RT ticket 9545.

Linked to the httpd domain running out of memory -- the pool manager domain crashed with an out-of-memory error. It appears to restart with the default limit for heap space, rather than the configured limit.

httpd domain

httpd domain still needs restarting (every 25--26 hours).

This is described in RT Ticket 9537.

Paul to clarify who is investigating this issue.

ssh interface

Dmitry was to look into the ssh ticket, but hasn't done so yet.

Paul to talk with Dmitry to remind him he promised to update the ticket.

Lost 'well-known' status

Described in RT ticket 9542.

A couple of pools lost their 'well-known' status in the dCache network.

It seems to be related

One of the ZK servers dropped out of the

One of the pool managers crashed in the ATLAS instance.

Xavier will update the ticket with information about the networking problem for the ZK node, the resulting ATLAS pool-manager crash, etc.

NDGF

Things are looking pretty good.

Continued investigation of a number of issues.

Two physical machines -- after swapping the VMs between the physical machines, the anomalously high memory usage problem went away. Both VMs are now using the same amount of memory.

NDGF will probably upgrade to the latest dCache v4.2 tomorrow.

Support tickets for discussion

[Items are added here automagically]

DTNM

Proposed: same time, next week.