wiki:tier-one-meeting-20181218
Last modified on 12/18/18 15:31:51

dCache Tier I meeting December 18, 2018

[part of a series of meetings]

Present

dCache.org (Paul), NDGF (Jens), KIT (Xavier). No attendees recorded for IN2P3, SARA, TRIUMF, BNL, PIC, Fermi or CERN.


Site reports

KIT

Things look fine.

The downtime on the 11th went well and finished before the end of the declared downtime window.

Everything went smoothly, apart from one pool node that was missing its IPv6 configuration; this has now been fixed.

Nearline plugin

RT 9562

Jürgen says only a small change is needed to fix this.

The fix was tested during the meeting and the error no longer appears.

Samuel will update the ticket.

Wrong heap space

RT 9516

A dCache domain restarted because its heap space limit was set incorrectly.

ssh

Tigran to remind Dmitry

High memory usage

(There is an out-of-memory issue in 4.2 too.)

Sergey's case has a different domain configuration, so it may be unrelated.

Zookeeper problem

RT 9542

Ticket suspended for now; developers will look into the problem later.

pin-manager

Could not update pin-manager due to concurrent updates.

HA setup

Two nodes currently run all core services. The next step is to distribute those services across separate hardware.

The documentation describing how to drain services in an HA setup needs updating.

Draining procedure: tell HAProxy not to send new connections to the doors; within dCache, run "lb disable"; wait 24 hours for outstanding transfers to complete; then restart dCache.
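The draining steps above can be sketched as an ordered plan. This is a minimal Python illustration under stated assumptions, not a site procedure: the backend name and door cell names are hypothetical placeholders, the HAProxy step assumes the runtime API command `set server <backend>/<server> state drain` (a drained server keeps its existing connections but is no longer picked for new load-balanced ones), and the dCache step assumes the admin shell's `\s <cell> <command>` form is used to send `lb disable` to each door. The function only builds the list of steps and the earliest restart time; it does not contact HAProxy or dCache.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrainPlan:
    steps: list[str]
    restart_not_before: datetime


def build_drain_plan(backend: str, doors: list[str], start: datetime,
                     grace: timedelta = timedelta(hours=24)) -> DrainPlan:
    """Return the ordered drain steps for the given doors.

    `backend` and the door names are hypothetical placeholders; they must
    match the HAProxy backend/server names and the dCache door cell names
    actually used at the site.
    """
    steps: list[str] = []
    # 1. Stop HAProxy from sending new connections to each door
    #    (assumed runtime API command: set server <backend>/<server> state drain).
    for door in doors:
        steps.append(f"haproxy: set server {backend}/{door} state drain")
    # 2. Inside dCache, send "lb disable" to each door via the admin shell
    #    (assumed form: \s <cell> <command>).
    for door in doors:
        steps.append(f"dcache admin: \\s {door} lb disable")
    # 3. Wait for outstanding transfers (24 hours by default), then restart.
    restart_not_before = start + grace
    steps.append(f"wait until {restart_not_before.isoformat()} for transfers to finish")
    steps.append("restart dCache")
    return DrainPlan(steps, restart_not_before)
```

For example, `build_drain_plan("dcache_doors", ["webdav-door-1"], datetime(2018, 12, 18, 9, 0))` gives a restart time of 09:00 the next day. Keeping the plan as data makes it easy to review before the commands are run by hand or through existing tooling.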

NDGF

Two good weeks, with one incident.

The router was restarted during maintenance. One of the head nodes did not recover: the network went away and the head nodes "got angry". The pools reattached to one of the head nodes, but not all of them reattached to the other.

The two physical machines have a local interconnect that bypasses the router, but it is currently unknown whether ZooKeeper (ZK) and dCache traffic actually goes over this link.
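One way to approach this open question is to ask the kernel which local address it would use to reach the other head node. The Python sketch below relies on a standard trick: connecting a UDP socket sends no packets but makes the kernel choose a route and source address, which `getsockname()` then reports. The peer address and interconnect subnet are hypothetical inputs; the peer must be given as an IP literal, and port 2181 (ZooKeeper's default client port) is only a placeholder, since the route normally depends on the destination address rather than the port.

```python
import ipaddress
import socket


def source_address_for(peer: str, port: int = 2181) -> str:
    """Return the local address the kernel would use to reach `peer`.

    `peer` must be an IPv4 or IPv6 literal. Connecting a UDP socket sends
    no packets; it only makes the kernel pick a route and source address.
    """
    family = socket.AF_INET6 if ":" in peer else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as s:
        s.connect((peer, port))
        return s.getsockname()[0]


def uses_interconnect(peer: str, interconnect_subnet: str, port: int = 2181) -> bool:
    """True if traffic to `peer` would leave from an address in the interconnect subnet."""
    src = ipaddress.ip_address(source_address_for(peer, port))
    # Membership is False when the address and network are different IP versions.
    return src in ipaddress.ip_network(interconnect_subnet)
```

Run it on one head node with the address that ZooKeeper and dCache are actually configured to use for the other node. If that configuration refers to the peer by its routed address, traffic goes via the router even though the interconnect exists.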

Support tickets for discussion

[Items are added here automagically]

DTNM

Same time, on 8th January 2019.