dCache Tier I meeting MONTH DATE, 2013
[part of a series of meetings]
Present
dCache.org(Paul), IN2P3(), Sara(), Triumf(), BNL(), NDGF(Jens), PIC(), KIT(Xavier), Fermi(), CERN()
Agenda
(see box on the other side)
Site reports
KIT
Looks rather fine.
Downtime on the 11th went fine; the work finished before the declared downtime ended.
Everything went smoothly. One pool node was missing its IPv6 configuration; this has now been configured.
Nearline plugin
RT 9562
Jürgen says only a small change is needed to fix this.
Tested during the meeting, no error any more.
Samuel will update ticket.
Wrong heap space
RT ticket 9516
dCache domain restarted: wrong heap space limit
ssh
Tigran to remind Dmitry
High memory usage
(Out of memory issue in 4.2 too)
Surgey's case has a different domain configuration, so this may be unrelated.
Zookeeper problem
RT 9542
Suspend ticket for now; developers to look into the problem later on.
pin-manager
Could not update pin-manager due to concurrent updates.
HA setup
Two nodes run all core services. The site now also wants to distribute those services across separate hardware.
Update documentation describing how to drain services in an HA setup.
Procedure: tell HAProxy not to send new connections to the doors; within dCache, run "lb disable"; wait 24 hours for outstanding transfers to complete; then restart dCache.
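For the documentation update, the drain procedure could be written down as an ordered plan. The following is only a sketch: the backend and server names (dcache_doors, door01) and the start time are hypothetical, the HAProxy step assumes the runtime API command "set server <backend>/<server> state drain" is used to stop new connections, and "lb disable" is assumed to be issued through the dCache admin interface. The script only prints the plan; it does not contact HAProxy or dCache.

```python
from datetime import datetime, timedelta


def drain_plan(backend, server, start, wait_hours=24):
    """Return the drain steps as (when, where, action) tuples, in order.

    backend/server are hypothetical HAProxy names; wait_hours defaults to
    the 24 hours mentioned in the minutes.
    """
    restart_at = start + timedelta(hours=wait_hours)
    return [
        # Assumption: HAProxy runtime API is used to stop new connections.
        (start, "haproxy", f"set server {backend}/{server} state drain"),
        # From the minutes: disable the door within dCache.
        (start, "dcache admin", "lb disable"),
        # After outstanding transfers have had time to finish.
        (restart_at, "operator", "restart dCache on the drained node"),
    ]


# Example values only; not taken from any real site configuration.
plan = drain_plan("dcache_doors", "door01", datetime(2020, 1, 1, 10, 0))
for when, where, action in plan:
    print(f"{when:%Y-%m-%d %H:%M}  {where:<12}  {action}")
```

Keeping the wait time as a parameter makes it easy to adjust if 24 hours turns out to be longer than the longest outstanding transfer needs.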
NDGF
Had two good weeks, but with one incident.
There was maintenance and a restart of the router. The network went away and the head nodes "got angry". One of the head nodes did not recover. The pools reattached to one of the head nodes, but not all the pools reattached to the other head node.
The two hardware machines have a local interconnect (traffic avoids the router), but it is currently unknown whether ZK and dCache traffic goes over this link.
Support tickets for discussion
[Items are added here automagically]
DTNM
Same time, on 8th January 2019.