Changes between Initial Version and Version 1 of tier-one-meeting-20181218


Timestamp:
12/18/18 15:31:51 (20 months ago)
Author:
paul
[[TOC(depth=0)]]
= dCache Tier I meeting December 18, 2018 =
[part of a [wiki:developers-meetings series of meetings]]

== Present ==

dCache.org(Paul), IN2P3(), Sara(), Triumf(), BNL(), NDGF(Jens), PIC(), KIT(Xavier), Fermi(), CERN()

= Agenda =

(see box on the other side)

= Site reports =

== KIT ==

Looks rather fine.

The downtime on the 11th went fine and finished before the declared downtime window ended.

Everything went smoothly.  One pool node was missing its IPv6 configuration; this is now configured.

=== Nearline plugin ===

RT 9562

Jürgen says only a small change is needed to fix this.

Tested during the meeting: the error no longer occurs.

Samuel will update the ticket.

=== Wrong heap space ===

RT ticket 9516

dCache domain restarted: wrong heap space limit

=== ssh ===

Tigran to remind Dmitry

=== High memory usage ===

(Out of memory issue in 4.2 too)

Surgey's case has a different domain configuration, so this may be unrelated.

=== Zookeeper problem ===

RT 9542

Suspend ticket for now; developers to look into the problem later on.

=== pin-manager ===

Could not update pin-manager due to concurrent updates.

=== HA setup ===

The setup has two nodes running all core services.  KIT now also wants to spread those services across distributed hardware.

Update documentation describing how to drain services in an HA setup.

Drain procedure: tell HAProxy not to send new connections to the doors.  Within dCache: "lb disable".  Wait 24 hours for outstanding transfers to complete.  Then restart dCache.
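
The drain procedure above can be sketched as an ordered plan. This is only an illustration under stated assumptions, not an agreed procedure: it assumes HAProxy is driven through its runtime API command `set server <backend>/<server> state drain`, and the backend name `dcache-doors` and the door names are hypothetical. The sketch only builds the list of steps; it does not contact HAProxy or dCache.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Step:
    not_before: datetime  # earliest time at which the step may run
    target: str           # "haproxy" or "dcache"
    action: str           # command text or description of the action


def drain_plan(start, doors, backend="dcache-doors", wait=timedelta(hours=24)):
    """Return the ordered drain steps for the given doors.

    `backend` and the entries of `doors` are hypothetical names used
    for illustration only.
    """
    plan = []
    for door in doors:
        # Stop HAProxy from sending new connections to this door.
        plan.append(Step(start, "haproxy", f"set server {backend}/{door} state drain"))
        # The "lb disable" step from the notes, issued to the door within dCache.
        plan.append(Step(start, "dcache", f"{door}: lb disable"))
    # Restart only after the waiting period for outstanding transfers has passed.
    plan.append(Step(start + wait, "dcache", "restart dCache"))
    return plan


if __name__ == "__main__":
    begin = datetime(2018, 12, 18, 16, 0)
    for step in drain_plan(begin, ["webdav-door-1", "xrootd-door-1"]):
        print(step.not_before.isoformat(), step.target, step.action)
```

Keeping the 24 hour wait as a parameter makes it easy to adjust once the documentation settles on a recommended value.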

== NDGF ==

Had two good weeks, but with one incident.

There was maintenance and a restart of the router.  One of the head nodes did not recover.  The network went away.  The head nodes "got angry".  The pools reattached to one of the head nodes.  Not all the pools reattached to the other head node.

The two hardware machines have a local interconnect (traffic avoids the router), but it is currently unknown whether ZK and dCache traffic goes over this link.

= Support tickets for discussion =

[Items are added here automagically]

= DTNM =

Same time, on 8th January 2019.