Changes between Initial Version and Version 1 of tier-one-meeting-20180213

02/13/18 15:43:22 (3 years ago)



= dCache Tier I meeting February 13th, 2018 =
[part of a [wiki:developers-meetings series of meetings]]
== Present ==
, Tigran), IN2P3(), Sara(), Triumf(), BNL(), NDGF(Dmytro), PIC(), KIT(Xavier), Fermi(), CERN()
= Agenda =
(see box on the other side)
= Site reports =
== KIT ==
Things are running fine again.
=== ATLAS transfers failing ===
Last week there were problems with ATLAS transfers.  See RT ticket 9340.
After some investigation it was determined that the transfers were failing because they were queuing.
This was because the transfers were writing to tape and there are only two pools (one main pool and a smaller backup) that are tape-attached.
The pool was configured to accept 100 concurrent writes, which suggests that CUSTODIAL writes should see a different limit from REPLICA writes.
FTS/ATLAS was trying to do too many transfers.  This is now fixed by ATLAS separating their disk and tape writes and imposing different limits.
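The idea of giving CUSTODIAL and REPLICA writes separate limits can be sketched as follows. This is only an illustration, not dCache code or configuration: the `WriteLimiter` class and its methods are hypothetical names, and the CUSTODIAL limit of 10 is an assumed value (only the figure of 100 concurrent writes comes from the discussion above).

```python
from collections import Counter


class WriteLimiter:
    """Hypothetical helper: admit writes against a separate limit per class."""

    def __init__(self, limits):
        # limits maps a write class to its maximum number of concurrent writes
        self.limits = dict(limits)
        self.active = Counter()

    def try_start(self, kind):
        """Return True if a write of this class may start now, False if it must queue."""
        if self.active[kind] >= self.limits[kind]:
            return False
        self.active[kind] += 1
        return True

    def finish(self, kind):
        """Release one slot of this class when a write completes."""
        if self.active[kind] > 0:
            self.active[kind] -= 1


# Assumed limits: 100 is from the minutes, 10 is an illustrative value.
limiter = WriteLimiter({"REPLICA": 100, "CUSTODIAL": 10})

# 25 tape-bound writes arrive at once; only the first 10 are admitted.
admitted = sum(limiter.try_start("CUSTODIAL") for _ in range(25))

# Disk-only writes are unaffected by the saturated CUSTODIAL limit.
replica_ok = limiter.try_start("REPLICA")
```

With separate counters, a burst of tape-bound writes fills only its own quota, so disk-only writes are not left queuing behind it.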
=== Where are the parallel transfers ===
RT 9344
Xavier would like to know the number of streams involved with a transfer.  This is to investigate whether the number of streams affects the transfer performance.
We could log this in the protocol-info part of billing.
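If the stream count were logged, the investigation Xavier describes could look roughly like the sketch below. The record layout is an assumption: the field names `streams`, `bytes` and `seconds` are hypothetical and do not reflect the actual billing format, and the sample values are made up purely to show the calculation.

```python
from collections import defaultdict
from statistics import mean


def throughput_by_streams(records):
    """Group transfers by stream count and return the mean throughput (bytes/s) per group."""
    groups = defaultdict(list)
    for record in records:
        if record["seconds"] > 0:  # skip zero-duration entries to avoid division by zero
            groups[record["streams"]].append(record["bytes"] / record["seconds"])
    return {streams: mean(rates) for streams, rates in sorted(groups.items())}


# Made-up sample records, for illustration only.
sample_records = [
    {"streams": 1, "bytes": 1000, "seconds": 10},
    {"streams": 1, "bytes": 3000, "seconds": 10},
    {"streams": 4, "bytes": 8000, "seconds": 10},
]

result = throughput_by_streams(sample_records)
```

Comparing the per-group averages would give a first indication of whether the number of streams matters; real data would also need controlling for file size and destination.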
=== Accommodation for workshop ===
Will there be rooms reserved in the DESY hostel?
No, but you can reserve directly at the DESY hostel.
=== Using HA at DESY ===
Yes.  We have had HA deployed for some time for the cloud instance.
We are upgrading the XFEL instance next week and will switch on HA support there.
=== CA problems ===
Last year there was a problem with the UK CA changing their intermediate CA certificate from SHA-1 to SHA-256.
Is this now resolved?  Yes, upstream, but the fix is not yet in dCache.
Xavier is happy for ticket RT 9310 to be closed.
== NDGF ==
Going fine!
=== Missing cell ===
The dCache ssh admin interface claimed that a cell hosted on a pool under heavy load did not exist.
Yes, this can happen, as the admin cell sends a message to verify that the cell exists when the '\c' command is issued.
The cell still appears in the \l output because that information is cached.
We can look at changing how the \c command checks for the existence of a cell.
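The difference between a live check and a cached listing can be illustrated with a small simulation. This is not dCache code: it assumes the existence check amounts to sending a message and waiting a bounded time for the reply, and the `probe` function, the delays and the `cached_listing` set are all hypothetical.

```python
import queue
import threading
import time


def probe(reply_delay_s, timeout_s):
    """Send a 'ping' to a simulated cell and wait up to timeout_s for its reply."""
    replies = queue.Queue()

    def cell():
        time.sleep(reply_delay_s)  # a heavily loaded cell answers late
        replies.put("pong")

    threading.Thread(target=cell, daemon=True).start()
    try:
        replies.get(timeout=timeout_s)
        return True   # reply arrived in time: the cell is seen as existing
    except queue.Empty:
        return False  # no reply in time: the cell is reported as missing


# Hypothetical cached snapshot, refreshed independently of the probe.
cached_listing = {"pool_a"}

busy_seen = probe(reply_delay_s=0.5, timeout_s=0.05)  # loaded cell, short wait
idle_seen = probe(reply_delay_s=0.0, timeout_s=1.0)   # responsive cell
```

Under these assumptions the loaded cell is reported as missing by the probe while still being present in the cached listing, which matches the behaviour described above.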
== BNL ==
Jane reported things are going fine.
=== CLOSE_WAIT ===
Jane described the problem they saw at BNL: the door had many TCP connections (to pools) in the CLOSE_WAIT state.
There seem to be two problems here:
  1. The transfer didn't succeed.
  2. It wasn't cleaned up correctly when the client disconnected.
Thanks to the information provided by Jane, we believe we have enough to reproduce the problem.
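A socket sits in CLOSE_WAIT when the remote end has closed the connection but the local application has not yet closed its side. When trying to reproduce the problem, one way to track such connections on a Linux host is to count rows in `/proc/net/tcp` whose state column is `08`. The sketch below assumes that table format; the function name and the sample table are illustrative only.

```python
CLOSE_WAIT = "08"  # state code for CLOSE_WAIT in Linux /proc/net/tcp


def count_close_wait(proc_net_tcp_text):
    """Count connections in CLOSE_WAIT in text formatted like /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        # fields: slot, local address, remote address, state, ...
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            count += 1
    return count


# Illustrative, truncated sample table (not taken from a real host).
sample_table = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:0885 0100007F:C350 01 00000000:00000000
   1: 0100007F:0885 0100007F:C351 08 00000000:00000000
   2: 0100007F:0885 0100007F:C352 08 00000000:00000000
"""

stuck = count_close_wait(sample_table)
```

On a real host the input would be the contents of `/proc/net/tcp` (IPv4 only; IPv6 connections are listed separately in `/proc/net/tcp6`).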
=== Draining the GridFTP door ===
Jane reported that she has had difficulty draining the GridFTP door.  This is with dCache v3.0 and seems to be a regression against their earlier version.
Paul asked Jane to open a ticket describing the problem.
=== Released 4.0 version ===
BNL is currently running 3.0.  To which version should they upgrade?  What will be "production ready" in ~6 months' time?
All dCache releases are (to the best of our knowledge) production ready.  When upgrading at DESY, we always take the latest version.  For example, next week's upgrade is to 4.0.  If we upgrade after that then it should be to dCache v4.1.
Jane described how BNL has a test bed instance, exactly like their production, with which they can test dCache releases.
Paul: This should give you additional confidence that whichever version you take will at least work.  There is always the possibility of performance-related regressions that are only visible when deployed in production; however, those are relatively unlikely.
= Support tickets for discussion =
[Items are added here automagically]
= DTNM =
Same time, next week.