[[TOC(depth=0)]]
= dCache Tier I meeting August 1st, 2017 =
[part of a [wiki:developers-meetings series of meetings]]

== Present ==

dCache.org(Paul), IN2P3(), Sara(), Triumf(), BNL(), NDGF(Ulf), PIC(Marc), KIT(Xavier), Fermi(), CERN()

= Agenda =

(see box on the other side)

= Site reports =

== KIT ==

Things are running just fine for dCache.

=== space-manager namespace consistency ===

Notification: several months ago Gerd sent a guide on how to check the space-manager table, looking for inconsistencies.

Samuel did this as a test, checking for inconsistencies -- and found some!

Easy things can be fixed.

Files have the wrong size in the space-manager table.

Incoming ticket on how to update the database to match.

File sizes of 1 byte.

Normal file size in space-manager, but zero length in Chimera.  This could be a temporary issue -- still checking.

50,000 files in space-manager that are not in the namespace.

4,000 files were flushed to tape, but space-manager did not know about them.

(compared to more than 3,000,000 files; these numbers are fairly negligible)
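
The exact check Samuel ran is not recorded here; a minimal sketch of this kind of space-manager/Chimera cross-check is shown below.  The database, table and column names (srmspacefile, sizeinbytes, t_inodes, ipnfsid, isize) are assumptions and vary between dCache versions, so treat this purely as illustration.

{{{#!python
# Cross-check space-manager's view of files against the Chimera namespace.
# All database/table/column names below are assumptions; adjust to your schema.
import psycopg2

spacemanager = psycopg2.connect(dbname="dcache", user="dcache")   # space-manager DB (assumed)
chimera = psycopg2.connect(dbname="chimera", user="dcache")       # Chimera DB (assumed)

# pnfsid -> size as space-manager records it
with spacemanager.cursor() as cur:
    cur.execute("SELECT pnfsid, sizeinbytes FROM srmspacefile WHERE pnfsid IS NOT NULL")
    sm_files = dict(cur.fetchall())

# pnfsid -> size as the namespace records it
with chimera.cursor() as cur:
    cur.execute("SELECT ipnfsid, isize FROM t_inodes")
    ns_files = dict(cur.fetchall())

# Files known to space-manager but missing from the namespace.
orphans = [p for p in sm_files if p not in ns_files]

# Files present in both, but with differing sizes (e.g. 1-byte or zero-length entries).
mismatched = [(p, sm_files[p], ns_files[p]) for p in sm_files
              if p in ns_files and sm_files[p] != ns_files[p]]

print(len(orphans), "files in space-manager but not in the namespace")
print(len(mismatched), "files whose sizes disagree")
}}}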

== NDGF ==

Installing updates.  Nothing bigger happening.

Email from last week about GSI-dcap.

Jolic Pinjac is asking about dCache v3.0.10 onwards, as this version has a security vulnerability.

== PIC ==

Some issues this past week -- two main problems.

=== Overloaded CMS pools ===

First, overload of the pools -- caused by CMS doing many activities.

Pools get overloaded -- too much IO.

Failing some tests.

Reduced the number of movers per pool.

Tests fail due to being queued.

Some pools have ~200 TiB.

Once every three months, try to rebalance the pools.

Sometimes pools become "hot".  Have limited the number of xrootd movers to 1,500.

Have p2p replication enabled -- "soft mode".

Purchasing new hardware in 2 months, bringing PIC up to pledges.

CMS is reading files directly, whereas ATLAS is copying the complete file to the worker node.

KIT is running GPFS (840 TiB pool, 7 pools on 7 servers; peak 220 GiB/s); PIC is running NAS servers.

Problem is seen with old hardware, which is currently being decommissioned.

Suggest protocol-specific links to avoid random IO hitting the old hardware pools.
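
Both the mover cap and the protocol-specific links are regular dCache admin operations; a rough sketch follows.  The exact command syntax, the queue name "xrootd" and the pool-group name "fast-pools" are assumptions here rather than anything agreed in the meeting -- check the dCache Book for the form used by the deployed release.

{{{
# On an overloaded pool cell: cap concurrent movers (illustrative number).
mover set max active 1500 -queue=xrootd

# In the PoolManager cell: steer xrootd (random-IO) traffic towards newer
# pools via a protocol unit and a dedicated link.  "fast-pools" is a
# hypothetical pool group.
psu create unit -protocol xrootd/*
psu create ugroup xrootd-traffic
psu addto ugroup xrootd-traffic xrootd/*
psu create link xrootd-link xrootd-traffic
psu set link xrootd-link -readpref=20 -writepref=20 -cachepref=20 -p2ppref=-1
psu add link xrootd-link fast-pools
}}}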

=== Chimera problems ===

Problems with Chimera connections.

Several NFS connections through different NFS doors; reached the maximum number of connections that PostgreSQL allows, which was set to 10,000.

90-nproc.conf -- soft limit of 1,000.  Increased this to 10,000.

Have 7 NFS doors -- 6 doors for NFS 4.1; 1 NFS door for Enstore on pools.

Default is 30 connections per door; have increased it to 300.

Not clear why this was increased.

There was a VO that was running ~1,000 requests concurrently through NFS, and there were problems.
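
The limits involved live in three different places; the sketch below shows where each would typically be set.  The dCache property name chimera.db.connections.max (and its default of 30 connections per door) is an assumption here rather than something confirmed in the meeting.

{{{
# /etc/security/limits.d/90-nproc.conf -- per-user process/thread limit
# (the soft limit of 1,000 mentioned above, raised to 10,000)
*    soft    nproc    10000

# postgresql.conf -- cap on client connections to PostgreSQL
max_connections = 10000

# dcache.conf -- per-door Chimera connection pool (property name assumed)
chimera.db.connections.max = 300
}}}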

= Support tickets for discussion =

[Items are added here automagically]

= DTNM =

Same time, next week.