Last modified on 02/20/08 21:50:49

dCache 'all hands' developers meeting at DESY, 14-18 April 2008

Goals

  • On the basis of the most recent production experience, including the CCRC08 results, identify the weak areas of the dCache architecture and devise ways to improve them.
  • Identify the areas where administrators achieve system stability with external scripts, and implement that functionality as internal dCache services.
  • Define a plan to deliver the WLCG MOU (V1 and V2) functionality, and the functionality needed by the WLCG experiments, in time for the start of the LHC experiments.
  • Draw up short-, mid- and long-term development plans.
  • Review the support model for the LHC experiments, document the information flow, and suggest to WLCG and the national Grids how to make the model more scalable.

Presentations

  • New Information Provider
  • Design of the new pool, needed to support xrootd and NFS 4.1 semantics.

(Unordered) List of Discussion topics

  • dCache messaging, alternatives to Cells.
    1. JMS, e.g. Art's approach to SRM scalability
    2. ???
  • Reliable registration of pool-side replicas in pnfs/companion.
  • Automatic problem detection
    1. Reliable deletion of replicas from the pools when files are deleted
    2. Automatic scans of pools for corrupt and missing files; best strategy for detecting data loss
    3. PNFS scans
    4. One of the biggest complaints is that when things fail, there is often no easy way to tell why or how to address the problem. We need better diagnostics (error reporting) for failed transfers, improved logging facilities and better admin tools. We need to agree on the steps and concrete activities for improving this area.
  • SRM MOU (V2) and its consequences
  • SRM Space Management
    1. On write, “Online” files are pinned in the write space even when there is a separate read space to which the files are P2Ped (copied pool to pool). The design should be changed so that, in the latter case, the files are unpinned in the write space and the space manager is updated correctly. Management of the read spaces would also be needed; otherwise we might run out of read pool space.
    2. How to implement the changeSpaceForFiles functionality, which would again involve changes to the pool and PoolManager code and to the space manager behavior, as well as possible copying of files between pool groups.
    3. Experiments want to control which read pools files go into, again using the space tokens given to srmBringOnline and srmPrepareToGet. Even if we do not immediately work on implementing space reservation for read spaces, we need to better understand the design decisions involved, get answers to the questions I have raised, and obtain a good estimate of the implementation effort. Then we can realistically decide whether it is practical to implement. I propose that Dmitry and Tigran work out some of the issues before the workshop and present the results there.
    4. How to implement access control for space tokens.
    5. Combining the functionality of the space manager and the resilient manager, as needed by OSG and by European sites that run both together. Again, I propose that some preliminary work be done before the workshop so that we arrive with a concrete proposal.
  • Configuration
    1. How to simplify configuration
    2. How to make dCache perform configuration consistency checks
    3. How to detect and report potential configuration issues.
    4. Tools for simplifying configuration changes

Agenda

Monday Apr 14

Tuesday Apr 15

Wednesday Apr 16

Thursday Apr 17

Friday Apr 18