wiki:developers-meeting-20070822
Last modified on 08/22/07 15:49:37

dCache 1.8 deployment meeting

Brief remark

Yesterday Flavia reported on the dCache 1.8 deployment efforts at the LCG Management Board. Listening to her presentation, we learned that DPM, CASTOR and the LCG tools are all fine; only dCache is preventing LHCb from doing their tests. Consequently the Management Board is very concerned (about dCache of course) and ....

Agenda

Therefore I would like to discuss how we will proceed and how much effort I can expect from the various developer groups in order to get LHCb 'ready' this week.

Related issue : Travel to FZK in October, any news on dates?

Besides this, we need to follow up on the unresolved issues from the BNL meeting :

  • Minor issues : 'reserve' command
    • Lifetime in millis instead of seconds
    • Can an infinite lifetime be specified?
  • Issues which cause FTP to get stuck
    • NOT_EMPTY return code for RMDIR
    • DUP return code for MKDIR
    • There is the 'ls' problem reported by IN2P3, do we have any idea on that ?
    • space token not returned by srmGetSpaceTokens at IN2P3.
  • New Development Issues :
    • Default AL/RP from pnfs. getStorageInfo gets those values from directory tags if the file doesn't exist.
    • linkGroups can serve "Custodial, Output, Replica" at the same time. The consequence is that a single pool (if connected to an HSM) can hold those types as well. Pools which are not connected to an HSM can of course still hold "Output, Replica".
    • Status of the P2P checksumming in 1.7 and 1.8
  • More :
    • Most recent SRMCP (v2) client doesn't copy any more.
    • Trying to read a directory (the directory itself, not a file in the directory) via GridFTP results in hanging transfers? Apparently there is no check in the FTP door for this (unless one uses encp). The transfer ends up being suspended in the PoolManager, since the PoolManager cannot find the file anywhere (obviously, since it is a directory and not a file). One has to manually do 'rc destroy' on the request. Using srmcp does not result in a hanging transfer, but the error message and return code are quite strange, and srmcp apparently thinks it is a transient error and tries to copy the file over and over again. Not optimal.
    • dccp -P (Prestage) doesn't work. (Maybe still because of the protocol unit.)
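
For the GridFTP directory-read issue above, the missing check could look roughly like the sketch below. This is not dCache code: it is a minimal illustration, with the local filesystem standing in for the pnfs namespace. The names check_transfer_source, PermanentTransferError and should_retry are hypothetical. The idea is to reject a directory before the request reaches the PoolManager, and to mark the failure as permanent so that a client such as srmcp does not keep retrying.

```python
import os
import tempfile


class PermanentTransferError(Exception):
    """Hypothetical error type: the request can never succeed, so do not retry."""


def check_transfer_source(path):
    """Reject a read request up front unless the path is a regular file.

    Assumption: the local filesystem stands in for the namespace lookup
    that a door would perform before handing the request on.
    """
    if os.path.isdir(path):
        raise PermanentTransferError(f"{path} is a directory, not a file")
    if not os.path.isfile(path):
        raise PermanentTransferError(f"{path} does not exist or is not a regular file")
    return path


def should_retry(error):
    """Client-side rule: retry only errors that are not marked permanent."""
    return not isinstance(error, PermanentTransferError)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        try:
            check_transfer_source(directory)
        except PermanentTransferError as err:
            print("rejected:", err)
            print("retry:", should_retry(err))
```

With a check of this kind the request fails immediately with a clear message, instead of being suspended until someone runs 'rc destroy' by hand.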

Regular Operation Issues

  • We see : org.dcache.srm.scheduler.NonFatalJobFailure: org.dcache.srm.SRMException: TransferManager errortoo many transfers!

remoteGsiftpMaxTransfers=srmCopyReqThreadPoolSize=250

  • Certificate rejected : running gPlazma module and cell.java.lang.Exception: GSSException: No valid credentials provided
  • File Descriptor leak in pools (both read and write)
  • Vladimir has observed a verifiable memory leak in Kerberos dcap doors at Fermilab; he will fill in the details.


Last modified by Patrick at Wed Mar 3 09:42:42 2021