Developers meeting at CHEP in Victoria

Main Topics

General Issues (mainly CDF and CMS)

  • The problem is triggered on a pool node that runs out of memory or out of file descriptors. The file descriptor problem is understood: the 'run system' call opens the external process's input stream although we never ask for it, and consequently we never close it. This leaks one open fd per external process launch. The memory problem may or may not be related to this issue.
  • The pool manager should be fixed in two phases:
    • If a pool reports an error on restore, we immediately retry the request on a different pool. We need to make sure that the new pool is not on the same physical machine as the initially failing pool.
    • It would be good to keep statistics on pool failures, so that a pool that keeps failing is disabled after a certain time.
  • Memory on pool nodes
  • Memory on door nodes (less important)
  • File descriptor limit problem on the pool nodes
  • Black hole problem.
  • dccp -P -T doesn't work (low priority, not yet verified, need to wait)
  • Restore doesn't work for files larger than 2 GB that are not registered in pnfs. Timur modified enstoreExtracter (not yet checked in) but thinks the pool logic needs to be revised.
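The file descriptor leak above comes from a launched process's standard streams never being closed. The actual 'run system' code is not shown in these notes, so the following is only a minimal Java sketch, assuming external processes are started through java.lang.ProcessBuilder; the class ExternalCommand, its record Result and the method run are hypothetical names. It closes the child's stdin right after the launch and closes stdout with try-with-resources, so each launch gives back its pipes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper; not the existing 'run system' implementation.
final class ExternalCommand {

    /** Exit code and captured output of one launch (hypothetical type). */
    record Result(int exitCode, String stdout) {}

    /**
     * Runs an external command and releases every pipe it created.
     * stderr is merged into stdout, so only one stream has to be drained.
     */
    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        Process process = pb.start();

        // The child's stdin is exposed to us as an OutputStream (a pipe) even
        // if we never write to it; close it at once so that fd is released.
        process.getOutputStream().close();

        String output;
        // try-with-resources closes the stdout pipe even if reading throws.
        try (InputStream in = process.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        return new Result(process.waitFor(), output);
    }
}
```

Merging stderr into stdout is a design choice: besides saving one pipe, it avoids the case where a child blocks forever because nobody drains a full stderr pipe.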

Read-only doesn't work for Jon and Bockelmann

The cause is not at all clear. The code seems to exclude pools that have the readOnly bit set. We will try this on one of the test systems.
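To make the intended selection rules concrete, here is a hypothetical Java sketch that combines the readOnly exclusion described above with the two pool-manager items (retry on a different physical machine, disable pools that keep failing). The class PoolSelector, the record Pool, the failure threshold and all method names are assumptions for illustration, not the pool manager's real code. The threshold is a plain failure count; the notes speak of "a certain time", so a time window would be a natural refinement.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; not the actual pool manager implementation.
final class PoolSelector {

    /** Hypothetical view of a pool: name, physical host, readOnly bit. */
    record Pool(String name, String host, boolean readOnly) {}

    private final int maxFailures;                      // hypothetical threshold
    private final Map<String, Integer> failures = new HashMap<>();

    PoolSelector(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    /** Counts one reported error for the named pool. */
    void recordFailure(String poolName) {
        failures.merge(poolName, 1, Integer::sum);
    }

    /** A pool is treated as disabled once it reaches the failure threshold. */
    boolean isDisabled(String poolName) {
        return failures.getOrDefault(poolName, 0) >= maxFailures;
    }

    /**
     * Chooses a pool to retry on after 'failed' reported an error. Skips
     * read-only pools, disabled pools, and every pool on the failed pool's
     * physical host. Returns the first match; a real pool manager would rank
     * the remaining candidates by load or cost instead.
     */
    Optional<Pool> selectRetryPool(List<Pool> candidates, Pool failed) {
        return candidates.stream()
                .filter(p -> !p.readOnly())
                .filter(p -> !isDisabled(p.name()))
                .filter(p -> !p.host().equals(failed.host()))
                .findFirst();
    }
}
```

For example, if pool-a1 on hostA fails, pool-a2 on the same host is skipped, a read-only pool is skipped, and the first healthy pool on another host is returned; once that pool has failed maxFailures times it is skipped too.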

Topics from the SRM deployment

  • Asking for REPLICA but getting CUSTODIAL in dCache
  • Files appearing as owned by user root when using the SRM

