Last modified on 09/21/07 17:11:42

D-Day - 33

Developers Meeting Sep 19 2007

Present

Expected : Dmitry, Gerd, Martin, Patrick, Timur, Andrew, Ted, Vladimir, Tigran


Synchronizing on CDF Problems

  • Restore to same host in case of pool exceptions : fixed in 1.7
  • Disk Space Problem.
    • Wrapper around movers and p2p client.
    • Could Jon upgrade some of the pools at any time ?
  • Mover pipe problems : solved.
  • Mover memory seems to be solved for now.
  • Pool JVM takes a lot of memory : solved at least for now.
  • GridFTP door memory problems at FZK : still not solved
  • Alex : Unexpected message class 9 class in pool manager (disturbing) : decided to rephrase the error but not remove it.
  • Alex : exception in p2p : those issues are triggered by port scans. Still investigating how to avoid that.

The next 1.7 release can be installed for CDF during the power outage planned for Wednesday, October 3rd. We should have a release ready 7 days in advance to allow for serious testing. Alex will install the new release on the CDF test stand at the beginning of next week.

Currently/Next working on

  • Random 'root' of newly created directories.
  • Cleaner seems to fail sometimes. Patrick to send more information on how to configure the cleaner to Tanya. On advisoryDelete, the SRM does :
    • change persistency
    • remove flags
    • remove name
  • gridKa door out of memory.

SRM 2.2

  • REPLICA ONLINE : Timur added space conservation. Is running at DESY.
  • Any news on internal stress tests ?
  • How about remaining 'Flavia' issues ?
    • srmPrepareToPut returning SRM_AUTHORIZATION_FAILURE instead of SRM_AUTHENTICATION_FAILURE
    • srmStatusOfGetRequest changes over time
  • Estimate of site readiness. What is needed to get them all green ?
  • Which development efforts are in the queue for the 1.8 d-day ? (see next section)
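
The return code issue above comes down to the difference between authentication (who is the caller ?) and authorization (is this caller allowed to do this ?). A minimal sketch of that distinction, assuming a hypothetical helper class and boolean inputs that are not taken from the dCache SRM code; only the three status names are SRM v2.2 status codes:

```java
// Hypothetical sketch: SrmStatusChooser and its parameters are illustrative
// names, not part of the dCache SRM implementation.
enum SrmStatus {
    SRM_SUCCESS,
    SRM_AUTHENTICATION_FAILURE,
    SRM_AUTHORIZATION_FAILURE
}

final class SrmStatusChooser {
    private SrmStatusChooser() {}

    /**
     * Picks a status from two assumed inputs:
     * identityVerified   - the caller's credentials could be validated
     * operationPermitted - the validated caller may perform the request
     */
    static SrmStatus statusFor(boolean identityVerified, boolean operationPermitted) {
        if (!identityVerified) {
            // The caller could not be identified at all.
            return SrmStatus.SRM_AUTHENTICATION_FAILURE;
        }
        if (!operationPermitted) {
            // The caller is known but lacks the right to do this.
            return SrmStatus.SRM_AUTHORIZATION_FAILURE;
        }
        return SrmStatus.SRM_SUCCESS;
    }
}
```

Checking identity first means an unidentified caller is never reported as an authorization failure. Which of the two cases the failing test exercised is not recorded in these notes.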

Next milestone for Jamie's meeting (this is extremely important)

  • All return codes must be ok
  • All sites (NDGF/In2p3/BNL/gridKa/SARA) must be correctly configured

Issues (not ordered)

This is an unsorted list of open questions. Some of them are important for the 1.8 release, others might not be that important.

  • Status of the Space Management documentation (Gerd, Flavia, Timur).
  • Cleaner seems to fail sometimes. As a result :
    • files stay on disk forever
    • space is not returned to the space manager
  • Random 'root' of newly created directories.
  • We still have problems with door memory in 1.7. Do we observe this in 1.8 or is the code better ?
  • Pools at the CMSUS Tier I run out of the defined disk space. The 1.8 code is much improved, though we don't know for sure whether the problem still exists.
  • The SRM return codes are still an issue.
  • Can the AL and RP now be taken from the destination directory ? They are already in the StorageInfo.
  • linkGroup changes
    • VO stuff moves into the SpaceManager
    • custodial, replica ... is added as boolean value to the PoolManager.conf
  • The first Retry of a failing 'restore' must not go to the same host. Has to be ported from 1.7.
  • Cache Repository control files should go into a DB. We may start with non-customized serialization.
  • Multi HSM support : it is unclear what to do if the message, which contains the URI, is not accepted by the PnfsManager. (E.g. PnfsManager is down).
  • Cleaner race condition if an ftp transfer is interrupted. File stays on disk. (Pnfs Entry is removed too early). Gerd implemented a workaround, which we need to port to 1.8.
  • What is the status of the p2p reimplementation ?
  • Role & Groups managed by gPlazma. This should be sufficiently covered by the new chapter in the Book.
  • gPlazma still has problems when used as a cell. Is there any gPlazma stress testing done?
  • Is Flavia's stress testing using gPlazma or the kpwd file ?
  • xrootd : multi stream support. From a certain xrootd version on, the current implementation will break.
  • Unifying StorageInfo -> read old write new : This will be postponed
  • srm-ls still has problems at IN2P3. 2 Files written, only one is in the 'ls' listing.
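
The item above about the first retry of a failing 'restore' not going to the same host can be sketched as follows. This is only an illustration under assumptions: the class, the method and the host names are hypothetical, and the real selection in the PoolManager involves more than a simple filter.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: RestoreRetry is not a dCache class.
final class RestoreRetry {
    private RestoreRetry() {}

    /**
     * Returns the first candidate host that differs from the host on which
     * the restore just failed, or an empty Optional if no other host exists.
     */
    static Optional<String> pickHostForRetry(List<String> candidateHosts, String failedHost) {
        return candidateHosts.stream()
                .filter(host -> !host.equals(failedHost)) // skip the host that failed
                .findFirst();
    }
}
```

An empty result leaves the decision to the caller, for example whether to fall back to the failed host or to suspend the request; the notes do not say which is intended.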

FERMI development plan for the initial production version

  • Resolve urgent issues
  • Error codes
  • SpaceManager (3 days)
  • Configuration documentation
  • PinManager v2 (5 days) could be postponed


Last modified by Patrick @ Wed Mar 3 08:59:35 2021