wiki:developers-meeting-20090617
Last modified on 06/18/09 14:53:35

Participants

Patrick, Owen, Paul; Gerd, Marco; Timur, Vladimir, Dmitry

Agenda

(see Table of Contents)

Status of 1.9.0 release

What is the current status of the v1.9.0 release branch?

Gerd: I have considered this unmaintained for the past two months.

Removed recommended status of 1.9.0 (not green).

Everyone agreed that it is "end of life". End of life means we don't anticipate any new releases, except for fixes for security or data corruption where sites are unable to migrate to a later version.

Status of 1.9.1 release

Fixes include message IDs from pools, support for double forward-slashes in the xrootd door, and the issue with pools throwing away client-supplied checksums (unless...)

We consider this new release pretty urgent.

Owen intends to update the CERN release once the next 1.9.1 version is released.

We announce the end-of-life for 1.9.1 when 1.9.3 is released. The end-of-life will be one month after 1.9.3 release date.

Plan to release next version of 1.9.1 by end of the week.

Status of 1.9.2 release

Same patch-fixes as with v1.9.1 branch, plus some more patches related to the performance regression.

Owen asked about SCAS/XML with v1.9.2. Timur said he wasn't sure if SCAS is being used in production.

Timur asked about the other performance patches. Gerd replied that the only performance patches not in v1.9.2 are Tigran's mega-message patch and his info-extractor re-factoring patch.

To notice these performance increases one needs to upgrade only the PnfsManager to the latest version. The exception is the fix for pools throwing away client-supplied checksums, which requires upgrading the pools.

It was agreed that Jon's system will use a branched version of v1.9.2 with the PnfsManager storing AL/RP information switched off. Timur will cut this branch from v1.9.2 branch.

Paul's patch for PnfsManager to prevent level-2 lookup when querying file size is to go into v1.9.2.

Anticipate releasing 1.9.2 by early next week.

Status of 1.9.3 release

The only outstanding issue is Tigran's mega-message patch. This currently isn't in ReviewBoard.

We need to be sure that we can run the old (== v1.9.2) pools with the new core components.

Gerd: Pretty sure that new doors will not set AL/RP, but rely on pools doing this. If the pools are old (v1.9.2), then they will not set AL/RP either, resulting in information being lost.

Two things are pressing:

  • Support for ACLs, although its limitations mean that it may not be useful for all sites.
  • ??

We will need a 1.9.4 release to fix these issues.

Since Tigran is unable to attend, we should contact him to ask that he submit the patch to ReviewBoard as soon as possible.

Q: How about releasing 1.9.3 without the mega-message patch?

Timur commented that they are planning on deploying Berkeley-backed PNFS tomorrow. They anticipate seeing sufficient performance improvements that the pressure to upgrade will be alleviated.

There's the info extractor restructuring to use a common class hierarchy.

How about releasing v1.9.3 now, without mega-msg, and aiming to release v1.9.4 with the patch in?

Patrick: we should have tape protection in 1.9.4.

How about planning to release 1.9.3 on Wednesday next week, come what may?

Timur wanted to include a new feature that he's keen on getting in: the use of Terracotta to provide a scalable SRM. This does require some code changes, but they are limited to a change from Java Persistence Framework to Terracotta Re-entrant Read/Write?

Final decision: we talk about this next meeting (Wednesday) and release Thursday morning.

AP/ Patrick: come up with deadlines for next releases.

Issues from Tier-1 meeting

SRM client and SRMv2

Can sys-admins adjust SRM clients so they default to using SRMv2 (e.g., via an environment variable)?

No. It is currently not possible to alter which version of the SRM specification is used via an environment variable.

Owen: can we have new installs defaulting to SRM v2 ?

We can do this, but it might break clients that expect the default behaviour to be SRM v1; for example, clients that have scripted their use of srm and assume that SRMv1 is used will break.

Are there any known services that are using SRM v1 end-points?

Fermi to think about it.

Try it and see.

Patrick was keen to talk to someone in the communities using dCache before adjusting the default SRM version. For OSG, perhaps this could be Brian Bockelman?

AP/ Patrick: contact Brian Bockelman and ask about switching default SRM version for SRM clients from v1 to v2.

BNL pinning issue

From the email Pedro sent:

The only issue we have to report is that we have some files
which are on disk only areas but the SRM takes too long to
answer and eventually the client times out. we've traced this
to the fact that pinning (I repeat pinning not the retrieval
of the pnfsid+storageinfo from pnfs) is taking too long

Timur indicated that this might be related to a well-known bug in pin manager; if a file is removed then it can take ages.

AP/ Timur: to take control of investigating the BNL pinning-taking-too-long issue, contacting Pedro as necessary.

Jon, tapes and restores

RT 4547

Patrick asked if Jon was still experiencing the issue with too many restores; the answer was "yes".

AP/ Patrick: to take control of investigating Jon's multiple tape remount issue, contacting Jon/Fermi as necessary.

SRM: Overwrite option on the prepareToPut

Gerd asked about the overwrite option in the SRM prepareToPut command. Is it supposed to work?

Timur said "yes." It must be enabled both in the SRM and in the GridFTP door. When the overwriteEnabled option is enabled then the SRM will not reject requests to upload to an existing file, provided the client includes the overwrite option. If the client fails to specify the overwrite option then the server will reject the request, as before. There is an option in SRM called overwriteByDefault that, when enabled, informs SRM to allow all write operations, irrespective of whether the end-user specifies the overwrite option.

The actual file deletion is done by the GridFTP door.
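
The decision Timur described can be sketched as a small function. This is only an illustration of the rules as minuted, not the SRM implementation: the class and function names are hypothetical, only the option names overwriteEnabled and overwriteByDefault come from the discussion, and it is an assumption that overwriteByDefault has an effect only when overwriteEnabled is also switched on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrmOverwriteConfig:
    """Hypothetical holder for the two SRM options named in the minutes."""
    overwrite_enabled: bool = False      # stands in for overwriteEnabled
    overwrite_by_default: bool = False   # stands in for overwriteByDefault


def accept_prepare_to_put(config, file_exists, client_overwrite):
    """Return True if a prepareToPut request would be accepted.

    file_exists: whether the target path already holds a file.
    client_overwrite: whether the client supplied the overwrite option.
    """
    if not file_exists:
        # Nothing to overwrite, so the overwrite options are irrelevant.
        return True
    if not config.overwrite_enabled:
        # Overwriting is switched off: reject, as before.
        return False
    # Assumption: overwriteByDefault only matters once overwriting is enabled.
    return client_overwrite or config.overwrite_by_default
```

With overwriteEnabled on and overwriteByDefault off, an upload to an existing file is accepted only when the client passes the overwrite option; with both on, it is accepted either way. The removal of the old file is not modelled here, since that is done by the GridFTP door.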

Gerd indicated where the desire is coming from: the arc client currently does an SRM ls operation to establish whether the file already exists and removes it prior to uploading. If the overwrite option is available and used then the SRM ls will not be needed: the desired behaviour can be achieved by specifying (or not) the overwrite flag.

dcap client for Solaris

Timur indicated that the Solaris client published on the website is old and incompatible with the current dcap door. This is an urgent problem for Fermi.

They have a database group that runs from behind a firewall. They can only use the Kerberised FTP client in passive mode, which places great strain on the FTP doors. Using the dcap protocol, with Kerberos-based authentication, it would be possible for the clients to talk directly to the pools, so alleviating the load on the FTP doors.

The currently available download option is 2.5 -- 3 years old.

A tar-ball release would be appreciated by Timur.

AP/ Owen: to look into compiling kerberos-based dcap library on Solaris for Timur.

linking to VDT / Nordugrid

Owen described how Nordugrid's Mattias Ellert gave a very nice talk about his efforts in packaging Globus into Debian/Ubuntu, RedHat and Fedora. In the process of looking at supporting IA-64 platforms, dcap and the gsi plugin, Owen discovered it was trivial to achieve linking, provided we dynamically link against the libraries.

We currently package the dcap libraries that have been statically compiled against our own (patched) version of Globus. Owen proposes we move over to using either VDT or Nordugrid packages and dynamically linking against them.

Timur said he favours static linking: we then don't depend on anything. Patrick agreed, citing problems in the past with ROOT. Gerd agreed we would still have to provide a statically linked version of the dcap library.

Discussion then moved on to the topic of getting the dcap libraries into distributions. There are no guarantees, but we can package the libraries so that they would be easier to adopt.

A prerequisite for this would be to sort out the license. Discussed options are: GPL, LGPL, Apache, MPL, BSD. Patrick agreed to look into the license issue.

AP/ Patrick: Look at which F/OSS license the dcap library might be distributed under.

AP/ Gerd: Talk to Ellert to see if he's happy to help get dcap library into the distributions.

Owen described that, in the mean time, VDT have been asked to fix their libraries so we can link against them.

Open tickets

RT#4584 PnfsManager crashing

After some initial confusion, it seems clear that the issue here is with PnfsManager becoming stuck when given a request for an ID where the database-ID is non-existent. At least one of the initial reports of the domain being restarted was due to manual intervention in recovering from the problem: this is the one in the submitted log file.

If this is true then the issue is with the PnfsManager message thread blocking on file-system activity. The problem here is in PNFS, rather than PnfsManager, and the long-term solution may be difficult.

Someone at DESY should try to reproduce the PnfsManager problem on a virtual machine.

Review of outstanding review requests

Due to the late running of the meeting, this item was skipped.

AOCB

Gerd mentioned that, since we're going to have a 1.9.3 soon, he needs information for the release notes. He's already asked for this, so this is your reminder!

DTNM

"same time, next week" Wednesday 24th June 2009.

Summary of AP

  • AP/ Patrick: come up with deadlines for next releases.
  • AP/ Patrick: contact Brian Bockelman and ask about switching default SRM version for SRM clients from v1 to v2.
  • AP/ Timur: to take control of investigating the BNL pinning-taking-too-long issue, contacting Pedro as necessary.
  • AP/ Patrick: to take control of investigating Jon's multiple tape remount issue, contacting Jon/Fermi as necessary.
  • AP/ Owen: to look into compiling kerberos-based dcap library on Solaris for Timur.
  • AP/ Patrick: Look at which F/OSS license the dcap library might be distributed under.
  • AP/ Gerd: Talk to Ellert to see if he's happy to help get dcap library into the distributions.