wiki:developers-meeting-20110105
Last modified on 01/05/11 18:57:12

[part of a series of meetings]

Participants

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

Patrick: sent lots of Christmas greetings; triggered CMS/xroot testing; possible Fermilab collaboration on an NFSv4.1 driver in SL-5; Michael Ernst will upgrade the BNL Tier-1 to 1.9.5 on January 17-18; no Chimera migration this year.

Antje: documentation, tickets, etc.

Paul: GDS OGF, bugs, tickets, emailing people.

Tigran: Trying to release 1.9.11; the problem seems to be getting the virtual machines running. Updated the S2 tests to fix a bug (stricter about file sizes; an S2 bug), updated the virtual machines, fixing the cost module.

Karsten: still working on gPlazma2 plugins, reduced to the essential parts and making nice progress.

Tanja: Bothering people with tickets. WebDAV on Windows and MacOS.

Gerd: fixed bugs in 1.9.11 (after upgrading production). Spent some time on the init scripts and how we compile this to a POSIX shell function. Reimplemented the "pool ls" and "services ls" commands. Still to do: querying which services are running. Needs to start working on the xrootd paper for CHEP.

Dmitry: before going on vacation, reassembled the test stand after a h/w failure, repeated the SRM scaling tests and confirmed that PNFS is the bottleneck when scaling. This affects whether it makes sense to do the scalability work on having multiple SRM front-ends. Preparing a write-up on this work. Tried to look at the outstanding tickets. Now knows what causes the "Records already exist" issue: it is caused by the switch useimplicitspacefornon-srmtransfers, and the problem is triggered when this is switched off. Dmitry will submit a bug-fix soon.

Plans for patch-releases

Should we make a new patch release?

Waiting for 1.9.11: it's not released yet, but Tigran hopes to release it on Friday.

When we are done with this, Tigran would like to release 1.9.5 and 1.9.10.

Trunk activity

Progress with new features...

Tigran has a patch that allows you to install two dCache RPMs concurrently.

Slightly related topic: can we make the RPM relocatable? We would need a post-install script to do some 'magic'. So we will probably not make the dCache RPMs relocatable.

Going golden

stuff...

Additional items should be added to the trac milestone:

  • new pin-manager,
  • SRM scalability.

gPlazma

At DESY we have a problem with GSI-dCap and GridFTP getting different mappings. The problem is due to the order of mappings for .. This is with 1.9.10, but is believed to affect all 1.9.x.

The expectation is that the first plugin is tried completely before the second plugin is tried.

Current behaviour is to take the first (DN, FQAN) pair and go through all plugins; first match wins. If no plugin matches then the second (DN, FQAN) pair is tried, and so on.
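The difference between the two orderings can be illustrated with a minimal sketch. This is not gPlazma code: the plugins, DNs, FQANs and account names below are hypothetical, and each "plugin" is assumed to be a simple function that returns a mapping or None.

```python
def pair_major(pairs, plugins):
    """Current behaviour: try every plugin for the first pair, then move to the next pair."""
    for pair in pairs:
        for plugin in plugins:
            mapping = plugin(pair)
            if mapping is not None:
                return mapping  # first match wins
    return None


def plugin_major(pairs, plugins):
    """Expected behaviour: try every pair with the first plugin, then move to the next plugin."""
    for plugin in plugins:
        for pair in pairs:
            mapping = plugin(pair)
            if mapping is not None:
                return mapping  # first match wins
    return None


# Hypothetical plugins: each maps one (DN, FQAN) pair to an account name.
def plugin_a(pair):
    return {("/DN=alice", "/vo/Role=production"): "prod"}.get(pair)


def plugin_b(pair):
    return {("/DN=alice", "/vo"): "user"}.get(pair)


pairs = [("/DN=alice", "/vo"), ("/DN=alice", "/vo/Role=production")]
plugins = [plugin_a, plugin_b]

print(pair_major(pairs, plugins))    # user
print(plugin_major(pairs, plugins))  # prod
```

With the same input, the two loop orders pick different mappings, which is the kind of discrepancy described for GSI-dCap and GridFTP if the doors also present the FQANs differently.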

The problem could be that gsidcap now understands multiple roles, which is new behaviour. When the message contains a Subject field, we could send the list of FQANs.

It could be that we're sending the chain for one protocol and the set of FQANs for the other.

Looks like gPlazma misbehaves here.

Performance issues with JVM

With Sun JVM 1.6.0_18 we (DESY) had lots of problems with performance; with 1.6.0_22 everything is solved.

Can we add a comment to the download page that the recommended JVM version is 1.6.0_22?
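A site could check its installed version against this recommendation with something like the following sketch. It assumes the version string has already been obtained (for example from the output of `java -version`) and contains only numeric components separated by dots and underscores; the function names are hypothetical.

```python
import re

RECOMMENDED = "1.6.0_22"  # version recommended in these notes


def parse_java_version(version):
    """Split a string such as '1.6.0_18' into a tuple of integers."""
    return tuple(int(part) for part in re.split(r"[._]", version))


def is_older_than_recommended(version, recommended=RECOMMENDED):
    """Return True if 'version' sorts before the recommended version."""
    return parse_java_version(version) < parse_java_version(recommended)


print(is_older_than_recommended("1.6.0_18"))  # True
print(is_older_than_recommended("1.6.0_22"))  # False
```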

Issues from yesterday's Tier-1 meeting

PIC

update to 1.9.10-ish

Will PNFS work?

The only risk is that we introduced a bug in BasicNamespaceProvider between 1.9.5 and 1.9.10. Should run the usual test process against a PNFS instance.

Database schema changes?

Nope.

In one case we changed the ORM (different tracking table) and are using Liquibase for tracking schema changes.

The last Chimera schema change was in 1.9.2.

PostgreSQL has been shown to work with 1.9.

They can test the downgrade procedure.

Do they use the Berkeley DB on the pools? If they do then that would prevent an easy downgrade. The table format is updated automatically on upgrade, which prevents a downgrade.

Staged upgrade: upgrade the headnodes first then when that works, upgrade the pools one-by-one.

For the non-Berkeley DB version: has storageinfo changed? No.

Triumf

Pools are being overloaded and suspended.

He has high IO-WAIT and checksum-on-write enabled.

Could touching the file-system cause the problem? This is done by a separate thread, so it is not causing the problem here. PoolUp messages send lots of information and try to take a global lock on the repository.

Suggestions:

turn off checksum-on-write and turn on checksum-on-transfer, buy more pool nodes, upgrade dCache to a newer version (>= 1.9.10).

Issues from EMI

Nothing in particular.

Brian and Gerd

Brian has now started testing X509-xrootd authentication against NDGF.

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

Review of RB requests

DTNM

Same time, next week.