[part of a series of meetings]
Participants
Gerd; Dmitry, Al; Antje, Jan, Paul, Tigran, Christian, Tanja, Karsten
Agenda
[see box on the right-hand side]
Postcards
Up to two minutes (uninterrupted) per person where they can answer two questions:
- What I did last week (since the last meeting),
- What I plan to do in the next week.
No questions until we get through everyone :)
Gerd: "finishing" xrootd4j -- Java xrootd implementation independent of dCache; not released yet. Refactoring dCache to use external library + helping BNL.
Dmitry: looked at Al's patch. Looked at Chimera. Summarised test results. Tested migration of PNFS to Chimera and encountered a couple of problems (a couple of tickets). Attended to RT tickets on LHCb space-tokens. Created a script for one site (adjusting space reservations; the ticket can probably be closed). The other ticket involves migrating data between spaces; hope to have something soon.
Al: Reposted patch this morning, responding to the suggestions made. Started looking at the gPlazma code and documentation in anticipation of the GUMS plugin for gPlazma2.
Antje: documentation (SRM chapter)
Jan: web-admin tape transfer queue (restore handler page), selection setup + support etc.
Paul: BNL + xrootd + lyon
Tigran: release, release, release ... 1.9.12 (3x), 1.9.13 and 2.0.
Christian: finally got the dCache server built with the new build system in ETICS (by accident, due to a bug in ETICS). Problem with repackaging jar files. Currently, submitted builds don't work but nightly builds do. Also have been updating the wiki page. Still working on packaging the Nagios SRM probes; this work is ongoing.
Tanja: Learning about ACLs (bothering Tigran about it) and working on tickets.
Karsten: Updating virtual machine with latest dCache (took a long time!). Have updated to dCache trunk + patches.
Plans for patch-releases
Should we make a new patch release?
Yes: there are fixes for WebDAV that BNL and Sara need. The WebDAV library (Milton) buffers all data: as a result, the content is stored in /tmp before it is sent. We need to release 1.9.12 (for BNL) and 2.0 (for Sara).
Trunk activity
Progress with new features...
Fun with maven
Gerd sent an email about building Trunk with OpenJDK: if the build is very slow, try using the Oracle JDK instead.
Tigran said it takes him 27 minutes to build (and the machine is unresponsive during this time).
Dmitry: what is the difference between the two versions? Shouldn't they be basically the same?
Gerd: it's unclear, but all the time is spent in the kernel.
Gerd: The packing phase seems most affected by the problem.
Tigran: I've seen the reverse: Building takes 27 minutes, but re-packaging (without recompiling) takes only 2 minutes. This is with OpenJDK.
What is packaging? This is collecting all the files and placing them in the RPM.
Gerd: Christian reported the same behaviour, which suggests that ETICS is suffering from the same problem.
Gerd: With the Oracle JDK, it takes about 90s to do "ant cleanall server rpm".
Note that maven provides timing information. This should allow discovery of how long each step took.
Issues from yesterday's Tier-1 meeting
Triumf and FZK: Triumf is happy; Doris reported that FZK is still suffering from the issues reported previously.
Since update to 1.9.12, FZK have had failing tests, although experiments do NOT observe these errors. See RT ticket #...
Does anyone know how our SRM client behaves with HTTP KeepAlive? Nope.
Paul: hasn't checked whether the SRM client supports HTTP KeepAlive. FZK see an EOF exception; it is unclear whether this comes from the server or the client.
How about taking a tcpdump of the failures? This is problematic as they cannot reproduce it. They see messages from Jetty about closing idle connections: this is expected (for clients that support HTTP KeepAlive).
What do we need to decrypt the stream? The host's private key. Paul has a script for decrypting SRM streams using the host private key. We could send them this script and ask for the script's output.
We can try to verify the theory without talking to KIT.
Possible explanation: Tomcat always closed the connection, whereas Jetty allows HTTP KeepAlive. The gfal/lcg_utils suite doesn't request HTTP KeepAlive.
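The theory turns on standard HTTP connection persistence. As a rough illustration (not taken from Tomcat, Jetty or gfal code), the sketch below encodes the generic rule from RFC 7230, section 6.3: an HTTP/1.1 connection persists unless `Connection: close` is sent, whereas an HTTP/1.0 connection persists only if `Connection: keep-alive` is sent. The function name `connection_persists` is hypothetical, and only the request-side Connection header is modelled.

```python
from typing import Optional


def connection_persists(http_version: str, connection: Optional[str]) -> bool:
    """Rough model of RFC 7230 section 6.3: does the connection stay open?

    http_version: e.g. "HTTP/1.0" or "HTTP/1.1" (hypothetical input format).
    connection:   value of the Connection header, or None if absent.
    """
    # The Connection header is a comma-separated, case-insensitive token list.
    tokens = {t.strip().lower() for t in (connection or "").split(",") if t.strip()}
    if "close" in tokens:
        return False  # any version: an explicit "close" ends the connection
    major, minor = (int(part) for part in http_version.removeprefix("HTTP/").split("."))
    if (major, minor) >= (1, 1):
        return True  # HTTP/1.1 and later: persistent by default
    if (major, minor) == (1, 0):
        return "keep-alive" in tokens  # HTTP/1.0: only if explicitly requested
    return False
```

Under this model, an HTTP/1.1 client that never sends `Connection: close` leaves an idle connection behind for the server to reap, which would fit the Jetty messages about closing idle connections; whether that is what actually happens at FZK is what a captured stream would show.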
Paul: provide FZK with scripts for decrypting an SRM stream captured with tcpdump.
Issues from EMI
Nope.
Issues from OSG
Nope.
Dmitry noted that OSG doesn't distribute dCache anymore.
The OSG storage group, run by Tanja, has now been dissolved. Dmitry didn't know how sites are to get support.
OSG are under review and are looking for more money right now.
They had a campaign to get rid of Pacman in favour of RPM-based installation. Since dCache was already distributed via RPM, this didn't affect dCache.
Note that the University of Chicago is talking directly to dCache.org. This is probably the future model.
Minimum Java version
Minimum Java version for SRM.
We support Java v5. The reason was that OSG needed Java 5 support for the srm-client. Is this support still needed?
Dmitry thought the answer is "no", but will check.
jGlobus
There was a question from Maarten Litmaath on which software depends on jGlobus. This was triggered by IGTF switching to SHA2-based certificates. Our jGlobus (and BeStMan) doesn't support SHA2.
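For reference, "SHA2-based" refers to the hash named in a certificate's signature algorithm. A minimal sketch, assuming the signature algorithm OID has already been extracted from the certificate; the table covers only the common RSA signature OIDs from PKCS #1, and the names `SIGNATURE_OID_HASH` and `is_sha2_signature` are illustrative, not part of jGlobus.

```python
# Common RSA signature algorithm OIDs (PKCS #1) mapped to the hash they use.
SIGNATURE_OID_HASH = {
    "1.2.840.113549.1.1.4": "MD5",
    "1.2.840.113549.1.1.5": "SHA-1",
    "1.2.840.113549.1.1.14": "SHA-224",
    "1.2.840.113549.1.1.11": "SHA-256",
    "1.2.840.113549.1.1.12": "SHA-384",
    "1.2.840.113549.1.1.13": "SHA-512",
}

# The SHA2 family: what an SHA2-only IGTF CA would sign with.
SHA2_FAMILY = {"SHA-224", "SHA-256", "SHA-384", "SHA-512"}


def is_sha2_signature(signature_oid: str) -> bool:
    """Return True if the OID names an RSA signature using a SHA2 hash."""
    # Unknown OIDs map to None, which is never in SHA2_FAMILY.
    return SIGNATURE_OID_HASH.get(signature_oid) in SHA2_FAMILY
```

A TLS stack whose own table stops at the SHA-1 entry cannot verify such a certificate, which is why the limitation may sit in PureTLS rather than in jGlobus itself.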
We currently use a patched version of jGlobus 1.8.x.
Gerd: the question is whether moving to jGlobus 2 would break everything: does it support non-RFC proxies? We could patch the jGlobus code to add SHA2 support, but this limitation could come from PureTLS, a very old implementation of TLS (== SSL) dating from a time when Java didn't provide SSL support. jGlobus 2 uses Java's built-in SSL support. The drawback of using Java's support is that you are limited to whatever Java supports: e.g., Java doesn't support non-RFC proxies.
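"Non-RFC proxies" covers two older formats: legacy (Globus Toolkit 2) proxies, recognisable by a final `CN=proxy` or `CN=limited proxy`, and pre-RFC (GT3 draft) proxies, which carry a ProxyCertInfo extension under a Globus-specific OID instead of the RFC 3820 one. The sketch below is a simplified classifier, assuming the last CN of the subject and the set of extension OIDs are already available; `classify_proxy` is a hypothetical helper and skips the checks a real validator performs (issuer/subject relationship, path length, policy).

```python
from typing import Iterable

RFC3820_PROXY_CERT_INFO = "1.3.6.1.5.5.7.1.14"  # ProxyCertInfo as standardised in RFC 3820
GT3_PROXY_CERT_INFO = "1.3.6.1.4.1.3536.1.222"  # ProxyCertInfo from the pre-RFC (GT3) draft


def classify_proxy(last_cn: str, extension_oids: Iterable[str]) -> str:
    """Roughly classify a certificate by proxy format (simplified, hypothetical)."""
    oids = set(extension_oids)
    if RFC3820_PROXY_CERT_INFO in oids:
        return "RFC 3820"
    if GT3_PROXY_CERT_INFO in oids:
        return "GT3 (pre-RFC)"
    # Legacy GT2 proxies have no extension; they are marked only by the final CN.
    if last_cn in ("proxy", "limited proxy"):
        return "legacy (GT2)"
    return "not a proxy"
```

Only the first branch corresponds to what a stack built purely on Java's SSL support would be expected to handle; the other two are the cases the questions to Maarten are about.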
Timescale for this change is ~6 months. This doesn't give us much time to fix things.
Assuming that we can move to jGlobus 2 (i.e., the proxy problems are solved), what do we do with 1.9.12? Back-porting jGlobus 2 to 1.9.12 may be a big change, but it is less risky to back-port jGlobus than to hack PureTLS.
We need to organise our effort in supporting jGlobus 2. Part of this effort is to figure out whether WLCG is ready for jGlobus 2. We can ask Maarten:
- whether WLCG still uses any non-RFC proxies?
- how quickly they can switch to using *only* RFC proxies?
- Are there any CAs that have email address in their DN?
- Are there any CAs that have email address in their signing policy files?
Tigran mentioned that there's already pressure against CAs using DNs with emails because VOMS servers don't understand the email attribute.
Gerd looked at the current IGTF CAs. He found a (Chinese?) CA that has an email address in its DN.
There were also a handful (4--5) of CAs that have email addresses in their signing_policy files.
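The survey of CA DNs amounts to scanning each DN for an e-mail attribute. A minimal sketch, assuming DNs are in the OpenSSL slash-separated form used by grid tools and that attribute values contain no "/"; the attribute spellings in `EMAIL_ATTRIBUTES` and the example DNs are assumptions for illustration. The same function could be applied to subject patterns from signing_policy files once they have been extracted as plain strings.

```python
# Attribute names assumed to carry an e-mail address in slash-separated DNs.
EMAIL_ATTRIBUTES = {"emailaddress", "email", "e"}


def dn_has_email(dn: str) -> bool:
    """Return True if a DN like '/C=XX/O=Example/emailAddress=ca@example.org' has an e-mail RDN."""
    for rdn in dn.split("/"):
        name, sep, _value = rdn.partition("=")
        # Skip empty pieces (e.g. before the leading '/') and compare names case-insensitively.
        if sep and name.strip().lower() in EMAIL_ATTRIBUTES:
            return True
    return False
```

Usage: `dn_has_email("/C=XX/O=Example CA/emailAddress=ca@example.org")` is true, while `dn_has_email("/C=XX/O=Example CA/CN=Example Root")` is false.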
Alternatives:
- fix jGlobus 1.8.x to support SHA2
- fix jGlobus 2 to support non-RFC proxies.
For now, we wait for Maarten's reply.
Tigran wondered what is done in the C world: surely they don't support non-RFC proxies in C but not in Java? Perhaps the old Globus library supports SHA2 (due to linking against OpenSSL).
Federated things
Paul presented a summary of the Federated Workshop.
Overall summary (from Andy) is available here:
http://indico.in2p3.fr/getFile.py/access?sessionId=3&resId=1&materialId=0&confId=5527
There was a surprising number of high-ranking people there:
Ian Bird, Ian Fisk, Michel Ernst, Brian Bockelman, Hiro, Dirk Duellmann, Oliver Keeble, Graeme Steward, ...
ATLAS, CMS and ALICE were well represented. The noticeable absence was LHCb: nobody attended. The dCache talk went down well:
- Nobody fell asleep (or fell out of their chair).
- Dirk asked if dCache wanted to join the "xrootd foundation".
- People asked about benchmarks: SLAC xrootd vs xrootd4j.
General xrootd4j interest:
- People are very enthusiastic about plugin support.
- ALICE is interested in using xrootd4j for their client (they're rewriting AliEn in Java).
- Brian Bockelman is interested (no time-scale, though).
A number of sites expressed interest in switching off their dCache + SLAC xrootd hybrid and moving to pure dCache implementation: motivated by (the promise of) namespace-mapping plugins.
BNL/Hiro is interested in checksums via xrootd (actually, complaining that checksums always trigger IO activity). It's unclear whether he was running a dCache xrootd with the client downloading the file to checksum it, or running the SLAC xrootd.
Hiro said that things are now fairly stable after the migration (yes, they migrated to Chimera). The DNS issue is not yet resolved; they have a work-around, but haven't yet tested Gerd's name-resolution test code. Hiro also mentioned the WebDAV proxying issue (see Gerd's patch).
Other issues/thoughts:
People are interested in federated access as a self-healing mechanism: the catalogue says a file is at an SE, but the SE doesn't have the data (possibly due to a catalogue/SE synchronisation problem). Rather than updating the catalogue and failing the client, the SE would pull in the required data, using the federation to find where it is available.
People are quite happy about using the federation for reading; it is unclear whether writing into SEs happens through the federation or in some controlled, site-explicit fashion.
Nobody was really interested in solutions other than xrootd (HTTP was the closest).
Next meeting in 1 year
Hudson/Jenkins
What is wrong:
- unit tests are only partially run
- unit tests for the expression language fail
- srm-client doesn't build
- the live /trunk server (sisyphus) isn't updated
- the branch/tag procedure needs testing: this should probably go through the mvn release phase
Jan and Paul to fix these problems.
Outstanding RT Tickets
Tanja wants to review all tickets and close those that should be closed, but needs help deciding which ones are still active.
Perhaps we should do this ourselves.
Review of RB requests
Patch by Al: please follow and review.
DTNM
Same time, next week.
