[part of a series of meetings]
Participants
Gerd, Dmitry, Patrick, Tanja, Jan, Paul, Karsten
Agenda
[see box on the right-hand side]
Postcards
Up to two minutes (uninterrupted) per person where they can answer two questions:
- What I did last week (since the last meeting),
- What I plan to do in the next week.
No questions until we get through everyone :)
Patrick: recovering from WLCG and ...; making sure dCache gets into UMD. To enter UMD, one (or two) sites have to run the software in production for a while and then send reports ...
Dmitry: PIC with encp .. cut a version of enstore on .. analysed crash-dumps of enstore. File aggregation project in enstore .. responsible for the database infrastructure for a repacking operation (tar-ing up small files). Printed out the UML diagrams for the scheduler, requests, etc. This is to get Catalin's code changes in. Used Eclipse + Object-8 to reverse-engineer the diagrams.
Tanja reported that the SRM client's JVM dies with a "seg-violation" just from running it multiple times against BestMan. Increasing the available memory from 64M to 128M seems to help, but this is strange, as the error isn't an out-of-memory error. The JVM is 1.6 (perhaps "-23").
The probe runs srmCp, srmLs and srmRm against different BestMan SRM endpoints.
Gerd: two weeks at the SuperB accelerator, which goes online in 2015. They're investigating how to deal with all their data. Playing around with adding an alternative to links: rather than units and unit-groups, we have arbitrary filters. It's not a complete replacement, just an alternative. The filter language is the same as in the migration module, so it is quite powerful (e.g. regular-expression matching with '~='). Last week at the WLCG workshop. Helping Shawn McKee debug an FTP problem. There were two different cases: the FTP door getting stuck (in several different places), and internal timeouts in dCache (pools? space-manager?). Once the fix for the first problem is confirmed, we can collect information about the second.
Autobahn is a system for automated bandwidth allocation across heterogeneous networks. Gerd is doing a software audit of it, along with others.
Tanja: WLCG .. off ill .. now working on tickets and patches.
Jan: WLCG .. patches for getting the web-admin.
Paul: pnfsDump
Karsten: worked on making dCache work with IPv6; some patches are in RB.
Plans for patch-releases
Should we make a new patch release?
In trunk/1.9.13 we
In 1.9.12 we rolled back to the old Jetty.
Something changed in Jetty 7.4 that would .. error handling is broken. 7.2 is fine. The problem concerns closing SSL connections; they seem to have shuffled around the responsibility for this.
Gerd has a stack trace: Jetty gets stuck in a retry loop (reading on a socket that's already disconnected), but we haven't been able to reproduce it.
Company: "it hit" "webdav system".
We will roll back Jetty version to 7.2 in 1.9.13, but keep 7.4 in trunk.
Trunk activity
Progress with new features...
Tanja is looking into changing the protocol used for pool-to-pool transfers from dCap to HTTP. When will this happen? Not sure.
A while ago, Tigran asked whether we could use xrootd for pool-to-pool transfers. Yes, but HTTP is perhaps the better protocol, since we always want a complete file.
Issues from yesterday's Tier-1 meeting
Sara
75 threads were blocked doing VOMS extraction from the certificate, with plenty of threads still available. It looked like the server was just processing lots and lots of requests (the extraction is a non-).
The problem went away after restarting the SRM. This is puzzling: either the libraries entered a state where DN extraction is slow, or something else is
Did he monitor the number of operations?
Yes, but not the
Perhaps restarting the SRM killed the subset of jobs that were causing the problem..
The contention point was extracting the DN from the certificate.
Perhaps reloading the certificates caused the utility class to consume more CPU?
Unfortunately, he didn't collect a heap-dump from the SRM.
Could he have jhat running? The ticket has links to information, but there's no heap-dump.
The catalina.out has the interesting information: the stack-trace at the end of the file.
Issues from EMI
The Nagios probes for dCache are due very soon. We are also taking responsibility for the generic SRM probe. DESY people will discuss this locally.
Outstanding RT Tickets
[This is an auto-generated item. Don't add items here directly]
RT 6463: feature request: statistics for pinned data in pools
Action plan (see ticket)
RT 6477: space cost calculation
Meeting to chat with Dima about this.
Review of RB requests
Check packaging so that upgrading from the /opt layout to FHS (or vice versa) at least gives a warning and prevents damage.
Gerd likes the new layout.
DTNM
Same time, next week.
