[part of a series of meetings]
Participants
Gerd, Per, Dmitry, Gene; Christian, Paul, Tigran, Karsten, Tanja
Agenda
[see box on the right-hand side]
Postcards
Up to two minutes (uninterrupted) per person where they can answer two questions:
- What I did last week (since the last meeting),
- What I plan to do in the next week.
No questions until we get through everyone :)
Tanja: NFS work, tickets, gsidcap & gPlazma.
Karsten: vo-role mapping plugin for gPlazma2.
Tigran: merges, patch reviews; file-channel decorator that publishes disk performance via JMX.
Paul: P
Christian: separating SSH2 and admin patches; minimum EMI documentation, EMI meetings, learning more from Owen; EMI survey on testbeds.
Dmitry: committed HSM-enstore support patch; merge request for 1.9.10..1.9.5. Ask Catalin for trigger in table get by Id. Testing scalable SRM with PNFS to see if PNFS will be a bottleneck; talked to Liz about the CMS multicore project. Mail from Jon that they've applied Brian's patch.
Gene: ...
Gerd: continued working on pin-manager; design is now cleaner. Deployed Thomas' new xrootd stuff (not really GSI). Performance was disappointing, but Thomas fixed this.
Per: started working on automatic background checking of file integrity.
Plans for patch-releases
Should we make a new patch release?
We promised to release things. It didn't happen (busy merging), but we will do so soon.
We need to make a 1.9.11 branch: tomorrow!
Two meetings ago, there was a question about the minimum Java version.
We found that 1.9.10 didn't work with Java 1.6.0_16; upgrading to 1.6.0_22 fixed the problem.
At some point you get out-of-memory errors (GridFTP door's socket adapter in E-mode).
Trunk activity
Progress with new features...
Background checking
Is this cron-job-like or a dedicated activity? Don't know yet.
We have some other places that have periodic activity: for example, flushing files to tape.
Checking is an activity that runs all the time, but it is throttled if the pool exceeds a certain cost threshold.
Leave writing a generic-cron activity for a future date.
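A minimal sketch of the throttling idea discussed above, assuming the checker can ask for the current pool cost. This is not dCache code: the names `scrub`, `pool_cost`, and `cost_threshold` are hypothetical, and Adler-32 is used only as an example checksum.

```python
import time
import zlib
from typing import Callable, Dict, Iterable, List


def adler32_of(path: str, chunk_size: int = 1 << 20) -> int:
    """Compute the Adler-32 checksum of a file, reading it in chunks."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF


def scrub(
    paths: Iterable[str],
    expected: Dict[str, int],
    pool_cost: Callable[[], float],   # hypothetical: returns the current pool cost
    cost_threshold: float,            # hypothetical: pause while cost is above this
    pause: Callable[[float], None] = time.sleep,
    pause_seconds: float = 1.0,
) -> List[str]:
    """Check every file, backing off while the pool is busy.

    Returns the paths whose checksum differs from the expected value.
    """
    corrupt = []
    for path in paths:
        # Throttle: wait until the pool cost drops to the threshold or below.
        while pool_cost() > cost_threshold:
            pause(pause_seconds)
        if adler32_of(path) != expected[path]:
            corrupt.append(path)
    return corrupt
```

The loop checks the cost before each file rather than on a timer, so the checker runs continuously when the pool is idle and yields only while the pool is above the threshold.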
Thomas' GSI stuff
It seems that people have been talking about not using gsi-xrootd but using SSL instead. People at CERN are complaining that gsi-xrootd is too slow.
We believe people are using gsi-xrootd.
gsi-xrootd uses expensive
The slides mention a "virtual socket". This could mean that they create a second socket on which they do an SSL handshake to prove identity.
Gerd to ask Thomas for a link to the slides.
The "GSI" is only used for authentication ..
PostgreSQL 9
Can't find any problems. Updated the machine with
Previous (non-trunk) versions of dCache might have problems? We had an old driver in the past.
Issue reported by Gerard.
Issues from yesterday's Tier-1 meeting
Issues from EMI
Satisfaction of minimum documentation
Published a web-page list describing documentation. Feedback welcome!
CMS multi-process dcap
From Dmitry
2) I had a talk with Liz Sexton, who is the leader of CMS offline software development. They have a multi-core project: an attempt to adapt the CMS software to run in parallel on multi-core platforms. The main motivation is that the CMS software, as it is, does not scale with multiple cores: they hit memory limitations on, for example, 32-core platforms when running 32 independent CMS jobs on each of them (each process takes up to 2 GB of memory). They achieve parallelism by forking sub-processes after loading bulk conditions. Memory usage is decreased (compared to running the same number of "main" processes) due to page sharing between parent and children. As a result, the CMS software works as follows: a bunch of child processes open different chunks of a file for parallel processing. This works fine with a local file but fails with dcap. From Liz I understood that Brian Bockelman sent some questions regarding this to dCache. I have not seen them. Does anybody on the team know about it? They need some response. Actually, in the meantime Brian has made a patch to dcap to address CMS needs; maybe we need to look at this patch? They are preparing a massive test of their modified parallel processing framework. I advertised NFSv4 to them, but I want to understand the issue with dcap more closely.
Dmitry to forward patch to
Patch has some kind of map to the socket.
"Close" (close2): closes the socket but doesn't kill the mover.
Workshop on multicore in CMS was in May.
Only a problem if the file is opened before the fork()s.
Two problems:
- one process may close a dcap connection that another one is still using,
- seek problems.
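The seek problem can be illustrated with a small sketch. It uses a plain local file descriptor and `os.lseek`, not dcap or the CMS framework; it only shows that a descriptor opened before fork() refers to state shared by parent and child, so a seek in one process moves the position seen by the other. The function name is hypothetical, and the sketch assumes a POSIX system.

```python
import os
import tempfile


def offset_seen_by_parent_after_child_seek(seek_to: int) -> int:
    """Open a file, fork, seek in the child, and report the parent's offset."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 100)
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)  # parent starts at offset 0

        pid = os.fork()               # child inherits the already-open descriptor
        if pid == 0:
            # Child: move the file position, then exit without running cleanup.
            os.lseek(fd, seek_to, os.SEEK_SET)
            os._exit(0)

        os.waitpid(pid, 0)
        # Parent: the offset has moved, although the parent never called seek.
        return os.lseek(fd, 0, os.SEEK_CUR)
```

Calling `offset_seen_by_parent_after_child_seek(42)` returns 42 in the parent. Opening the file after the fork gives each process its own position, which matches the observation that the problem appears only when the file is opened before the fork()s.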
This is a new project, so we can try out NFS-4.1 as an option.
We just need to wait to see what the patch is like.
Dmitry to contact Brian, ask to contact support@… and get the patch.
Outstanding RT Tickets
[This is an auto-generated item. Don't add items here directly]
RT 5999: pnfsDump fails for OSG user
Review of RB requests
DTNM
Proposed: same time, next week.