wiki:developers-meeting-20101215
Last modified on 12/15/10 17:45:11

[part of a series of meetings]

Participants

Gerd, Per, Dmitry, Gene; Christian, Paul, Tigran, Karsten, Tanja

Agenda

[see box on the right-hand side]

Postcards

Up to two minutes (uninterrupted) per person where they can answer two questions:

  • What I did last week (since the last meeting),
  • What I plan to do in the next week.

No questions until we get through everyone :)

  • Tanja: NFS work, tickets, gsidcap & gPlazma.
  • Karsten: vo-role mapping plugin for gPlazma2.
  • Tigran: merges, patch reviews; file-channel decorator that publishes disk performance with JMX.
  • Paul: P
  • Christian: separating SSH2 and admin patches; minimum EMI documentation, EMI meetings, learning more from Owen; EMI survey on testbeds.
  • Dmitry: committed HSM-enstore support patch; merge request for 1.9.10..1.9.5. Ask Catalin for trigger in table get by Id. Testing scalable SRM with PNFS to see if PNFS will be a bottleneck; talked to Liz about the CMS multicore project. Mail from Jon that they've applied Brian's patch.
  • Gene: ...
  • Gerd: continued working on pin-manager; design is now cleaner. Deployed Thomas' new xrootd stuff (not really GSI). Performance was disappointing, but Thomas fixed this.
  • Per: started working on automatic background checking of file integrity.

Plans for patch-releases

Should we make a new patch release?

We promised to release things. This didn't happen (busy merging) but will be done soon.

We need to make a 1.9.11 branch: tomorrow!

Two meetings ago, the question of the minimum supported Java version came up.

We found that dCache 1.9.10 didn't work with Java 1.6.0_16; upgrading to 1.6.0_22 fixed the problem.

At some point you get an out-of-memory error; this involves the GridFTP door's socket adapter in E-mode.

Trunk activity

Progress with new features...

Background checking

Is this cron-job-like or a dedicated activity? We don't know yet.

We have some other places that have periodic activity: for example, flushing files to tape.

Checking is an activity that runs all the time, but it is throttled if the pool's load exceeds a certain cost threshold.
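A minimal sketch of the throttling idea, in Python for brevity. This is not dCache code: `check_files`, `verify`, `pool_cost` and `cost_threshold` are hypothetical names chosen for illustration, and the real pool cost calculation is not modelled here.

```python
import time
from typing import Callable, Iterable, List


def check_files(
    files: Iterable[str],
    verify: Callable[[str], bool],      # hypothetical: True if the file is intact
    pool_cost: Callable[[], float],     # hypothetical: current cost of the pool
    cost_threshold: float,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Verify every file, backing off while the pool is too busy.

    Returns the names of the files that failed verification.
    """
    corrupt = []
    for name in files:
        # Throttle: wait until the pool cost drops to the threshold or below.
        while pool_cost() > cost_threshold:
            sleep(pause_seconds)
        if not verify(name):
            corrupt.append(name)
    return corrupt
```

The `sleep` parameter is injectable only so that the loop can be exercised without real delays. A generic cron-like scheduler is deliberately left out, in line with the decision to defer it.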

Leave writing a generic-cron activity for a future date.

Thomas' GSI stuff

It seems that people have been talking about not using gsi-xrootd but using SSL instead. People at CERN are complaining that gsi-xrootd is too slow.

We believe people are using gsi-xrootd.

gsi-xrootd uses expensive

The slides mention a "virtual socket". This could mean that they create a second socket on which they do an SSL handshake to prove identity.

Gerd to ask Thomas for a link to the slides.

The "GSI" is only used for authentication ..

PostGres 9

Can't find any problems. Updated the machine with

Might previous (non-trunk) versions of dCache have problems? We had an old driver in the past.

Issue reported by Gerard.

Issues from yesterday's Tier-1 meeting

Issues from EMI

Satisfaction of minimum documentation

Published a web-page list describing documentation. Feedback welcome!

CMS multi-process dcap

From Dmitry

2) I had a talk with Liz Sexton, who is the leader of CMS offline 
   software development. They have a multi-core project: an attempt to 
   adapt CMS software to run in parallel on multi-core platforms. The main 
   motivation is that CMS software as it stands does not scale on multi-core 
   machines, as they are hitting memory limitations with, for example, 32-core 
   platforms when running 32 independent CMS jobs on each of them (each process 
   takes up to 2 GB of memory). They achieve parallelism by forking sub-processes 
   after loading bulk conditions. Memory usage is decreased (compared 
   to running the same number of "main" processes) due to page sharing 
   between parent and children. As a result, CMS software works as 
   follows: a bunch of child processes open different chunks of a file 
   for parallel processing. 

   Works fine with a local file. Fails with dcap. From Liz I understood 
   that Brian Bockelman sent some questions regarding it to dcache. I 
   have not seen it. Does anybody on the team know about it? They 
   need some response. Actually, meanwhile Brian has made a patch 
   to dcap to address CMS needs; maybe we need to look at this patch?

   They are preparing some massive test of their modified parallel 
   processing framework. I advertised nfsv4 for them, but I want to 
   understand the issue with dcap more closely.

Dmitry to forward patch to

Patch has some kind of map to the socket.

"Close" close2, close socket but doesn't kill the mover.

Workshop on multicore in CMS was in May.

Only a problem if the file is opened before the fork()s.

Two problems:

  • you may close a dcap process that one,
  • seek problems.
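The seek problem has a well-known analogue with ordinary POSIX files, sketched below in Python. A descriptor opened before fork() shares one file offset between parent and child, whereas a descriptor opened after the fork is private to each process. This illustrates the general mechanism only; whether dcap fails for exactly this reason is an assumption, and the function names are made up for the example.

```python
import os


def parent_offset_after_child_seek(path: str, child_seek_to: int) -> int:
    """Open before fork(): the child's seek moves the parent's offset too."""
    fd = os.open(path, os.O_RDONLY)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: seek on the inherited descriptor, then exit immediately.
            try:
                os.lseek(fd, child_seek_to, os.SEEK_SET)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # Parent: report the current offset without moving it.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)


def parent_offset_with_private_open(path: str, child_seek_to: int) -> int:
    """Open after fork(): each process has its own offset."""
    pid = os.fork()
    if pid == 0:
        # Child: open its own descriptor and seek on that one only.
        try:
            child_fd = os.open(path, os.O_RDONLY)
            os.lseek(child_fd, child_seek_to, os.SEEK_SET)
        finally:
            os._exit(0)
    fd = os.open(path, os.O_RDONLY)
    try:
        os.waitpid(pid, 0)
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
```

In the first function the parent sees the offset the child set; in the second the parent's offset is untouched. This matches the observation that there is only a problem if the file is opened before the fork()s.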

This is a new project, so we can try out NFS-4.1 as an option.

We just need to wait to see what the patch is like.

Dmitry to contact Brian, ask to contact support@… and get the patch.

Outstanding RT Tickets

[This is an auto-generated item. Don't add items here directly]

RT 5999: pnfsDump fails for OSG user

Review of RB requests

DTNM

Proposed: same time, next week.