[part of a series of meetings]
Participants
Gerd, Patrick, Christian, Paul, Antje, Tigran, Tanja
Agenda
[see box on the right-hand side]
Postcards
Up to two minutes (uninterrupted) per person where they can answer two questions:
- What I did last week (since the last meeting),
- What I plan to do in the next week.
No questions until we get through everyone :)
Tigran: came back from a 2-week holiday. Last week we did 5 releases from 3 different branches (!!). Helped sys-admins here to upgrade their systems. This week: mostly getting Karsten's gPlazma2 vorole plugin to work.
Antje: Working on the book. gPlazma, gsiDcap, SRM .. explaining how things are supposed to work.
Paul: Taipei last week.
Christian: worked on the EMI testbed machines: installing the current release candidate and slimming down to three machines. Working on the EMI release process; some documents still need to be produced. Also working on a new gLite release (from the 1.9.5 branch) .. this is currently building, so it shouldn't take too much longer to get it out the door.
Patrick: Goettingen .. doing the financial stuff (extremely interesting). The meeting at Taipei. The security issues (these keep you busy): 95 emails (and counting...). They may come out with the announcement today.
Gerd: Goettingen .. tickets and support .. helping IN2P3 with their PostgreSQL. Started working on an experimental login strategy for gPlazma2. Waiting for deployment of PostgreSQL 9. A small utility to do "du" via the Chimera database. Trying to show the output via Protovis to give some nice graphs .. needs some pre-processing as the tool cannot handle too many directories.
DOT has a hacky script to generate a histogram of file sizes to allow DDN storage tuning.
Tanja: Goettingen; tickets; reading the NFS spec about read delegation. Looking at how HSM works with dCache with a view to writing updated documentation on this.
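The pre-processing Gerd mentions for the "du" output could look roughly like the sketch below: sizes are summed into every ancestor directory, and directories below a chosen depth are folded into their parents so the plotting tool sees fewer entries. The input format (path, size pairs), the function name `rollup_du` and the `max_depth` cut-off are assumptions for illustration; this is neither the Chimera schema nor the actual utility.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rollup_du(files: Iterable[Tuple[str, int]], max_depth: int) -> Dict[str, int]:
    """Sum file sizes per directory, keeping only directories up to max_depth.

    Every file adds its size to each ancestor directory, like "du".
    Directories deeper than max_depth are folded into their ancestors.
    """
    totals: Dict[str, int] = defaultdict(int)
    for path, size in files:
        # Drop the file name, keep only the directory components.
        parts = [p for p in path.split("/") if p][:-1]
        totals["/"] += size
        for depth in range(1, min(len(parts), max_depth) + 1):
            totals["/" + "/".join(parts[:depth])] += size
    return dict(totals)


# Hypothetical sample data: (path, size in bytes)
sample = [
    ("/data/atlas/run1/a.root", 100),
    ("/data/atlas/run2/b.root", 50),
    ("/data/cms/c.root", 25),
]
du = rollup_du(sample, max_depth=2)
```

With `max_depth=2` the `run1` and `run2` directories do not appear on their own; their sizes are counted under `/data/atlas`.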
Plans for patch-releases
Should we make a new patch release?
No plans currently.
Some stuff already merged that hasn't been released yet; but releasing 1.9.12 is the priority.
1.9.12 status
There are no pending (two actually, but ..).
Looks OK. There are some S2 tests that are failing because we're using synchronised replies.
We can compile S2 ourselves.
For one test, it copes with a synchronised reply but forgets to extract the transfer URL from the reply.
Can you change Hudson so it also logs the more verbose file?
Running with old gPlazma; running with new gPlazma.
We need another plugin for gridmap file. It should be trivial to write.
Our doors are still different: SRM was failing (not extracting FQANs from X509Chain). SRM sends DN+Chain, gsidcap sends only Chain.
If the same plugin is configured multiple times in the gplazma.conf file, then there are multiple instances of that plugin.
Split the vorole plugin into two parts:
gridvorolemap storageauth
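As a rough idea of why the gridmap-file plugin should be trivial: the mapping logic is little more than parsing lines of the form `"<DN>" <username>`. The sketch below is Python rather than a real gPlazma2 plugin, and does not use the gPlazma2 plugin API. The function names, the sample DNs and the "first entry wins" rule are assumptions for illustration.

```python
import shlex
from typing import Dict, Optional


def parse_gridmap(text: str) -> Dict[str, str]:
    """Parse grid-mapfile text into a DN -> local user name mapping."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honours the quotes around the DN
        if len(fields) != 2:
            raise ValueError(f"malformed grid-mapfile line: {line!r}")
        dn, user = fields
        mapping.setdefault(dn, user)  # assumption: the first entry for a DN wins
    return mapping


def map_dn(mapping: Dict[str, str], dn: str) -> Optional[str]:
    """Return the local user name for a DN, or None if it is not listed."""
    return mapping.get(dn)


# Hypothetical file content
GRIDMAP = '''
# DN                                          user
"/C=DE/O=GermanGrid/OU=DESY/CN=Jane Doe" jdoe
"/C=DE/O=GermanGrid/OU=DESY/CN=John Roe" jroe
'''
users = parse_gridmap(GRIDMAP)
```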
User-password.
Assuming we have a user+pw in gPlazma2, would WebDAV support this?
Two issues: for BASIC authentication .. it's OK. If we wanted to do DIGEST auth. then we would need a special plugin for this.
A large user community: how do we give them the passwords?
DIGEST is more than a simple hash. It's a hash of three pieces of information, including the time. This prevents replay-attacks.
Would need a file-format and plugin targeted at DIGEST?
Using DIGEST is only needed if not using encrypted transport.
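To make the DIGEST discussion concrete: in HTTP Digest as specified in RFC 2617 (the older form without `qop`), the response is MD5(HA1:nonce:HA2), where HA1 = MD5(user:realm:password) and HA2 = MD5(method:uri). The server-issued nonce, which may encode a timestamp, is what guards against replay. The sketch below shows why a DIGEST-specific password file would be needed: the server has to store HA1 (or the clear-text password), not an ordinary password hash. The user name, realm, nonce values and function names are made up for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def ha1(username: str, realm: str, password: str) -> str:
    """The only password-derived value the server needs to store."""
    return md5_hex(f"{username}:{realm}:{password}")


def digest_response(stored_ha1: str, nonce: str, method: str, uri: str) -> str:
    """Digest response without qop: MD5(HA1:nonce:HA2)."""
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{stored_ha1}:{nonce}:{ha2}")


# Hypothetical credentials and nonces
stored = ha1("alice", "dcache", "secret")
r1 = digest_response(stored, "nonce-1", "GET", "/data/file")
r2 = digest_response(stored, "nonce-2", "GET", "/data/file")
```

A response captured for one nonce (`r1`) is useless once the server issues a different nonce (`r2`), which is the replay protection mentioned above.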
Karsten will work on gPlazma when back.
ACLs for NFS
ACLs for NFS are missing.
Didn't have the map calls. Still don't have Kerberos.
Swegrid are asking when NFS will be ready for deployment in production.
RPCSEC_GSS is there. Need a callout from the NFS door to convert the principal to uid+gid.
1.9.12-x will have support for NFS with setacls in about two months' time.
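The missing callout boils down to a lookup from the authenticated Kerberos principal to a uid+gid pair. A minimal sketch, assuming a simple in-memory table; the `Identity` type, the `map_principal` function, the default-realm handling and the sample principals are all hypothetical, and in dCache this lookup would presumably go through gPlazma rather than a static table.

```python
from typing import Dict, NamedTuple, Optional


class Identity(NamedTuple):
    uid: int
    gid: int


def map_principal(principal: str, table: Dict[str, Identity],
                  default_realm: str) -> Optional[Identity]:
    """Return the uid+gid for a Kerberos principal, or None if unmapped."""
    name, sep, realm = principal.partition("@")
    if not sep:
        realm = default_realm  # bare names are assumed to be in the default realm
    return table.get(f"{name}@{realm}")


# Hypothetical mapping table
TABLE = {"alice@EXAMPLE.ORG": Identity(uid=1000, gid=1000)}
alice = map_principal("alice", TABLE, default_realm="EXAMPLE.ORG")
bob = map_principal("bob@EXAMPLE.ORG", TABLE, default_realm="EXAMPLE.ORG")
```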
Regressions
Storing SRM ls requests in database. Performance problem.
Problem with the info-provider. Paul to work on this asap.
Should be deploying a 1.9.12 at DESY.
Logging in the pool.
Downloads page
Have a single link on the downloads page (per major release) for the release-notes. Add a table-of-contents to the release notes allowing people to jump to the patch-level release they're interested in.
ReviewBoard
Tigran wants to upgrade to a new version but not sure when this will happen.
Trunk activity
Progress with new features...
Gerd's interesting thing: put most of the retry logic into doors. Any fast decisions are done within the door. If that fails then the door has to recover from the problem. This should make the pool-manager more light-weight.
Tigran: we need some way of maintaining state between attempts. But we need to build in protection against a door sending the same request twice. Include a "context object" in the message. Gerd to submit more patches to RB along these lines.
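One way to picture the idea discussed above: the door owns the retry loop and carries a context object from attempt to attempt, while the pool-manager side keeps no per-request state beyond spotting duplicates. The sketch below is purely illustrative. The class and function names, the fields of the context object and the duplicate check on (request id, attempt) are assumptions, not the patches on RB.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class RequestContext:
    """State carried inside the message from one attempt to the next."""
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    attempt: int = 0
    failed_pools: List[str] = field(default_factory=list)


class PoolSelector:
    """Stand-in for the pool manager: stateless apart from duplicate detection."""

    def __init__(self, pools: List[str]) -> None:
        self.pools = pools
        self.seen: Set[Tuple[str, int]] = set()

    def select(self, ctx: RequestContext) -> Optional[str]:
        key = (ctx.request_id, ctx.attempt)
        if key in self.seen:
            raise RuntimeError("duplicate request from door")
        self.seen.add(key)
        # Pick the first pool that has not already failed for this request.
        for pool in self.pools:
            if pool not in ctx.failed_pools:
                return pool
        return None


def door_transfer(selector: PoolSelector,
                  start_mover: Callable[[str], bool],
                  max_attempts: int = 3) -> Optional[str]:
    """Retry loop living in the door; returns the pool that served the request."""
    ctx = RequestContext()
    while ctx.attempt < max_attempts:
        ctx.attempt += 1
        pool = selector.select(ctx)
        if pool is None:
            return None  # nothing left to try
        if start_mover(pool):
            return pool
        ctx.failed_pools.append(pool)  # the door remembers what went wrong
    return None


# Hypothetical run: pool_a refuses the mover, pool_b accepts it.
selector = PoolSelector(["pool_a", "pool_b"])
chosen = door_transfer(selector, start_mover=lambda pool: pool == "pool_b")
```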
xroot client supports download from multiple sources. This is a client feature: they have an "extreme download" mode, where
Semantics of the protocol in xrootd don't match what dCache is doing.
Want to have a static protocol engine instance in the pools: listening not only to a door but also for clients. This would allow a client to talk directly to the pool.
Can't we return doors instead of pools for location queries? No: the handshake is different between redirectors (doors) and io (pools).
The namespace entry shouldn't be created by the door, but by the pool. Creating it in the door is a problem if the client disappears after the redirect: an empty file will be left. Does this cause a problem if the client retries? That depends on whether the client specifies a flag to override any existing data.
Does this cause a problem: ALICE has a work-around, but we (NDGF) do see some lingering empty namespace entries.
FTP supports multiple locations. Even without this, we could have a client that would open multiple copies.
Issues from yesterday's Tier-1 meeting
Everyone happy.
PostgreSQL
What can we do to educate people?
Tier-1 sites (at least, if not others)
Need to have on-site expertise as a Tier-1 (and major sites)
How do we communicate this?
Email to user forum.
Best-practice document for PostgreSQL (+ dCache). Something like this already exists somewhere.
Have deployed PostgreSQL 9 + stand-by replication, seems to be working fine.
Issues from EMI
The only thing that we still lack for releasing EMI-1 is the test report. We do some testing and then write that we've done some testing.
Patrick: ask CERN who is supporting S2?
Perhaps OGF could take custodial role with S2 tests.
Security
Do we have some kind of procedure for what happens?
Very happy that people reported the problem to security@… not support@….
Our interaction with the security group(s) was OK.
AP/ Patrick will write a big page that describes the procedure. Circulate a link to this page for people to review the contents.
AP/ Christian to write similar page describing how to release.
Perhaps everyone needs to go through the EMI and gLite release process to verify the documentation is sufficient.
We would like to have a reliable list of sys-admin contacts (real people) we can inform before the security alert goes public.
dCache dev. workshop
There is an indico webpage now.
What topics should we discuss? Answers next week!
The format for the workshop could be:
Morning: theoretical plans / discussions .. hammering out new ideas. Afternoon: fixing tightly focused problems.
Tigran: would like to see a goal/vision .. but something that could be implemented.
Outstanding RT Tickets
[This is an auto-generated item. Don't add items here directly]
RT 6197: Feature Request: move dCache files from disk to tape and viceversa
Want to be able to change the AL/RP on the pool. We also want to update the namespace.
We want a way that the sys-admin can clean-up and recover from a misconfiguration.
If you change the values in the namespace and migrate the file then the new pool will
The problem is how to update the namespace. Probably want to update a whole directory (or subset of the directory). We need to support PNFS, too.
If we do this per-file then this is doable for PNFS and Chimera.
Can we do this with SRM? No: we would need a much more powerful
There are a number of other things in pnfs-manager to help people recover from errors. For example, we can add a command to unregister all known file locations for files on some pool that has gone down.
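The two recovery operations discussed for this ticket, a per-file AL/RP update over a directory and unregistering all locations on a dead pool, can be sketched against a toy in-memory namespace. Everything here (the `Entry` type, the function names, the dict standing in for PNFS/Chimera, the sample paths and pool names) is an assumption for illustration, not pnfs-manager code.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Entry:
    access_latency: str      # e.g. "ONLINE" or "NEARLINE"
    retention_policy: str    # e.g. "REPLICA" or "CUSTODIAL"
    locations: Set[str] = field(default_factory=set)


def set_al_rp(namespace: Dict[str, Entry], directory: str,
              access_latency: str, retention_policy: str) -> int:
    """Update AL/RP file by file for everything below directory."""
    prefix = directory.rstrip("/") + "/"
    changed = 0
    for path, entry in namespace.items():
        if path.startswith(prefix):
            entry.access_latency = access_latency
            entry.retention_policy = retention_policy
            changed += 1
    return changed


def unregister_pool(namespace: Dict[str, Entry], pool: str) -> int:
    """Forget every file location registered on a pool that has gone down."""
    removed = 0
    for entry in namespace.values():
        if pool in entry.locations:
            entry.locations.discard(pool)
            removed += 1
    return removed


# Hypothetical namespace content
ns = {
    "/data/a": Entry("ONLINE", "REPLICA", {"pool1"}),
    "/data/sub/b": Entry("ONLINE", "REPLICA", {"pool1", "pool2"}),
    "/other/c": Entry("ONLINE", "REPLICA", {"pool2"}),
}
changed = set_al_rp(ns, "/data", "NEARLINE", "CUSTODIAL")
removed = unregister_pool(ns, "pool1")
```

Doing the update one file at a time is what keeps the approach workable for both PNFS and Chimera, as noted above; the sketch leaves out migrating the files themselves.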
Review of RB requests
Gerd to add his experimental gPlazma2 strategy into dCache code. We can then migrate functionality into plugins over time.
DTNM
Same time, next week.
(hi Gerard ;-)
