[part of a series of meetings]
Participants
Antje, Jan, Paul, Tigran, Karsten, Tanja, Dmitry, Christian
Agenda
Postcards
Up to two minutes (uninterrupted) per person where they can answer two questions:
- What I did last week (since the last meeting),
- What I plan to do in the next week.
No questions until we get through everyone :)
Antje: documentation
Jan: documentation; tab-completion script for the dcache script.
Paul: closing tickets, fixing bugs, documentation
Tigran: chimera split/merge script
Karsten: gPlazma documentation and some gPlazma coding; WLCG poster
Tanja: HSM documentation and tickets.
Dmitry: nothing
Christian: Creating RfC reports for EMI by hand. Some documentation fixes. Trying to get desktop working again (naughty gfx card).
Q: CDF problem with restore distribution
With many pools, the restore queues are badly balanced. How do we even this out?
CDF have a bottleneck talking to the HSM. If we (dCache) queue the requests, this makes the situation worse: data trickles rather than flows.
100 pools: some pools have 16, 12 or 20 restores queued while other pools have no requests.
Is the same number of restore movers allowed on all pools? No: there are 4 groups of pools, with ~20 pools in each group.
Groups of pools with a maximum of 16 and 24 restores see a factor-2 difference in the number of restores (40 vs. 80).
Regular movers? (Yes.)
Are they equally anisotropic? No, they're better distributed.
In pool-manager, run "cm ls" and send the output in a ticket (support@…). The output will be a large table.
This will give us the list of pools with their costs. This might give us a clue.
Is this using enstore? Yes.
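One way to see why pools with different restore-mover limits can end up unevenly loaded: if restores are spread by absolute queue length, a pool allowed 16 restores fills up as fast as one allowed 24. The sketch below, under our own assumptions, instead picks the pool with the lowest fraction of its restore limit in use. It is not dCache's cost module; `Pool`, `pick_pool` and `distribute` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    max_restores: int  # hypothetical: restore movers allowed on this pool
    queued: int = 0    # restores currently active or queued

    def load(self) -> float:
        # Fraction of restore capacity in use; above 1.0 means requests queue.
        return self.queued / self.max_restores


def pick_pool(pools):
    """Return the pool with the lowest relative restore load (ties by name)."""
    return min(pools, key=lambda p: (p.load(), p.name))


def distribute(pools, n_requests):
    """Assign n_requests restores one at a time to the least-loaded pool."""
    for _ in range(n_requests):
        pick_pool(pools).queued += 1
    return {p.name: p.queued for p in pools}


# Two pools with limits 16 and 24 share 80 restores in proportion:
print(distribute([Pool("a", 16), Pool("b", 24)], 80))  # {'a': 32, 'b': 48}
```

Both pools end at a relative load of 2.0; balancing by absolute count would instead give 40/40, putting the smaller pool at 2.5 times its limit.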
Plans for patch-releases
Should we make a new patch release?
1.9.12-3, probably at the end of this week or the beginning of next week.
Do we mark it green? There's still development work: the gPlazma work is being back-ported; this includes work that changes interfaces.
But the rest of the code-base is "green".
So, we mark 1.9.12-3 green.
What does "green" mean? Tigran: that we're ready to receive support tickets.
1.9.5-26 was released last week; nothing pressing.
Trunk activity
Progress with new features...
gPlazma
Tigran: add IDENTITY for map and reverseMap.
Gerd has added support for tunnelling dCache properties into gPlazma. Karsten will add support within gPlazma for the plugins to get these properties. Patches coming soon.
Dmitry: I will work on the GUMS plugins for gPlazma2.
IPv6
Checked that NFS works with IPv6; pool-door communication supports sending only one interface address. This needs to be fixed.
This makes fixing 1.9.12 difficult because the inter-component communication (Messages) would need to change.
With 1.9.12 we could support pools running only IPv4 or only IPv6 without too much pain. Most protocols send a redirect from the door to point the client to the pool. We currently throw away all IPv6 addresses. This must be fixed.
DESY IT will start allocating IPv6 addresses to nodes in the IT office subnet next week. Details to be announced in Friday's VV.
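To make the redirect problem concrete: if a pool could report all of its interface addresses rather than one, the door would need to pick an address the client can actually reach. The sketch below is a hypothetical illustration, not dCache code; it assumes the pool's addresses arrive as a list of strings and simply chooses the first non-link-local address in the same IP family as the client. `select_redirect_address` is an invented name.

```python
import ipaddress


def select_redirect_address(client_addr, pool_addrs):
    """Pick a pool address in the same IP family (v4/v6) as the client.

    Hypothetical helper: link-local addresses are skipped because they are
    not routable to a remote client. Returns None if nothing matches.
    """
    client = ipaddress.ip_address(client_addr)
    for addr in pool_addrs:
        candidate = ipaddress.ip_address(addr)
        if candidate.version == client.version and not candidate.is_link_local:
            return str(candidate)
    return None


# An IPv6 client is sent to the pool's global IPv6 address:
print(select_redirect_address("2001:db8::10", ["192.0.2.5", "fe80::1", "2001:db8::5"]))
```

A real implementation would also have to cope with dual-stack clients and pools with no address in the client's family; here the caller gets None and must decide.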
Issues from yesterday's Tier-1 meeting
Highlights (see notes for full details):
- KIT: large number of GridFTP transfers causing problems.
- PIC: running 1.9.12 on Tier-3.
Can we prevent one VO's denial-of-service attack on a GridFTP door affecting other VOs?
Through SRM, yes. The number of transfers per user can be limited. Note that this is the gPlazma-mapped user, so it can collapse a VO into a single user.
For GridFTP we could have dedicated doors for VOs.
We could implement behaviour where a transfer is rejected if the same VO / user already has too many concurrent connections.
This should be a centralised component.
The gPlazma ACCOUNT step could do this, but only if we were to record log-out, and there is currently no way to know when a transfer is finished.
Perhaps we should add a gPlazma logout method.
The problem here is that dCache doesn't have the concept of "logout". Some protocols lack support for this (e.g., NFS, dcap).
With GridFTP, people transfer only a single file per session.
Another possibility is to have a queue per VO on the pools. Currently, a door is bound to a particular queue, so per-VO queues would ...
Perhaps this is overkill: just have VO-specific doors.
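The idea of a centralised component that rejects transfers once a VO or user has too many concurrent connections can be sketched as a counter per owner. This is a minimal sketch under our own assumptions: `ConcurrencyLimiter` is a hypothetical name, "owner" stands for the VO or gPlazma-mapped user, and the component only works if something calls `release()` when a transfer ends, which is exactly the missing "logout" signal discussed above.

```python
import threading
from collections import Counter


class ConcurrencyLimiter:
    """Hypothetical central limit on concurrent transfers per VO / user."""

    def __init__(self, limit_per_owner):
        self._limit = limit_per_owner
        self._active = Counter()        # owner -> number of active transfers
        self._lock = threading.Lock()   # doors may call in from many threads

    def try_acquire(self, owner):
        """Register a new transfer for owner; return False to reject it."""
        with self._lock:
            if self._active[owner] >= self._limit:
                return False
            self._active[owner] += 1
            return True

    def release(self, owner):
        """Mark one of owner's transfers as finished (the 'logout' step)."""
        with self._lock:
            if self._active[owner] > 0:
                self._active[owner] -= 1
                if self._active[owner] == 0:
                    del self._active[owner]

    def active(self, owner):
        with self._lock:
            return self._active[owner]


limiter = ConcurrencyLimiter(2)
print(limiter.try_acquire("atlas"), limiter.try_acquire("atlas"))  # True True
print(limiter.try_acquire("atlas"))  # False: atlas is at its limit
print(limiter.try_acquire("cms"))    # True: other VOs are unaffected
```

Without reliable `release()` calls the counts only grow and the VO is eventually locked out, which is why the lack of logout in protocols such as NFS and dcap matters for this design.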
Issues from EMI
EMI Metrics https://twiki.cern.ch/twiki/pub/EMI/EmiSa2QAMetricsReports/DetailedReport.pdf
EMI asked me to give feedback on the metrics report they are creating.
I would find it very important to know:
- What are the implications of the metrics? --> many bugs == bad? I will ask about that in my feedback.
- What are the expectations on improvement?
- If you (developers) have questions, please write them here.
These are the questions they are asking; what is your (developers') take on them?
- improvements:
- useless metrics:
- missing metrics:
Tigran: doesn't like the idea completely. Suggests looking for code duplication across the whole EMI code-base.
Paul: drop the %-comment metric (useless). Metrics from static analysis of C, Python and Perl code are missing.
Dmitry: lines of code is interesting, especially over time. Not sure you can draw strong conclusions.
Defer further discussion until Patrick is back.
Requests for change (RfCs) that are relevant for EMI
So far, this is how we have kept track of RfCs: http://trac.dcache.org/projects/dcache/ticket/290. I would like to use RT instead; we already have a queue that we used during the developers' meeting here in Hamburg last week: http://rt.dcache.org/Search/Results.html?Query=Queue%20%3D%20%27feature%20requests%27%20AND%20%28Status%20%3D%20%27new%27%20OR%20Status%20%3D%20%27open%27%20OR%20Status%20%3D%20%27stalled%27%29. Are there any objections to using RT for RfCs?
Tigran: You can wipe all the tickets in trac. They exist only because trac comes with a ticketing system. We can switch it off and keep trac only as a wiki.
So, everyone here seems happy with the switch, but we need to talk to Patrick about this.
Outstanding RT Tickets
[This is an auto-generated item. Don't add items here directly]
There should be another ticket, one that Tanja marked, but which isn't here. Not sure why.
Time!
How about starting earlier, say 16:15 (== 09:15)? This may be a problem for Dmitry due to unpredictable commitments in delivering small people to their place of learning.
How about fixing the ending time? We would start on time and always end at a fixed time, say 17:30 (10:30 Dmitry-standard-time).
We can reduce the number of things we discuss: Design, tickets, some internal stuff. Maybe something from Tier-1 support.
For now, keep things as-is but start on time and have a fixed ending time.
Review of RB requests
DTNM
On time, next week.
