dCache Tier I meeting October 25, 2011
[part of a series of meetings]
Present
dCache.org(Tigran, Paul, Tanja, Antje), IN2P3(Nicolas), PIC(Gerard), GridKa(Doris)
Agenda
(see box on the other side)
Site reports
PIC
Gerard reported that their production dCache instances are working fine.
LHCb movers
They have one issue: some LHCb movers refuse to die. These are all movers that start out queued. Their temporary work-around is to increase the maximum number of movers to 1,000; however, this upsets the pool statistics, so it is undesirable.
There is a support ticket open about this issue: RT #6691.
PIC are also looking into establishing different partitions using Partition Manager.
Tigran asked whether it is always the same protocol. Gerard believes they are all GridFTP movers, but will check this.
Restarting the pool solves the problem.
During the meeting, Gerard checked the current status. At that point, the pools were OK. Gerard believes the problem may be the result of something odd that LHCb are doing; for example, asking for a lot of files at the same time.
PIC is running dCache v1.9.12-10 at the moment.
Another issue
Gerard remembered that there was another issue that has now been fixed, but the fix isn't available yet. He asked whether there are plans to make another release of the 1.9.12 series.
Tigran suggested there will likely be another release next week.
IN2P3
Nicolas reported that things are OK just now.
Out of memory
They had a problem over the week-end that hasn't been fully understood yet. The problem was that the head node ran out of memory.
The log files are peculiar: they have lines that show the time as 10 AM, followed by lines showing the time as 8 AM, followed by lines showing 10 AM again; so they are unsure what has happened.
The immediate response to the problem was to increase the amount of memory on the pnfs-server node. This is because it was showing considerable time spent in IO-WAIT. The upgrade was from 16 GiB to 32 GiB.
Tigran asked how big their PostgreSQL database is. It is about 50 GiB.
New queue?
Nicolas mentioned that they've seen a new queue, called 'regular', when viewing the system via pserver. He asked what this is.
Tigran explained that this was the new default queue. It is used if the door doesn't specify a queue name or if the named queue doesn't exist. This is in contrast to previous (1.9.5) behaviour where the first defined queue becomes the default.
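The fallback rule Tigran described can be sketched as follows. This is only an illustration of the behaviour as stated in the meeting, not dCache code: the function names and the list-of-names representation of the configured queues are hypothetical, and the legacy variant assumes that an unknown queue name also fell back to the first defined queue.

```python
def select_queue(requested, queues, default="regular"):
    """New behaviour: fall back to the default queue ('regular')
    when the door names no queue or names one that doesn't exist."""
    if requested is not None and requested in queues:
        return requested
    return default


def select_queue_legacy(requested, queues):
    """Previous (1.9.5) behaviour: the first defined queue is the default.
    Assumption: unknown queue names were treated like a missing name."""
    if requested is not None and requested in queues:
        return requested
    return queues[0]
```

For example, with queues "wan" and "lan" defined in that order, a door that names no queue would end up in 'regular' under the new rule and in "wan" under the old one.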
Service names
Nicolas asked about the names of the services, as used in the layouts file; noting that all service names are in lower case, which is different to the name of some cells.
Paul explained that this was deliberate: the policy is that all service names are lower-case, so they are easier to remember.
Deleting files and directories
Nicolas mentioned that they have some empty directories that they can't delete because dCache says the directory isn't empty.
Tigran mentioned that there is a fix for this coming soon.
Nicolas also mentioned that some files have special characters; since these characters make it difficult to delete the file from the shell, is there a way of deleting them via some direct database operation?
Paul suggested trying "rm -i *" in the directory. This prompts for each file in turn, which removes the need to type the characters.
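The same idea, confirming each file instead of typing its name, can also be scripted. The following is a minimal sketch, assuming the namespace is mounted as an ordinary file system; the function name remove_selected and the confirm callback are hypothetical. File names are handled as bytes so that names with unusual characters never have to be typed or decoded.

```python
import os


def remove_selected(directory, confirm):
    """Delete the regular files in `directory` for which confirm(name) is true.

    Names are passed to `confirm` as bytes; the list of removed names
    is returned. Directories and symbolic links are left alone.
    """
    removed = []
    with os.scandir(os.fsencode(directory)) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not entry.is_file(follow_symlinks=False):
                continue
            if confirm(entry.name):
                os.remove(entry.path)
                removed.append(entry.name)
    return removed


# Interactive use, similar in spirit to "rm -i *":
# remove_selected(".", lambda name: input(f"remove {name!r}? ") == "y")
```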
GridKa
Doris reported a problem last week with their oldest dCache instance. The utility domain needed to be restarted.
She also reported that they have upgraded their dgrid instance to 1.9.12. This is their least-used dCache instance. So far, it seems to be working fine.
GridKa is planning on upgrading their ATLAS instance to 1.9.12 sometime between 14th and 16th November. The planning seems in hand.
Support tickets for discussion
[Items are added here automagically]
RT 6717: Some questions about LHCb space token migration at dCache sites
Dmitry is preparing some advice for sites supporting the LHCb VO on how to migrate their configuration towards LHCb's new configuration.
In order to do so, he would like a Tier-1 site to provide a gzip-ed dump of their Space Manager database.
Nicolas, Gerard and Doris volunteered to provide this information; they just require instructions on how to produce the dump.
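Pending the actual instructions, one possible way of producing a gzip-ed dump is sketched below. It simply runs a dump command and compresses its standard output; the helper name gzip_command_output is hypothetical, and the database name, user name and output file name in the usage comment are placeholders that each site would need to replace with its own values.

```python
import gzip
import shutil
import subprocess


def gzip_command_output(cmd, out_path):
    """Run `cmd` and write its standard output, gzip-compressed, to `out_path`."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, \
            gzip.open(out_path, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    # Leaving the "with" block waits for the command to finish.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return out_path


# Possible use with PostgreSQL's pg_dump (names are placeholders):
# gzip_command_output(["pg_dump", "-U", "DBUSER", "DBNAME"],
#                     "spacemanager-dump.sql.gz")
```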
DTNM
Same time, next week.
