dCache Tier I meeting January 17, 2012
[part of a series of meetings]
Present
dCache.org (Tanja, Tigran, Paul, Antje), IN2P3 (Nicolas), SARA (Ron), Triumf (Simon), PIC (Gerard), GridKa (Doris)
Agenda
(see box on the other side)
Site reports
IN2P3
Nicolas reported that IN2P3 will be migrating their main dCache instance from 1.9.5 to 1.9.12 at the beginning of February. He is happy with the migration procedure, having successfully migrated a smaller instance previously.
The upgrade procedure is fairly straightforward: they are buying new hardware and will deploy dCache v1.9.12 on it. The migration will involve switching the head node and migrating all the pools.
GridKa
Doris reported that everything is fine with GridKa.
PIC
Gerard said that PIC's WLCG production dCache instance is OK.
He also mentioned a ticket (sent yesterday) about WebDAV and data being cached on the door's local disk. Tigran anticipates making a new 1.9.12 release at the beginning of next week.
Gerard asked about mounting WebDAV as a local filesystem. PIC have had problems with redirection: are there clients that support it?
Gerard also asked about a Chimera ticket. Tigran has been able to reproduce the problem on his machine, so hopes to have the problem fixed soon.
Triumf
Simon reported that everything is OK at Triumf.
He mentioned a ticket about movers getting stuck. The problem appears during stress-testing Solaris machines.
He has switched the door and pool nodes to log at debug level. He saw that the pool was waiting for a timeout.
Simon got a reply yesterday and is in the process of following up on it. Initial results show an improvement, but movers are still getting stuck. Further investigation is needed to establish whether this is the same case.
SARA
Ron reported that SARA's dCache instance is working "perfectly fine."
There is ticket 6897, though. This was originally a GGUS ticket following a failed transfer between PIC and Dublin. The FTS log shows an error message that he was able to find.
Tigran explained that the problem occurs if the connection between door and pool, or between door and client, is dropped for some reason. In particular, if the client intends to make 10 concurrent connections but uses only 9, the adapter waits for the missing connection. The door will then likely have one or more sockets in the LISTEN or CLOSE_WAIT state.
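The waiting behaviour Tigran described can be illustrated with a toy model. This is only a sketch under stated assumptions, not dCache's adapter code: the function name `collect_streams`, the queue standing in for accepted connections, and the timeout value are all hypothetical.

```python
import queue


def collect_streams(expected, incoming, timeout):
    """Wait for `expected` data connections; give up when one wait times out.

    `incoming` is a queue.Queue standing in for accepted sockets (hypothetical).
    Returns (connections received so far, whether all of them arrived).
    """
    accepted = []
    while len(accepted) < expected:
        try:
            accepted.append(incoming.get(timeout=timeout))
        except queue.Empty:
            # The missing connection never showed up: the transfer stalls
            # here until the timeout expires.
            return accepted, False
    return accepted, True


if __name__ == "__main__":
    incoming = queue.Queue()
    for i in range(9):  # the client announced 10 streams but opens only 9
        incoming.put(f"stream-{i}")
    streams, complete = collect_streams(10, incoming, timeout=0.1)
    print(len(streams), complete)  # 9 False
```

In this model a short timeout turns the missing connection into a quick failure, whereas a long or absent timeout leaves the transfer hanging.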
Ron also mentioned that, when he Googled the error message, the results showed FTS log files for various Tier-1 sites.
Tigran mentioned that we had this bug in the past, but we thought the problem was fixed.
Ron will add netstat output from the door and the pool (if that is easy to discover) to the ticket.
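To make such netstat output easier to scan, something like the following sketch could pick out the sockets in the states Tigran named. It assumes Linux-style `netstat -tan` output with the state in the sixth column; other platforms format this differently. The sample text, addresses and data ports are made up for illustration.

```python
from collections import Counter

# States named in the discussion as likely symptoms on the door.
SUSPECT_STATES = {"LISTEN", "CLOSE_WAIT"}

# Made-up sample in Linux `netstat -tan` layout (addresses are illustrative).
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:2811            0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:20001        0.0.0.0:*               LISTEN
tcp        1      0 192.0.2.10:20002        198.51.100.7:41234      CLOSE_WAIT
tcp        0      0 192.0.2.10:2811         198.51.100.7:41200      ESTABLISHED
"""


def tcp_rows(netstat_output):
    """Yield (local address, foreign address, state) for each TCP row."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local, foreign, state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            yield fields[3], fields[4], fields[5]


def count_tcp_states(netstat_output):
    """Count TCP sockets per connection state."""
    return Counter(state for _, _, state in tcp_rows(netstat_output))


def suspect_sockets(netstat_output):
    """Return (local address, state) for sockets in a suspect state."""
    return [(local, state)
            for local, _, state in tcp_rows(netstat_output)
            if state in SUSPECT_STATES]


if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
    for local, state in suspect_sockets(SAMPLE):
        print(local, state)
```

A LISTEN socket is normal for the door's own service port, so only unexpected listeners (for example, leftover data ports) and CLOSE_WAIT entries are likely to be of interest.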
Configuration
Nicolas asked a couple of questions.
srmLoginBroker
What is it? Is it needed?
Doors register themselves in a login broker. Most doors use a common login broker, but the SRM registers itself in a different one. It does this to avoid using itself to answer srmPrepareToGet requests.
Almost all dCache components ignore the srmLoginBroker. The exception is the info service. This uses both loginBroker and srmLoginBroker to build a list of available doors. In turn, this information is used by the info-provider to describe what doors are available.
If no srmLoginBroker is running, the info-provider will not publish the SRM-related GLUE information into BDII, and services like FTS will no longer work.
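The relationship between the two brokers can be sketched with a small model. This is an illustration of the explanation above, not dCache code: the `LoginBroker` class, the function names and the door names are all hypothetical.

```python
class LoginBroker:
    """Toy registry of doors (hypothetical; not a dCache class)."""

    def __init__(self, name):
        self.name = name
        self.doors = {}  # door name -> protocol

    def register(self, door_name, protocol):
        self.doors[door_name] = protocol


def transfer_doors(login_broker):
    """Doors the SRM can hand out for srmPrepareToGet.

    Only the common broker is consulted, so the SRM never selects itself.
    """
    return dict(login_broker.doors)


def published_doors(running_brokers):
    """Doors the info service sees: the union over all running brokers."""
    merged = {}
    for broker in running_brokers:
        merged.update(broker.doors)
    return merged


# Hypothetical door names, for illustration only.
login_broker = LoginBroker("loginBroker")
login_broker.register("gridftp-door-1", "gsiftp")
login_broker.register("dcap-door-1", "dcap")

srm_login_broker = LoginBroker("srmLoginBroker")
srm_login_broker.register("srm-door-1", "srm")

if __name__ == "__main__":
    # With both brokers running, the SRM door is visible to the info service.
    print(published_doors([login_broker, srm_login_broker]))
    # Without srmLoginBroker, the SRM door is missing from what gets published.
    print(published_doors([login_broker]))
```

In this model, dropping `srm_login_broker` from the list removes the SRM door from the published set while leaving transfer-door selection untouched, which matches the behaviour described above.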
broadcaster and space-manager
Nicolas noticed that, when a file is deleted, dCache attempts to send a message to space-manager. It does this even if space-manager is disabled.
This looks like a bug.
Nicolas will send an email to support@… with the details so we can track getting the problem fixed.
Support tickets for discussion
[Items are added here automagically]
DTNM
Same time, next week.
