dCache Tier I meeting September 29, 2009
[part of a series of meetings]
Present
dCache.org(), IN2P3(), Sara(Onno), Triumf(Simon), BNL(Pedro), NDGF(Gerd), PIC(Gerard), GridKa(Doris), Fermi(), CERN()
Agenda
(see box on the other side)
Site reports
BNL
- nothing to report
Triumf
Things are OK.
Two tickets are ongoing.
Sara
We have the same problem with stage tests: almost every time, they bring down our SRM. This happened again this afternoon. We started the stage test and the SRM immediately collapsed again. After we stopped the staging and restarted the SRM, it was fine.
The test is a reasonable load on dCache.
When the SRM seems to hang, we see:
SEVERE: increase .. max threads servlet status
connect
All internal message threads are busy.
Do you have 500 or 1,000 concurrent connections?
Onno has the number from a dump of the netstat command.
If you have 500 clients connected concurrently, then this explains the problem.
If not, then the question is what those worker threads are doing.
The thread dump should be in the ticket.
Ticket #5112
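The connection check discussed above could be sketched roughly as follows. This is only an illustration, not a dCache tool: the function name `count_connections` and the sample data are hypothetical, the input is assumed to be Linux `netstat -tan` output, and port 8443 is assumed to be the SRM port (adjust it for the site). The limit of 500 is the thread count mentioned in these minutes.

```python
from collections import Counter

# Hypothetical sample of `netstat -tan` output (Linux column layout assumed).
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8443            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.1:8443           10.0.0.7:51234          ESTABLISHED
tcp        0      0 10.0.0.1:8443           10.0.0.8:40022          ESTABLISHED
tcp        0      0 10.0.0.1:22             10.0.0.9:50000          ESTABLISHED
tcp        0      0 10.0.0.1:8443           10.0.0.7:51240          CLOSE_WAIT
"""


def count_connections(netstat_output: str, port: int = 8443) -> Counter:
    """Count TCP connections per state whose local port equals `port`."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP rows
        local_addr, state = fields[3], fields[5]
        if local_addr.rsplit(":", 1)[-1] == str(port):
            states[state] += 1
    return states


if __name__ == "__main__":
    THREAD_LIMIT = 500  # thread count quoted in the minutes
    counts = count_connections(SAMPLE)
    established = counts["ESTABLISHED"]
    print(f"ESTABLISHED connections on SRM port: {established}")
    if established >= THREAD_LIMIT:
        print("Connection count alone could explain busy worker threads.")
    else:
        print("Fewer connections than threads; check the thread dump.")
```

If the number of established connections is well below the thread limit, the thread dump in the ticket is the more useful place to look.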
Which client?
We use gfal prestage.
Pedro: we also had this problem.
Increase the queue in front of Tomcat.
We only have 500 threads.
DQ2 issues an srm bringOnline and then polls with srm ls; that would kill the SRM with load.
PIC
Things seem to be fine.
PoolManager under high load: 50%.
25,000 pin manager requests.
PoolManager "rc ls" shows a high number of requests.
After a restart everything is fine.
Which version? 1.9.2-7 on the servers and 1.9.2-5 on the pools.
More recent versions of dCache drop requests that have timed out.
FZK
Everything fine: nothing to report.
CMS migration from the old to the new: the new one is in production.
NDGF
Everything seems OK.
We are upgrading to 1.9.5-preview tomorrow.
Questions
Owen asked what version of Java people have installed on their worker nodes.
DTNM
Proposed: same time, next week.
