Last modified on 09/29/09 16:43:52

dCache Tier I meeting September 29, 2009

[part of a series of meetings]

Present

dCache.org(), IN2P3(), Sara(Onno), Triumf(Simon), BNL(Pedro), NDGF(Gerd), PIC(Gerard), GridKa(Doris), Fermi(), CERN()

Agenda

(see box on the other side)

Site reports

BNL

  • nothing to report

Triumf

Things are OK.

Two tickets are ongoing.

Sara

We have the same problem with stage tests. Almost every time, the test collapses our SRM. This happened this afternoon: we started the stage test and the SRM immediately collapsed again. We stopped the stage test and restarted the SRM, and it is fine now.

The test is a reasonable load on dCache.

When the SRM seems to hang we see

SEVERE: increase .. max threads servlet status

connect

All internal message threads are busy.

Do you have 500 or 1,000 concurrent connections?

Onno has the number from a dump of the netstat command.

If you have 500 clients connected concurrently, then this explains the problem.

If not, then the question is what those worker threads are doing.
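
The check discussed above can be sketched as follows: a minimal script, not the tool Onno used, that counts established connections to the SRM door in a saved netstat dump and compares the total with the 500-thread limit mentioned in this discussion. The port 8443, the column layout of `netstat -tn`, and the sample addresses are assumptions for illustration.

```python
from collections import Counter

SRM_PORT = 8443      # assumption: the SRM door listens on this port
MAX_THREADS = 500    # thread limit quoted in the discussion


def count_srm_connections(netstat_text, port=SRM_PORT):
    """Count ESTABLISHED TCP connections to the local port, per remote host."""
    per_host = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # Keep only connections whose local end is the SRM port.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        per_host[foreign.rsplit(":", 1)[0]] += 1
    return per_host


# Hypothetical sample data in the assumed `netstat -tn` layout.
sample = """\
tcp        0      0 192.0.2.10:8443        198.51.100.7:51234     ESTABLISHED
tcp        0      0 192.0.2.10:8443        198.51.100.7:51235     ESTABLISHED
tcp        0      0 192.0.2.10:8443        198.51.100.9:40022     TIME_WAIT
tcp        0      0 192.0.2.10:22          203.0.113.5:60000      ESTABLISHED
"""

per_host = count_srm_connections(sample)
total = sum(per_host.values())
print(total, "established connections; thread limit", MAX_THREADS)
for host, n in per_host.most_common():
    print(host, n)
```

If the total is close to the thread limit, the connection count alone explains the hang; if it is well below, the thread dump is the next place to look.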

The thread dump should be in the ticket.

Ticket #5112

Which client?

We use gfal prestage.

Pedro: we also had this problem.

Increase the queue in front of tomcat.

We only have 500 threads.

DQ2 issues an SRM bring-online request and then polls with srm ls; that would kill the SRM with load.

PIC

Things seem to be fine.

PoolManager showed a high load of 50%.

There were 25,000 pin manager requests.

The PoolManager "rc ls" output showed a high number of requests.

After restart everything is fine.

Which version? 1.9.2-7 on the servers and 1.9.2-5 on the pools.

More recent versions of dCache drop requests that have timed out.

FZK

Everything fine: nothing to report.

CMS migration from the old to the new: the new one is in production.

NDGF

Everything seems OK.

We are upgrading to 1.9.5-preview tomorrow.

Questions

Owen asked what version of Java people have installed on their worker nodes.

DTNM

Proposed: same time, next week.