Present, Paul), PIC(Marc), KIT(Xavier), NDGF(Gerd)


Site reports


Xavier reported that, over the last two days, the KIT ATLAS instance had suffered two incidents on the SRM nodes. In both cases CPU usage on the nodes hit 100%, and any attempt to use the SRM failed.

Xavier's investigation suggests that this was partly ATLAS's fault and partly dCache's. ATLAS is trying to stage a very large number of small files that are archived on tape. On one pool some 20,000 files were being staged, all about 7 MiB in size. This put a lot of load on the pool, even though the tape library was delivering data at a near-zero rate.
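As a rough sanity check on these figures, the backlog on that pool can be totalled up. The sketch below assumes every file is exactly 7 MiB (the report only says "about"), and the function name is purely illustrative; it is not part of dCache.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def staged_volume_gib(file_count: int, file_size_mib: float) -> float:
    """Return the total size of a staging backlog in GiB."""
    return file_count * file_size_mib * MIB / GIB


# Figures from the KIT report: 20,000 files of roughly 7 MiB each.
total = staged_volume_gib(20_000, 7)
print(f"{total:.1f} GiB")  # prints "136.7 GiB"
```

The total volume is modest for a single pool, which fits the observation that the load comes from the number of stage requests rather than from the amount of data moved.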

Xavier has updated ticket #8130 with this information.

Gerd asked which dCache version KIT is running; Xavier replied that it is v2.6.28. Gerd suggested upgrading the node hosting pool-manager to dCache v2.6.32 or newer. This should help reduce the high load on the pool.

The high load on the SRM is probably caused by the SRM contacting pin-manager for each of the files, and pin-manager then failing to contact the pool because of the pool's high load. Above some threshold, this could lead to the SRM spending all its time attempting to pin files that should reside on the overloaded pool.

If so, upgrading pool-manager to v2.6.32 or newer should indirectly fix this problem.

Ticket 8442

When attempting to parse the billing output file, Xavier discovered there were entries that shouldn't be in the billing output. The ticket shows two examples of these messages.


Marc reported that PIC has no problems at the moment.


Gerd is currently trying to debug some SRM problems. He had only just learnt about them, so there are no details yet and (unfortunately) the log files are not helpful.

NDGF are currently running dCache v2.10.3. Their experience so far has been mostly good.

Support tickets for discussion

There is no meeting on Thursday. The next meeting is Tuesday 16th September.