wiki:tier-one-meeting-20170214
Last modified on 02/14/17 16:52:35

dCache Tier I meeting February 14, 2017

[part of a series of meetings]

Present

dCache.org(Paul), IN2P3(), Sara(), Triumf(), BNL(), NDGF(Ulf), PIC(Marc), KIT(), Fermi(), CERN()

Agenda

(see box on the other side)

Site reports

PIC

Everything mostly OK

ATLAS stress-test

The test also affected NDGF and KIT.

PIC started with good performance, but after one day performance dropped.

November -- the request lifetime in Rucio was increased; the system had been tuned according to the past year's behaviour.

PIC discovered this the same day; the parameter was updated from 1 day to 10 days, and everything recovered quickly once the change was made.

To discuss with ATLAS: the lifetime should be 2 days for tape.

Bring-online lifetime.

ATLAS is going to do a really large recall: 150 TiB from each Tier-1. In Rucio, they requested 48 hours, but dCache sites have shorter bring-online lifetimes configured.

dCache's defaults file doesn't say whether this value is a default or a maximum.

NDGF has 24 hours; PIC also had 24 hours.

PIC are going to update the timeout to 100 hours.

Problem #1: Timeout information missing in dCache SRM response.

Problem #2: SRM_FAILURE "request lifetime expired" rather than SRM_REQUEST_TIMED_OUT

Question: how does FTS react to SRM_REQUEST_TIMED_OUT, and does it use the timeout information?
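Problem #2 matters because a client can act on a machine-readable status code without parsing text. A minimal sketch of the distinction (hypothetical client logic for illustration, not FTS's actual behaviour):

```python
# Hypothetical client-side retry decision (not FTS's real code).
# With SRM_REQUEST_TIMED_OUT the status code alone is enough; with
# SRM_FAILURE the client must fall back to matching the free-text
# explanation, which is fragile.

def should_retry(status_code: str, explanation: str) -> bool:
    """Decide whether a bring-online request is worth resubmitting."""
    if status_code == "SRM_REQUEST_TIMED_OUT":
        return True  # machine-readable: the request simply ran out of time
    if status_code == "SRM_FAILURE":
        # fragile fallback: parse the human-readable explanation
        return "lifetime expired" in explanation.lower()
    return False

# The failure mode from Problem #2: a timeout reported as SRM_FAILURE
print(should_retry("SRM_FAILURE", "request lifetime expired"))  # True
print(should_retry("SRM_REQUEST_TIMED_OUT", ""))                # True
print(should_retry("SRM_FAILURE", "disk read error"))           # False
```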

NDGF

ATLAS stress test

The biggest problem with ATLAS was that the LHCONE link to Slovenia was completely full. The tape pools tried to p2p files over to Slovenia; the Slovenian pools had to be taken out of p2p.

Other than that, dCache worked fine for the recalls.

Problems with core domains

NDGF moved two test systems over to stock dCache 3.0. Getting the satellite and core settings right is difficult.

When the head node has multiple domains, the setting needs to go into each domain.

The current limit of a single core domain per host makes the configuration fiddly.
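As a sketch of what the fiddliness looks like (assuming dCache 3.x's core/satellite broker topology; the domain and service names here are hypothetical):

```
# layout file on the head node -- domain names are made up for illustration.
# The broker scheme has to be set inside each domain, and only one core
# domain is allowed per host, so every other domain must be a satellite.

[coreDomain]
dcache.broker.scheme = core
[coreDomain/poolmanager]
[coreDomain/pnfsmanager]

[doorsDomain]
dcache.broker.scheme = satellite
[doorsDomain/webdav]
```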

KIT

dCache is running just fine: no issues to report.

ATLAS stress test

ATLAS complained about requests failing with:

[SE][StatusOfBringOnlineRequest][SRM_FAILURE] Changing file state because request state has changed.

Scripts

Paul will ask Tigran about the SQL-scripts RT ticket.

Tape migration

KIT is moving LHCb data from TSM to HPSS; LHCb has the least data of the VOs, and the other VOs will be moved once this completes successfully.

For the switch from TSM to HPSS the data has to be migrated; the plan is to migrate it with dCache.

Set up two pools: on one, files are marked PRECIOUS and flushed to HPSS.

A permanent migration move on the TSM pool sends the data to HPSS. One then needs to go into Chimera to remove the old TSM entries in the t_location table.
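The cleanup step boils down to: for each file, drop the stale TSM location record and keep the new HPSS one. A minimal in-memory sketch of that filtering (the table name t_location, and the HSM names TSM and HPSS, come from the minutes; the row layout and location-URI format below are hypothetical, and any real cleanup should of course be run against the actual Chimera schema):

```python
# Sketch of the post-migration cleanup logic: given (pnfsid, location-URI)
# rows as they might appear in t_location, keep everything except the old
# TSM entries. The "osm://tsm/..." URI format is an assumption.

def drop_tsm_locations(rows):
    """Return only the rows that should survive the migration,
    i.e. everything that is not an old TSM location record."""
    return [(pnfsid, loc) for pnfsid, loc in rows
            if not loc.startswith("osm://tsm")]

rows = [
    ("0000A1", "osm://tsm/?store=lhcb&bfid=123"),   # old TSM entry: remove
    ("0000A1", "osm://hpss/?store=lhcb&bfid=456"),  # new HPSS entry: keep
]
print(drop_tsm_locations(rows))
```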

Can also reorder requests to optimise fetching from TSM.

Question: how to stage from a specific tape system? This is to verify that files were written correctly, by staging them back from HPSS (and not TSM). Paul to investigate.

dCache workshop

Where do we register?

Which day do we start?

Support tickets for discussion

[Items are added here automagically]

DTNM

Same time, next week.