wiki:developers-meeting-20141113
Last modified on 11/13/14 16:59:58

dCache Tier I meeting November 13, 2014

[part of a series of meetings]

Present

dCache.org (Tigran, Paul), Fermi (Natalia)

Agenda

(see box on the other side)

Site reports

Fermilab

Have mounted dCache

Have seen mount-point hanging.

Switched from NFS v4.1 to v4.0 on the test system.

Catalin also increased the available memory (probably unrelated).

Jobs creating directories using mkdir on the filesystem.

Could be client attempting to delete a file before writing; unclear why this is happening, unclear whether this is dangerous.

On Tuesday, production was also changed to NFS v4.0.

Spoke with Dmitry to help debug this problem. He recommended downgrading to v3.

No new hangs since Tuesday.

There are a lot of known problems with data IO with NFS v4.1. There should be no great performance difference between v4.0 and v3 for namespace operations.

Does it hang forever or for 10 minutes? There's some evidence that it can recover. There are some cases where it was stuck for more than 24 hours.
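
To tell a short stall from a long hang, a client node could probe the mount point periodically and record how long each stall lasts. The following is only a minimal sketch and not part of dCache: `probe_mount`, its parameters and the return values are hypothetical names. The probe runs in a child process so that a hung mount cannot block the script that does the monitoring.

```python
import subprocess
import sys
import time

# Child program: a single statvfs() call on the path given as argv[1].
_CHILD = "import os, sys; os.statvfs(sys.argv[1])"


def probe_mount(path, timeout_s=10.0, poll_s=0.1):
    """Return 'ok', 'error' or 'timeout' for one statvfs() call on path."""
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        code = child.poll()
        if code is not None:
            # The child finished: exit code 0 means statvfs() succeeded.
            return "ok" if code == 0 else "error"
        time.sleep(poll_s)
    # Best effort only: a child blocked inside the kernel may ignore the
    # signal, so we deliberately do not wait for it to exit.
    child.kill()
    return "timeout"
```

Calling this from cron with the time of day logged next to the result would show whether a mount recovers after roughly 10 minutes or stays stuck.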

Known problem with df taking a long time.

There is a bug in 2.2 where running the dccp command on an NFS v4.1 mounted dCache will cause that mount-point to hang. Other client nodes will be unaffected. The mount-point may still be hung after the dccp process itself has gone.

Try to keep NFS v4.0 mounts and, if you see problems, try down-grading to NFS v3.
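
As an illustration of pinning the NFS version on a client, the sketch below builds the mount command for each of the versions discussed. It assumes a Linux client and the option spelling from nfs(5) (`vers=` and `minorversion=`); newer clients may prefer `vers=4.0`, so check the local man page. The function name, the host name and the paths are hypothetical.

```python
# NFS versions discussed above, mapped to Linux mount options.
# Assumption: option spelling as in nfs(5); verify on your client.
NFS_OPTIONS = {
    "3": "vers=3",
    "4.0": "vers=4,minorversion=0",
    "4.1": "vers=4,minorversion=1",
}


def mount_command(server, export, mountpoint, version="4.0"):
    """Return the mount command (as an argument list) for one NFS version."""
    if version not in NFS_OPTIONS:
        raise ValueError("unsupported NFS version: %s" % version)
    return [
        "mount", "-t", "nfs",
        "-o", NFS_OPTIONS[version],
        "%s:%s" % (server, export),
        mountpoint,
    ]


if __name__ == "__main__":
    # Hypothetical door host and paths, printed rather than executed.
    for version in ("4.0", "3"):
        print(" ".join(mount_command("door.example.org", "/pnfs", "/pnfs", version)))
```

The default is v4.0, and v3 is the fallback to try if problems continue.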

Intervention

Need to add new pools and want to migrate data.

RAID array failing, need to decommission the RAID system.

We have two pools with enough space.

  1. Where do we migrate the data to?
  1. How many concurrent transfers can we run (3 disks, in parallel)?

Recommend not increasing the number of parallel transfers: don't place additional load on an unhappy system.

There are system errors from the OS.

Guide: run [(n_disks / n_pools) - 2] parallel transfers per pool, where '2' is an arbitrary number.
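
The rule of thumb above can be written out as a small calculation. This is a sketch under two assumptions that were not discussed in the meeting: the division is rounded down, and the result is never allowed to fall below one transfer (for small setups the raw formula goes to zero or below). The function and parameter names are hypothetical.

```python
def transfers_per_pool(n_disks, n_pools, margin=2, minimum=1):
    """Parallel transfers per pool: (n_disks / n_pools) - margin.

    margin is the arbitrary '2' from the guide. Rounding down and the
    lower bound of `minimum` are assumptions added for this sketch.
    """
    if n_disks < 1 or n_pools < 1:
        raise ValueError("n_disks and n_pools must be positive")
    return max(minimum, n_disks // n_pools - margin)


if __name__ == "__main__":
    print(transfers_per_pool(12, 2))  # 12 // 2 - 2 = 4
    print(transfers_per_pool(3, 2))   # 3 // 2 - 2 = -1, raised to the minimum of 1
```

With only 3 disks the guide yields no headroom at all, which agrees with the recommendation not to increase the number of parallel transfers.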

Advice on partitioning

At DESY, we used to buy a big RAID system and split it into two partitions; both partitions are visible from two machines. Under normal conditions, each partition is served as a dCache pool, with the two pools running on different machines. If a machine fails, then the unaffected machine can serve both pools.
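
The failover arrangement can be pictured as a mapping from pools to hosts. The sketch below is purely illustrative: it does not start or stop any dCache pool, and the pool and host names are hypothetical.

```python
def assign_pools(preferred, alive):
    """Map each pool to the host that should serve it.

    preferred: dict of pool name -> host that serves it normally.
    alive:     set of hosts that are currently up.
    Every host can see every partition, so any surviving host can take over.
    """
    survivors = sorted(alive)
    assignment = {}
    for pool, host in preferred.items():
        if host in alive:
            assignment[pool] = host
        else:
            # Fail over to a surviving host, or None if nothing is up.
            assignment[pool] = survivors[0] if survivors else None
    return assignment


if __name__ == "__main__":
    preferred = {"pool_a": "host1", "pool_b": "host2"}
    print(assign_pools(preferred, {"host1", "host2"}))  # normal operation
    print(assign_pools(preferred, {"host2"}))           # host1 has failed
```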

PostgreSQL version

PostgreSQL version: DESY has been running 9.3 for about one year.

They provide a migration script for moving from 9.x to 9.3. At DESY, we always do a dump-restore. This way we get a final backup of the old system.
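
A dump-restore of this kind could look roughly like the sketch below, which only builds the two commands and prints them. It assumes the old and the new PostgreSQL server run side by side on different ports; the port numbers, the file name and the function name are hypothetical. `pg_dumpall` and `psql` are the standard PostgreSQL client tools, and the dump file doubles as the final backup of the old system.

```python
def dump_restore_commands(old_port, new_port, dump_file):
    """Return (dump, restore) commands as argument lists."""
    # Dump every database of the old cluster into one SQL file.
    dump = ["pg_dumpall", "-p", str(old_port), "-f", dump_file]
    # Replay that file into the new cluster via the 'postgres' database.
    restore = ["psql", "-p", str(new_port), "-d", "postgres", "-f", dump_file]
    return dump, restore


if __name__ == "__main__":
    # Hypothetical ports and file name; commands are printed, not run.
    for cmd in dump_restore_commands(5432, 5433, "old-cluster.sql"):
        print(" ".join(cmd))
```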

Support tickets for discussion

[Items are added here automagically]

DTNM

Proposed: same time, next week.