Last modified on 04/20/11 11:00:26

The ReplicaManager Component

The ReplicaManager is a dCache Component. It should not be confused with the CopyManager.

Installation

To prepare the database for Replica Manager do the following:

# su postgres
$ createdb -U srmdcache replicas
$ psql -U srmdcache -d replicas -f /opt/d-cache/etc/psql_install_replicas.sql
$ exit
# 

If for some reason you want to preserve the existing database, you can upgrade the schema with the procedure below:

Upgrade from v0 to v1

To migrate from v0 to v1 you will need to perform a database migration. The steps are:

  • run the regular rpm upgrade procedure
  • do not start the ReplicaManager domain after running install.sh, instead
  • run the provided migration script:
    # su postgres
    $ psql -U srmdcache -d replicas -f /opt/d-cache/etc/psql_upgrade_replicas.sql
    $ exit
    # 
    
  • Follow the standard startup procedure after that

Please note that, with newer versions of dCache, the file psql_upgrade_replicas.sql is available as /opt/d-cache/share/replica/psql_upgrade_replicas.sql and not in the /opt/d-cache/etc directory.
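
Since the script location depends on the dCache version, a small helper can pick whichever copy exists before psql is run. The following is only a sketch: the two paths are the ones named above, while the function name find_upgrade_script is made up for illustration.

```python
from pathlib import Path

# Locations named in the text above, newer layout first.
CANDIDATES = (
    Path("/opt/d-cache/share/replica/psql_upgrade_replicas.sql"),
    Path("/opt/d-cache/etc/psql_upgrade_replicas.sql"),
)


def find_upgrade_script(candidates=CANDIDATES):
    """Return the first candidate that exists as a file, or None."""
    for path in candidates:
        if path.is_file():
            return path
    return None


script = find_upgrade_script()
print(script if script else "psql_upgrade_replicas.sql not found")
```

The returned path can then be passed to psql with -f, exactly as in the migration step above.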


How-Tos

How to see the current pending tasks

[dcadmin01.usatlas.bnl.gov] (local) admin > cd replicaManager
[dcadmin01.usatlas.bnl.gov] (replicaManager) admin > task ls
Id=10320;type=Reduction( 002D000000000000000152F8 dc077_5 );status=Active;runtime= 02:46:32
Id=10321;type=Reduction( 002D00000000000000042220 dc083_5 );status=Active;runtime= 02:46:30
Id=10322;type=Reduction( 002D00000000000000018200 dc062_5 );status=Active;runtime= 02:46:28
Id=10323;type=Reduction( 002D0000000000000000D990 dc091_5 );status=Active;runtime= 02:46:26
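
If the task list has to be processed by a script (for example to spot tasks that have been running for hours), the output can be split into fields. This is a rough sketch that assumes every line looks like the Reduction entries above; other task types may print differently, and the names TASK_LINE and parse_tasks are illustrative only.

```python
import re

# Pattern inferred from the sample output above; not an official format.
TASK_LINE = re.compile(
    r"Id=(?P<id>\d+);"
    r"type=(?P<type>\w+)\(\s*(?P<pnfsid>[0-9A-Fa-f]+)\s+(?P<pool>\S+)\s*\);"
    r"status=(?P<status>\w+);"
    r"runtime=\s*(?P<runtime>[\d:]+)"
)


def parse_tasks(output):
    """Return one dict per line that matches the assumed 'task ls' format."""
    tasks = []
    for line in output.splitlines():
        match = TASK_LINE.match(line.strip())
        if match:
            tasks.append(match.groupdict())
    return tasks


SAMPLE = """\
Id=10320;type=Reduction( 002D000000000000000152F8 dc077_5 );status=Active;runtime= 02:46:32
Id=10321;type=Reduction( 002D00000000000000042220 dc083_5 );status=Active;runtime= 02:46:30
"""

for task in parse_tasks(SAMPLE):
    print(task["id"], task["type"], task["pool"], task["runtime"])
```

The ids collected this way are the values that 'task remove' expects.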

How to remove a task

[dcadmin01.usatlas.bnl.gov] (local) admin > cd replicaManager
[dcadmin01.usatlas.bnl.gov] (replicaManager) admin > task remove 10320

How to see if there are unique files on a pool

A unique file is a file that has only one replica.

(REVIEW: this is Pedro Salgado's understanding. Someone needs to confirm or correct the line above and then remove this note.)

[dcadmin01.usatlas.bnl.gov] (local) admin > cd replicaManager
[dcadmin01.usatlas.bnl.gov] (replicaManager) admin > ls unique dc074_5
Found 0
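
To make the definition above concrete, the sketch below applies it to a replica map held in memory. It only illustrates the (still unconfirmed) definition and is not how the ReplicaManager computes the result; the function name unique_files and the second pnfsid are hypothetical.

```python
def unique_files(replicas, pool):
    """Return the pnfsids whose only replica lives on the given pool.

    replicas maps a pnfsid to the list of pools holding a copy.
    """
    return sorted(
        pnfsid for pnfsid, pools in replicas.items() if set(pools) == {pool}
    )


# Hypothetical data: the first entry reuses a pnfsid from this page,
# the second one is invented to show a file with a single replica.
REPLICAS = {
    "002D000000000000000370E0": ["dc062_5", "dc064_5", "dc086_5"],
    "EXAMPLE_PNFSID_WITH_ONE_COPY": ["dc074_5"],
}

print(unique_files(REPLICAS, "dc074_5"))
print(unique_files(REPLICAS, "dc062_5"))
```

A pool that reports unique files holds the last copy of those files, so it should not be drained or removed before they are replicated elsewhere.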

How to see where a file has been replicated to

[dcadmin01.usatlas.bnl.gov] (local) admin > cd replicaManager
[dcadmin01.usatlas.bnl.gov] (replicaManager) admin > ls pnfsid 002D000000000000000370E0
002D000000000000000370E0 dc062_5 dc064_5 dc086_5
[dcadmin01.usatlas.bnl.gov] (local) admin > cd PnfsManager
[dcadmin01.usatlas.bnl.gov] (PnfsManager) admin > cacheinfoof 002D000000000000000370E0
 dc086_5 dc062_5 dc064_5
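
The two listings above report the same pools in a different order, so a comparison should ignore ordering. A minimal sketch, assuming the output formats shown above (pnfsid followed by pool names for 'ls pnfsid', pool names only for 'cacheinfoof'); the function names are illustrative.

```python
def replica_manager_pools(line):
    """Split an 'ls pnfsid' result line into (pnfsid, set of pools)."""
    pnfsid, *pools = line.split()
    return pnfsid, set(pools)


def cacheinfo_pools(line):
    """Split a 'cacheinfoof' result line into a set of pools."""
    return set(line.split())


rm_line = "002D000000000000000370E0 dc062_5 dc064_5 dc086_5"
pnfs_line = " dc086_5 dc062_5 dc064_5"

pnfsid, rm_pools = replica_manager_pools(rm_line)
pnfs_pools = cacheinfo_pools(pnfs_line)

# Pools known to only one of the two components, if any.
print(pnfsid, "mismatch:", sorted(rm_pools ^ pnfs_pools))
```

An empty mismatch list means the ReplicaManager database and the PnfsManager agree on where the file is stored.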

Known deployments

BNL

At the Brookhaven National Laboratory, the ReplicaManager is used to deploy an area for files with a high access profile (ATLAS HOTDISK space token: conditions data, database release files, among others).

The setup is configured so that every file always has 3 replicas. Internal tests showed that, under heavy load, dCache creates additional replicas (with p2p-oncost enabled), which is highly desirable. After some time, the ReplicaManager removes these extra replicas, bringing the count back to the configured value.

Regarding the space reservation in the SrmSpaceManager, the HOTDISK space token is sized as total_space / number_of_replicas.

At the time of this writing, the HOTDISK space token area had 6 TB of disk assigned to it (2 TB in the SrmSpaceManager). These 6 TB are spread over 30 pools of 200 GB each, residing on different hosts.
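
The sizing rule can be checked with a few lines of arithmetic. The sketch below uses the figures quoted above and assumes decimal units (1 TB = 1000 GB), which is what the rounded numbers on this page imply; the function name space_token_size_gb is illustrative.

```python
def space_token_size_gb(total_disk_gb, replicas):
    """Usable size to reserve when every file is stored 'replicas' times."""
    return total_disk_gb // replicas


POOLS = 30          # number of HOTDISK pools
POOL_SIZE_GB = 200  # size of each pool
REPLICAS = 3        # replicaMin = replicaMax = 3

total_gb = POOLS * POOL_SIZE_GB
token_gb = space_token_size_gb(total_gb, REPLICAS)

print(f"raw disk: {total_gb} GB, space token: {token_gb} GB")
```

Reserving more than this in the SrmSpaceManager would let users write more data than the pools can hold once every file has been replicated.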

BNL's configuration is:

  • PoolManager.conf
    psu create ugroup token-HOTDISK
    psu addto ugroup token-HOTDISK HOTDISK:ATLAS@osm
    #
    psu create pgroup HOTDISK-readwrite
    psu addto pgroup HOTDISK-readwrite dc062_5
    psu addto pgroup HOTDISK-readwrite dc063_5
    psu addto pgroup HOTDISK-readwrite dc064_5
    psu addto pgroup HOTDISK-readwrite dc065_5
    psu addto pgroup HOTDISK-readwrite dc066_5
    psu addto pgroup HOTDISK-readwrite dc067_5
    psu addto pgroup HOTDISK-readwrite dc068_5
    psu addto pgroup HOTDISK-readwrite dc069_5
    psu addto pgroup HOTDISK-readwrite dc070_5
    psu addto pgroup HOTDISK-readwrite dc071_5
    psu addto pgroup HOTDISK-readwrite dc072_5
    psu addto pgroup HOTDISK-readwrite dc073_5
    psu addto pgroup HOTDISK-readwrite dc074_5
    psu addto pgroup HOTDISK-readwrite dc075_5
    psu addto pgroup HOTDISK-readwrite dc076_5
    psu addto pgroup HOTDISK-readwrite dc077_5
    psu addto pgroup HOTDISK-readwrite dc078_5
    psu addto pgroup HOTDISK-readwrite dc079_5
    psu addto pgroup HOTDISK-readwrite dc080_5
    psu addto pgroup HOTDISK-readwrite dc081_5
    psu addto pgroup HOTDISK-readwrite dc082_5
    psu addto pgroup HOTDISK-readwrite dc083_5
    psu addto pgroup HOTDISK-readwrite dc084_5
    psu addto pgroup HOTDISK-readwrite dc085_5
    psu addto pgroup HOTDISK-readwrite dc086_5
    psu addto pgroup HOTDISK-readwrite dc087_5
    psu addto pgroup HOTDISK-readwrite dc088_5
    psu addto pgroup HOTDISK-readwrite dc089_5
    psu addto pgroup HOTDISK-readwrite dc090_5
    psu addto pgroup HOTDISK-readwrite dc091_5
    #
    psu create link lnk-HOTDISK-read any-protocol token-HOTDISK world-net
    psu set link lnk-HOTDISK-read -readpref=20 -writepref=0 -cachepref=0 -p2ppref=20 -section=costpreference-RESILIENT
    psu add link lnk-HOTDISK-read HOTDISK-readwrite
    #
    psu create link lnk-HOTDISK-write any-protocol token-HOTDISK world-net
    psu set link lnk-HOTDISK-write -readpref=0 -writepref=20 -cachepref=0 -p2ppref=0 -section=costpreference-DISKONLY
    psu add link lnk-HOTDISK-write HOTDISK-readwrite
    #
    psu create linkGroup hot-disk-link-group
    # WARNING
    # need to put custodialAllowed so that files arriving on the HOTDISK pools
    # are marked as precious (ReplicaManager only replicates precious files)
    psu set linkGroup custodialAllowed hot-disk-link-group true
    psu set linkGroup replicaAllowed hot-disk-link-group true
    psu set linkGroup nearlineAllowed hot-disk-link-group true
    psu set linkGroup outputAllowed hot-disk-link-group false
    psu set linkGroup onlineAllowed hot-disk-link-group true
    psu addto linkGroup hot-disk-link-group lnk-HOTDISK-write
    #
    pm set costpreference-DISKONLY  -cpucostfactor=1.0 -spacecostfactor=1.0
    pm set costpreference-DISKONLY  -idle=0.0 -p2p=0.8 -alert=0.0 -halt=0.0 -fallback=0.0
    pm set costpreference-DISKONLY  -p2p-allowed=no
    pm set costpreference-DISKONLY  -stage-allowed=no
    pm set costpreference-DISKONLY  -max-copies=2
    #
    pm set costpreference-RESILIENT  -cpucostfactor=1.0 -spacecostfactor=0.0
    pm set costpreference-RESILIENT  -idle=0.0 -p2p=0.1 -alert=1.0 -halt=0.0 -fallback=0.0
    pm set costpreference-RESILIENT  -p2p-allowed=yes -p2p-oncost=yes
    pm set costpreference-RESILIENT  -stage-allowed=no
    pm set costpreference-RESILIENT  -max-copies=20
    
  • replicaSetup
    #  ---- Replica Manager database settings
    #
    #   Do not change unless you know what you are doing.
    #
    #   Database name: replicas
    #
    replicaManagerDatabaseHost=###
    replicaDbName=###
    replicaDbUser=###
    replicaDbPassword=###
    replicaPasswordFile=###
    resilientGroupName=HOTDISK-readwrite
    #replicaPoolWatchDogPeriod=600
    #replicaWaitDBUpdateTimeout=600
    #replicaExcludedFilesExpirationTimeout=43200
    #replicaDelayDBStartTimeout=1200
    #replicaAdjustStartTimeout=1200
    #replicaWaitReplicateTimeout=43200
    ### WARNING
    # due to a bug in 1.9.4-3, reduce tasks get stuck until they time out.
    # you need to lower the timeout so that your active workers
    # don't all end up doing nothing because
    # they are all stuck on 'Reduce'.
    replicaWaitReduceTimeout=10
    replicaDebug=false
    replicaMaxWorkers=100
    replicaMin=3
    replicaMax=3
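
The pool group section of PoolManager.conf above is repetitive, so it may be easier to generate than to type. The following sketch reproduces the 'psu create pgroup' and 'psu addto pgroup' lines for a contiguous range of pools; it assumes the dcNNN_5 naming used at BNL, and the function name pgroup_commands is hypothetical.

```python
def pgroup_commands(group, first, last, suffix="_5"):
    """Build the psu lines for pools dc<first>..dc<last> (inclusive)."""
    lines = [f"psu create pgroup {group}"]
    lines += [
        f"psu addto pgroup {group} dc{number:03d}{suffix}"
        for number in range(first, last + 1)
    ]
    return lines


# The group name must match resilientGroupName in replicaSetup.
commands = pgroup_commands("HOTDISK-readwrite", 62, 91)
print("\n".join(commands))
```

The output is meant to be pasted into PoolManager.conf; the link, linkGroup, and costpreference sections still have to be written by hand.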
    

Conclusion:

  • the setup has proven to meet the requirements of the ATLAS usage pattern
  • no maintenance or cron jobs are required to maintain the number of replicas
  • there is a bug in 'Reduce' tasks that makes them get stuck [@1.9.4-3], so we suggest setting a lower value for the 'replicaWaitReduceTimeout' parameter
  • on space token areas you need to be careful to set 'custodialAllowed true' on the write link group, or else the files will arrive as 'Cached' on the pools and no replication will occur
  • BNL uses pcache on its worker nodes, so the advantage for database release files is reduced

Contact: iriswu @ bnl.gov, psalgado @ bnl.gov


Last Modified Fri Apr 20 22:22:52 2018