The ReplicaManager Component
The ReplicaManager is a dCache component; it should not be confused with the CopyManager.
Installation
To prepare the database for the ReplicaManager, do the following:
# su postgres
$ createdb -U srmdcache replicas
$ psql -U srmdcache -d replicas -f /opt/d-cache/etc/psql_install_replicas.sql
$ exit
#
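If you want to double-check the result, one optional way (a sketch using only standard psql) is to list the tables that the install script created in the new database:
# su postgres
$ psql -U srmdcache -d replicas -c '\dt'
$ exit
#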
If you want to preserve an existing database, you can upgrade its schema with the procedure below:
Upgrade from v0 to v1
Migrating from v0 to v1 requires a database schema migration. The steps are:
- run the regular rpm upgrade procedure
- do not start the ReplicaManager domain after running install.sh, instead
- run the provided migration script:
# su postgres
$ psql -U srmdcache -d replicas -f /opt/d-cache/etc/psql_upgrade_replicas.sql
$ exit
#
- follow the standard startup procedure after that
Please note that, with newer versions of dCache, the file psql_upgrade_replicas.sql is available as /opt/d-cache/share/replica/psql_upgrade_replicas.sql and not in the /opt/d-cache/etc directory.
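If you are unsure which location applies to your installation, a simple way to locate the script (a sketch using the standard find utility) is:
# find /opt/d-cache -name psql_upgrade_replicas.sql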
How-Tos
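All of the examples below are run from the dCache admin shell. A typical way to connect (a sketch: it assumes the legacy ssh admin interface on its default port 22223; your admin host, port and cipher settings may differ) is:
$ ssh -c blowfish -p 22223 admin@dcadmin01.usatlas.bnl.gov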
How to see the current pending tasks
[dcadmin01.usatlas.bnl.gov] (local) admin > cd replicaManager
[dcadmin01.usatlas.bnl.gov] (replicaManager) admin > task ls
Id=10320;type=Reduction( 002D000000000000000152F8 dc077_5 );status=Active;runtime= 02:46:32
Id=10321;type=Reduction( 002D00000000000000042220 dc083_5 );status=Active;runtime= 02:46:30
Id=10322;type=Reduction( 002D00000000000000018200 dc062_5 );status=Active;runtime= 02:46:28
Id=10323;type=Reduction( 002D0000000000000000D990 dc091_5 );status=Active;runtime= 02:46:26
How to remove a task
[dcadmin01.usatlas.bnl.gov] (local) admin > cd replicaManager
[dcadmin01.usatlas.bnl.gov] (replicaManager) admin > task remove 10320
How to see if there are unique files on a pool
A unique file is a file which has only one replica.
[dcadmin01.usatlas.bnl.gov] (local) admin > cd replicaManager
[dcadmin01.usatlas.bnl.gov] (replicaManager) admin > ls unique dc074_5
Found 0
How to see where a file has been replicated to
[dcadmin01.usatlas.bnl.gov] (local) admin > cd replicaManager
[dcadmin01.usatlas.bnl.gov] (replicaManager) admin > ls pnfsid 002D000000000000000370E0
002D000000000000000370E0 dc062_5 dc064_5 dc086_5
The same information can be cross-checked in the PnfsManager cell:
[dcadmin01.usatlas.bnl.gov] (local) admin > cd PnfsManager
[dcadmin01.usatlas.bnl.gov] (PnfsManager) admin > cacheinfoof 002D000000000000000370E0
dc086_5 dc062_5 dc064_5
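If you only know the file's path rather than its pnfsid, the id can usually be looked up in the PnfsManager cell first (a sketch; the path below is only a placeholder):
[dcadmin01.usatlas.bnl.gov] (local) admin > cd PnfsManager
[dcadmin01.usatlas.bnl.gov] (PnfsManager) admin > pnfsidof /pnfs/example.org/some/path/to/file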
Known deployments
BNL
At Brookhaven National Laboratory, the ReplicaManager is used to provide an area for files with a high access profile (the ATLAS HOTDISK space token: conditions data, database release files, among others).
The setup has been done so that there are always 3 replicas of any file. Internal tests have shown that dCache creates additional replicas (p2p-oncost=true) when under heavy load, which is highly desirable; after some time, the ReplicaManager removes these extra replicas, bringing the count back to the configured value.
Regarding the space reservation in the SrmSpaceManager, the HOTDISK space token is set to total_space/#replicas.
At the time of writing, the HOTDISK space token area had 6 TB assigned to it (2 TB in the SrmSpaceManager). These 6 TB are spread over thirty 200 GB pools residing on different hosts.
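A quick sanity check on those numbers, using only the figures quoted above: 30 pools x 200 GB = 6 TB of raw pool space, and with 3 replicas per file the reservable space is 6 TB / 3 = 2 TB, which matches the SrmSpaceManager reservation.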
BNL's configuration is:
- PoolManager.conf
psu create ugroup token-HOTDISK
psu addto ugroup token-HOTDISK HOTDISK:ATLAS@osm
#
psu create pgroup HOTDISK-readwrite
psu addto pgroup HOTDISK-readwrite dc062_5
psu addto pgroup HOTDISK-readwrite dc063_5
psu addto pgroup HOTDISK-readwrite dc064_5
psu addto pgroup HOTDISK-readwrite dc065_5
psu addto pgroup HOTDISK-readwrite dc066_5
psu addto pgroup HOTDISK-readwrite dc067_5
psu addto pgroup HOTDISK-readwrite dc068_5
psu addto pgroup HOTDISK-readwrite dc069_5
psu addto pgroup HOTDISK-readwrite dc070_5
psu addto pgroup HOTDISK-readwrite dc071_5
psu addto pgroup HOTDISK-readwrite dc072_5
psu addto pgroup HOTDISK-readwrite dc073_5
psu addto pgroup HOTDISK-readwrite dc074_5
psu addto pgroup HOTDISK-readwrite dc075_5
psu addto pgroup HOTDISK-readwrite dc076_5
psu addto pgroup HOTDISK-readwrite dc077_5
psu addto pgroup HOTDISK-readwrite dc078_5
psu addto pgroup HOTDISK-readwrite dc079_5
psu addto pgroup HOTDISK-readwrite dc080_5
psu addto pgroup HOTDISK-readwrite dc081_5
psu addto pgroup HOTDISK-readwrite dc082_5
psu addto pgroup HOTDISK-readwrite dc083_5
psu addto pgroup HOTDISK-readwrite dc084_5
psu addto pgroup HOTDISK-readwrite dc085_5
psu addto pgroup HOTDISK-readwrite dc086_5
psu addto pgroup HOTDISK-readwrite dc087_5
psu addto pgroup HOTDISK-readwrite dc088_5
psu addto pgroup HOTDISK-readwrite dc089_5
psu addto pgroup HOTDISK-readwrite dc090_5
psu addto pgroup HOTDISK-readwrite dc091_5
#
psu create link lnk-HOTDISK-read any-protocol token-HOTDISK world-net
psu set link lnk-HOTDISK-read -readpref=20 -writepref=0 -cachepref=0 -p2ppref=20 -section=costpreference-RESILIENT
psu add link lnk-HOTDISK-read HOTDISK-readwrite
#
psu create link lnk-HOTDISK-write any-protocol token-HOTDISK world-net
psu set link lnk-HOTDISK-write -readpref=0 -writepref=20 -cachepref=0 -p2ppref=0 -section=costpreference-DISKONLY
psu add link lnk-HOTDISK-write HOTDISK-readwrite
#
psu create linkGroup hot-disk-link-group
# WARNING
# need to put custodialAllowed so that files arriving on the HOTDISK pools
# are marked as precious (ReplicaManager only replicates precious files)
psu set linkGroup custodialAllowed hot-disk-link-group true
psu set linkGroup replicaAllowed hot-disk-link-group true
psu set linkGroup nearlineAllowed hot-disk-link-group true
psu set linkGroup outputAllowed hot-disk-link-group false
psu set linkGroup onlineAllowed hot-disk-link-group true
psu addto linkGroup hot-disk-link-group lnk-HOTDISK-write
#
pm set costpreference-DISKONLY -cpucostfactor=1.0 -spacecostfactor=1.0
pm set costpreference-DISKONLY -idle=0.0 -p2p=0.8 -alert=0.0 -halt=0.0 -fallback=0.0
pm set costpreference-DISKONLY -p2p-allowed=no
pm set costpreference-DISKONLY -stage-allowed=no
pm set costpreference-DISKONLY -max-copies=2
#
pm set costpreference-RESILIENT -cpucostfactor=1.0 -spacecostfactor=0.0
pm set costpreference-RESILIENT -idle=0.0 -p2p=0.1 -alert=1.0 -halt=0.0 -fallback=0.0
pm set costpreference-RESILIENT -p2p-allowed=yes -p2p-oncost=yes
pm set costpreference-RESILIENT -stage-allowed=no
pm set costpreference-RESILIENT -max-copies=20
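After loading this configuration you may want to confirm that the pool group and link group were created as intended. One way (a sketch; the exact listing format depends on the dCache version) is to inspect the PoolManager cell from the admin shell:
[dcadmin01.usatlas.bnl.gov] (local) admin > cd PoolManager
[dcadmin01.usatlas.bnl.gov] (PoolManager) admin > psu ls pgroup -l HOTDISK-readwrite
[dcadmin01.usatlas.bnl.gov] (PoolManager) admin > psu ls linkGroup -l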
- replicaSetup
# ---- Replica Manager database settings
#
# Do not change unless you know what you are doing.
#
# Database name: replicas
#
replicaManagerDatabaseHost=###
replicaDbName=###
replicaDbUser=###
replicaDbPassword=###
replicaPasswordFile=###
resilientGroupName=HOTDISK-readwrite
#replicaPoolWatchDogPeriod=600
#replicaWaitDBUpdateTimeout=600
#replicaExcludedFilesExpirationTimeout=43200
#replicaDelayDBStartTimeout=1200
#replicaAdjustStartTimeout=1200
#replicaWaitReplicateTimeout=43200
### WARNING
# due to a bug on 1.9.4-3, reduce tasks get stuck until they timeout.
# you need to reduce the timeout so that all of your active workers
# don't get to a state where they aren't doing anything because
# they are all stuck on 'Reduce'.
replicaWaitReduceTimeout=10
replicaDebug=false
replicaMaxWorkers=100
replicaMin=3
replicaMax=3
Conclusion:
- the setup has proven to meet the requirements of the ATLAS use case
- no maintenance or cron jobs are required to keep the number of replicas
- there is a bug in 'Reduce' tasks which makes them get stuck [@1.9.4-3], so we suggest setting a lower value for the 'replicaWaitReduceTimeout' parameter
- on space token areas you need to be careful and set 'custodialAllowed true' on the write link group, or else the files will arrive as 'Cached' on the pools and no replication will occur (see the repository check after this list)
- BNL uses pcache on its worker nodes, therefore the advantage for database release files is reduced
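One way to verify the point about 'Cached' versus precious files is to look at the repository state of a replica directly on one of the pools; in the 'rep ls' output a precious replica shows a 'P' in the state flags and a cached one a 'C' (a sketch, reusing a pool name and pnfsid from the examples above):
[dcadmin01.usatlas.bnl.gov] (local) admin > cd dc062_5
[dcadmin01.usatlas.bnl.gov] (dc062_5) admin > rep ls 002D000000000000000370E0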
Contact: iriswu @ bnl.gov, psalgado @ bnl.gov