
dCache Hands-on Taipei

Prerequisites

We would like to ask every participant to bring a laptop with an SSH client installed. Please check that your machine has one.

You need to log in to your virtual machines (VMs). You have two VMs, one server and one client:

  • dcache-ws-server<your number>.grid.sinica.edu.tw
  • dcache-ws-client<your number>.grid.sinica.edu.tw

You will have to use two consoles, one for the server and one for the client. Use your SSH client to connect to the machines using the following commands.

Client:

ssh dcacheuser@dcache-ws-client<your number>.grid.sinica.edu.tw

Server:

ssh root@dcache-ws-server<your number>.grid.sinica.edu.tw

After this step you should have two consoles with SSH connections, one to the server and one to the client. Make sure you know which console is the client and which is the server.

Content of the hands-on

In this hands-on we will cover the following topics:

  • Installation of dCache server
  • Interaction with dCache
  • Certificates and dCache
  • Hardware Lifecycle (adding storage, migrating)
  • Interfacing with Tertiary Storage

Installation of dCache server

(SERVER) (SERVER) (SERVER) (SERVER)

In this part we install dCache as a single-node instance, which means that all services will run on the same host. This shows us the basics and the necessary setup. At the end of this part of the hands-on we will have a running dCache that can be mounted locally as well as on your client.

  1. Find the dCache server RPM on your server in root's home directory /root/:
[root]# ls dcache-*
dcache-2.5.0-1.noarch.rpm
  2. Install the RPM on your server machine:
[root]# rpm -ivh dcache-2.5.0-1.noarch.rpm
Preparing...                ########################################### [100%]

1:dcache                 ########################################### [100%]
...

The dCache server has three dependencies: java-1.7.0-openjdk, postgresql-server (version > 8.4) and nfs-utils. You will find that openjdk, postgresql-server and nfs-utils are already installed on your machine; a script that configures PostgreSQL has been prepared for you as well. Run the following command to check for the java, nfs and postgresql packages.

[root]# rpm -qa |grep -E "postgresql92-server|java-1.7.0-openjdk|nfs-utils"
java-1.7.0-openjdk-1.7.0.9-2.3.7.1.el6_3.x86_64
nfs-utils-1.2.3-36.el6.x86_64
nfs-utils-lib-1.1.5-6.el6.x86_64
postgresql92-server-9.2.3-2PGDG.rhel6.x86_64

Look this up later: for more information please see the dCache Book - Prerequisites. There are also several trivial steps needed to set up PostgreSQL; they are not part of this tutorial and can be found in the dCache Book - Readying the PostgreSQL server for the use with dCache. We do not focus on the detailed configuration of the PostgreSQL server here, but have a look at the provided script (configurepgsql.sh) if you are interested in the commands it runs:

[root]# cat configurepgsql.sh
#!/bin/bash

service postgresql-9.2 initdb
service postgresql-9.2 start
chkconfig postgresql-9.2 on

mv /var/lib/pgsql/9.2/data/pg_hba.conf{,_ori}
cat >/var/lib/pgsql/9.2/data/pg_hba.conf <<EOF
# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD

# "local" is for Unix domain socket connections only
local   all         all                               trust
# IPv4 local connections:
host    all         all         127.0.0.1/32          trust
# IPv6 local connections:
host    all         all         ::1/128               trust
EOF

sed -ie 's/max_connections = 100/max_connections = 1000/' /var/lib/pgsql/9.2/data/postgresql.conf

service postgresql-9.2 restart
createdb -U postgres chimera
createuser -U postgres --no-superuser --no-createrole --createdb chimera
psql -U chimera chimera -f /usr/share/dcache/chimera/sql/create.sql
createlang -U postgres plpgsql chimera
psql -U chimera chimera -f /usr/share/dcache/chimera/sql/pgsql-procedures.sql
createuser -U postgres --no-superuser --no-createrole --createdb srmdcache
createdb -U srmdcache dcache

Now execute the configurepgsql.sh script:

[root]# ./configurepgsql.sh
Initializing database:                                     [  OK  ]
Starting postgresql-9.2 service:                           [  OK  ]
Stopping postgresql-9.2 service:                           [  OK  ]
Starting postgresql-9.2 service:                           [  OK  ]
psql:/usr/share/dcache/chimera/sql/create.sql:23: NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t_inodes_pkey" for table "t_inodes"
CREATE TABLE
......

lot of lines 

......

CREATE TRIGGER

Now you have a running and fully configured PostgreSQL server and an installed dCache server.

Configuration files

We make use of flat files to define and change the settings for the layout and the behavior of dCache on different levels (per host, per domain or per service). There are three main places for the configuration files:

  • /usr/share/dcache/defaults

This directory is filled with files defining the default settings for all dCache services, as they are shipped by dCache.org. Do not modify these files, as they will be replaced by subsequent updates!

  • /etc/dcache/dcache.conf

The central configuration file, which ideally should be nearly identical on all nodes of the dCache setup. Ideally there is only one difference among all nodes: the parameter pointing to the layout configuration file. To find out what settings can be made in dcache.conf, look through dcache.properties in the defaults directory.

  • /etc/dcache/layouts

Layout files are the place to define the actual topology of the dCache services/domains on this node. So typically no two nodes in a setup have identical layout files.

dCache.org provides us with premade layout files that describe a possible distribution of services over domains: head.conf, pool.conf and single.conf. Right now you could start dCache and it would use the empty dcache.conf file and the fallback layout file single.conf. With this, the most important core services would run in one single domain with default settings. Alternatively, head.conf predefines the mandatory services in a sensible number of domains to be run on the head node of your setup. Of course, you would then need at least one other node using pool.conf to provide some disk space to dCache.
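
If you want to see these premade files for yourself, simply list the layouts directory (they ship with the dCache RPM you just installed); you should find at least head.conf, pool.conf and single.conf:

[root]# ls /etc/dcache/layouts/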

We would like to have our own layout file that we use just for this hands-on, therefore we create it from a template:

[root]# cp /etc/dcache/layouts/{single,isgc2013_ws}.conf 

We need to tell dCache to use the layout file we just created by adding this to /etc/dcache/dcache.conf. Do not make the mistake of entering the entire file name there; just add the following line:

dcache.layout=isgc2013_ws

Adjust the layout file

Firstly we need to tell dCache that there will be many domains communicating with each other:

[root]# vi /etc/dcache/layouts/isgc2013_ws.conf

As we described in the introduction, dCache services run inside so-called domains. Since we will have many domains, we need a mechanism for these domains to communicate. The cells framework is used for this communication, which is why we activate it by changing the following line in /etc/dcache/layouts/isgc2013_ws.conf:

broker.scheme=none

to

broker.scheme=cells

As mentioned, we want to be able to mount dCache locally using NFSv4.1. Therefore a service called nfsv41 needs to be started, which we keep in a separate domain for the convenience of restarting it separately from the rest of dCache. The following lines need to be added at the end of /etc/dcache/layouts/isgc2013_ws.conf. They add a domain to dCache - [nfs-Domain] - that holds the nfsv41 service.

[nfs-Domain]
[nfs-Domain/nfsv41]

Then save the file and leave the editor: make sure you are not in INSERT mode by pressing ESC, then type:

:wq

(Just for the ones not familiar with vi.)
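
For reference, both changes to the layout file could also have been made from the shell instead of vi - a minimal sketch, assuming the broker.scheme line appears exactly as shown above:

[root]# sed -i 's/broker.scheme=none/broker.scheme=cells/' /etc/dcache/layouts/isgc2013_ws.conf
[root]# cat >> /etc/dcache/layouts/isgc2013_ws.conf <<EOF
[nfs-Domain]
[nfs-Domain/nfsv41]
EOF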

Before we can start dCache we have to empty /etc/dcache/gplazma.conf as there is no security configured in a blank dCache:

[root]# echo "" > /etc/dcache/gplazma.conf

Check that everything is configured properly before starting dCache:

[root]# dcache check-config
No problems found.

Now start dCache by:

[root]# dcache start
Starting dCacheDomain done
Starting nfs-Domain done

Check whether the dCache domains have started up correctly:

[root]# dcache status
DOMAIN       STATUS  PID   USER
dCacheDomain running 18314 dcache
nfs-Domain   running 18359 dcache

and services:

[root]# dcache services
DOMAIN       SERVICE         CELL            LOG
dCacheDomain admin           alm             /var/log/dcache/dCacheDomain.log
dCacheDomain broadcast       broadcast       /var/log/dcache/dCacheDomain.log
dCacheDomain poolmanager     PoolManager     /var/log/dcache/dCacheDomain.log
dCacheDomain loginbroker     LoginBroker     /var/log/dcache/dCacheDomain.log
dCacheDomain spacemanager    SrmSpaceManager /var/log/dcache/dCacheDomain.log
dCacheDomain pnfsmanager     PnfsManager     /var/log/dcache/dCacheDomain.log
dCacheDomain cleaner         cleaner         /var/log/dcache/dCacheDomain.log
dCacheDomain dir             dirLookupPool   /var/log/dcache/dCacheDomain.log
dCacheDomain gplazma         gPlazma         /var/log/dcache/dCacheDomain.log
dCacheDomain pinmanager      PinManager      /var/log/dcache/dCacheDomain.log
dCacheDomain billing         billing         /var/log/dcache/dCacheDomain.log
dCacheDomain srm-loginbroker srm-LoginBroker /var/log/dcache/dCacheDomain.log
dCacheDomain httpd           httpd           /var/log/dcache/dCacheDomain.log
dCacheDomain topo            topo            /var/log/dcache/dCacheDomain.log
dCacheDomain info            info            /var/log/dcache/dCacheDomain.log
nfs-Domain   nfsv41          NFSv41-vt-021   /var/log/dcache/nfs-Domain.log

Then also check the log files:

[root@vt-021 data]# tail -F /var/log/dcache/*
==> /var/log/dcache/dCacheDomain.log <==
01 Mar 2013 18:22:01 (gPlazma) [] NodeList has 1 entries
01 Mar 2013 18:22:01 (gPlazma) [] examining plugin with class class org.dcache.gplazma.plugins.JaasPlugin
01 Mar 2013 18:22:01 (gPlazma) [] Adding plugin [jaas, org.dcache.gplazma.plugins.JaasPlugin]
01 Mar 2013 18:22:01 (gPlazma) [] Created 1 plugin metadata entries
01 Mar 2013 18:22:03 (PinManager) [] [AspectJ] javax.* types are not being woven because the weaver option '-Xset:weaveJavaxPackages=true' has not been specified
INFO 3/1/13 6:22 PM:liquibase: Successfully acquired change log lock
INFO 3/1/13 6:22 PM:liquibase: Reading from databasechangelog
INFO 3/1/13 6:22 PM:liquibase: Reading from databasechangelog
INFO 3/1/13 6:22 PM:liquibase: Successfully released change log lock
INFO 3/1/13 6:22 PM:liquibase: Successfully released change log lock

==> /var/log/dcache/nfs-Domain.log <==

2013-03-01 18:21:52 Launching /usr/bin/java -server -Xmx512m -XX:MaxDirectMemorySize=512m -Dsun.net.inetaddr.ttl=1800 -Dorg.globus.tcp.port.range=20000,25000 -Djava.net.preferIPv4Stack=true -Dorg.dcache.dcap.port=0 -Dorg.dcache.net.tcp.portrange=33115:33145 -Dorg.globus.jglobus.delegation.cache.lifetime=30000 -Dorg.globus.jglobus.crl.cache.lifetime=60000 -Djava.security.krb5.realm=EXAMPLE.ORG -Djava.security.krb5.kdc=localhost -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=/etc/dcache/jgss.conf -Djava.awt.headless=true -DwantLog4jSetup=n -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dcache/nfs-Domain-oom.hprof -javaagent:/usr/share/dcache/classes/spring-instrument-3.1.1.RELEASE.jar -Ddcache.home=/usr/share/dcache -Ddcache.paths.defaults=/usr/share/dcache/defaults org.dcache.boot.BootLoader start nfs-Domain
01 Mar 2013 18:21:54 (System) [] Created : nfs-Domain

Now exit the tail command by pressing control+c.

This should show you that the domains were created and that no error messages are flying around. The output that tail shows should stop at some point after starting dCache; some output is normal and simply informs you about successful domain creation.

First Contact

Now that we have a running dCache that does nothing we would like to make a first contact. This will be achieved by mounting dCache locally via NFSv41.

dCache needs several things to come together before it can store data. It needs to authenticate and authorize the people who want to store data (gPlazma takes care of this), and it needs a logical structure in which file names can be found in a directory hierarchy: the namespace, which keeps the files' metadata. The actual file contents are stored by a service called a pool, which writes data to block devices and allows hierarchical storage management to be employed (more about this in the tertiary storage part of this hands-on).

So dCache needs something that holds the actual data: the pools. We first create a pool that stores its data under a certain path in the local file system and set its maximum size. Execute the following command in your server console:

[root]# dcache pool create --size=419430400  --meta=db --lfs=precious   /pools/nfsPool nfsPool poolDomain
Created a pool in /pools/nfsPool. The pool was added to poolDomain in
file:/etc/dcache/layouts/isgc2013_ws.conf.

After creating the pool we need to start the domain that was created for it:

[root]# dcache status
DOMAIN       STATUS  PID   USER
dCacheDomain running 22269 dcache
nfs-Domain   running 22313 dcache
poolDomain   stopped       dcache
[root]# dcache start poolDomain
Starting poolDomain done

The pool service always makes sure that there is enough free space on the pool to allow some data to be moved around. The amount of space that is kept free (the gap) is configurable; by default it is 4 GiB. For this tutorial we need to set the gap to a very low value (10 MB) because the disk space on the hands-on machines is very limited. Do the following steps to change the gap size:

[root]# ssh -l admin -c blowfish -p 22223 localhost -1
... (local) admin > cd nfsPool
... (nfsPool) admin > set gap 10m
Gap set to 10485760
... (nfsPool) admin > save

Then exit the admin interface by pressing control+d. This now enables us to actually store files in dCache, but how? ... for example via a mounted NFSv41.

We need to set the NFS domain to make sure nfs server and client are inside the same namespace. This is done by adding the following line to the end of /etc/dcache/dcache.conf:

nfs.domain = taipei-domain
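
One way to append this line from the shell:

[root]# echo "nfs.domain = taipei-domain" >> /etc/dcache/dcache.conf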

As mentioned, dCache also needs a namespace where the files' metadata is stored. We create a directory in the dCache namespace by executing:

[root]# chimera-cli mkdir /data

We also need to tell the NFS server which machines are allowed to mount the directories that the dCache server exports. This is not really dCache specific, as you would also use the /etc/exports file for a non-dCache NFS server. Please add the following line to /etc/exports:

/data localhost(rw)

This states that the directory /data can be accessed by localhost with read/write permissions. Now we will be able to see the benefit of having the nfsv41 service in a separate domain as we need to restart the domain for the change in /etc/exports to take effect.

[root]# dcache status
DOMAIN       STATUS  PID   USER   
dCacheDomain running 26853 dcache 
nfs-Domain   running 26903 dcache 
poolDomain   running 26959 dcache 

[root]# dcache restart nfs-Domain
Stopping nfs-Domain 0 1 2 done
Starting nfs-Domain done

Had we not had the nfsv41 service running in a separate domain, we would have had to restart the entire dCache, which can be very inconvenient when you need to keep a production dCache in service.

After this we can mount dCache locally by doing the following:

[root]# mkdir /nfs4
[root]# mount -o intr,minorversion=1 localhost:/data /nfs4
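
To check that the mount is actually in place, you can for example run:

[root]# mount | grep /nfs4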

If you now switch to /nfs4 you can create a file, e.g.:

[root]# vi /nfs4/myFirstNfsFile00001

Write something inside and quit vi after saving. Congratulations, you wrote your first file into dCache. We will now mount NFSv4.1 on the client. Therefore we need to grant it permission on the server side by editing /etc/exports:

[root]# vi /etc/exports

and change the one line from

/data localhost(rw)

to

/data localhost(rw) <your client address>(rw)

then exit INSERT mode and save. This line now means that the directory /data can be mounted with read-write permissions from localhost as well as from <your client address>. After adding this permission, restart dCache:

[root]# dcache restart

Now go to your client:

(CLIENT) (CLIENT) (CLIENT) (CLIENT)

There you can have a look at:

[root]# less /etc/idmapd.conf

At the top of this file you will find:

[General]
#Verbosity = 0
# The following should be set to the local NFSv4 domain name
# The default is the host's DNS domain name.
Domain = taipei-domain

Remember, earlier we set nfs.domain = taipei-domain in the dcache.conf on the server. This tells server and client that they are in the same NFS domain. If you had to change this value (which we have conveniently already set for you), you would have to restart the rpcidmapd service; that is not necessary now. You will also find that your client machine already has a /nfs4 directory, which we created, so all that is left for you to do is to mount dCache, have a look at it, and unmount it again:

[root]# mount -o intr,minorversion=1 <your server address>:/data /nfs4
[root]# ls /nfs4
...
[root]# umount /nfs4

Please note: for the NFSv4.1 mount to work on our client we need nfs-utils installed and an SL6 kernel newer than 2.6.32-220.
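
You can check both requirements on the client with the standard tools:

[root]# uname -r
[root]# rpm -q nfs-utils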

What you have now is a running dCache server with a locally mounted NFSv4.1 export that you can write to without any authentication. In this section we also created and started a pool service with one pool, and we created a directory /data that holds our data.

Interaction with dCache

(SERVER) (SERVER) (SERVER) (SERVER)

In this section we will look at different ways of interacting with dCache and making use of its functionality. There is a multitude of ways to access dCache; for now we want to have a look at the simple and straightforward ones. One way you have already seen in the previous chapter: NFSv4.1. A different way to access dCache is WebDAV. You could use it without authentication as well, but we will use the so-called dcache.kpwd file for authentication and authorization.

You will have to have a WebDAV service running, so please add the following lines to the end of your layout file /etc/dcache/layouts/isgc2013_ws.conf:

[webdavDomain]
[webdavDomain/webdav]

Then check the status:

[root]# dcache status
DOMAIN       STATUS  PID  USER   
dCacheDomain running 6404 dcache 
nfs-Domain   running 6458 dcache 
poolDomain   running 6513 dcache 
webdavDomain stopped      dcache 

Now you need to start the webdavDomain:

[root]# dcache start webdavDomain
Starting webdavDomain done

Now you can go to your browser and have a look at the WebDAV page that dCache provides. Enter this address into the browser: http://<your server address>:2880. There you will see that you are unauthorized at the moment. This is what we will have to change now.

Now configure your WebDAV settings: enable HTTP access, disallow anonymous access, disable requesting and requiring client authentication, and activate basic authentication. Please do this by adding the following lines under the webdavDomain entries you just added in /etc/dcache/layouts/isgc2013_ws.conf:

webdavProtocol=http
webdavRootPath=/data
webdavAnonymousAccess=NONE
webdavWantClientAuth=false
webdavNeedClientAuth=false
webdavBasicAuthentication=true
webdav.redirect.on-read=false
webdav.redirect.on-write=false

so that it looks like this:

....

[webdavDomain]
[webdavDomain/webdav]
webdavProtocol=http
webdavRootPath=/data
webdavAnonymousAccess=NONE
webdavWantClientAuth=false
webdavNeedClientAuth=false
webdavBasicAuthentication=true
webdav.redirect.on-read=false
webdav.redirect.on-write=false

....

Next we need to take care of the authentication settings. There is one service in dCache that takes care of authentication and authorization - gPlazma. You alter the settings of this service in /etc/dcache/gplazma.conf. Adjust /etc/dcache/gplazma.conf to use the kpwd plug-in.

It will look something like this:

auth     optional kpwd
map      optional kpwd
session  optional kpwd
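
If you prefer to write the file from the shell, a minimal sketch that replaces the (currently emptied) contents with exactly these three lines:

[root]# cat > /etc/dcache/gplazma.conf <<EOF
auth     optional kpwd
map      optional kpwd
session  optional kpwd
EOF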

For more information see also the section called “Plug-ins” in the dCache book: Plug-ins.

The /etc/dcache/dcache.kpwd file is the place where you can specify the username/password record. It should contain the username and the password hash, as well as UID, GID, access mode and the home and fsroot directories:

# set passwd
passwd dcacheuser <some hash> read-write 20000 20000 / /

The passwd-record can be created by the dCache kpwd-utility. Please create it like this:

[root] # dcache kpwd dcuseradd -u 20000 -g 20000 -h /data/kpwd -r / -f / -w read-write -p test dcacheuser

You can now have a look at the dcache.kpwd file:

[root] # cat /etc/dcache/dcache.kpwd
...
# the following are the user password records
passwd dcacheuser 4390b3ee read-write 20000 20000 /data/kpwd /

Now you need to restart the webdavDomain:

[root]# dcache restart webdavDomain

Now point your web browser back at http://<your server address>:2880 and you will be asked to enter the username (dcacheuser) and password (test) that you specified in the kpwd file. Do so and you should see the file you created earlier via NFSv4.1 without authentication and authorization.

But this is not enough to write to dCache using username and password, because dCache has to know which directory is protected according to the entries in the dcache.kpwd file. We have to create a directory and give it the right protection. First create the directory:

[root]# chimera-cli mkdir /data/kpwd

Then remember the entry that we created in the kpwd file: it associated UID 20000 with the username dcacheuser using the password test. We now need to give the directory the right owner:

[root]# chimera-cli chown /data/kpwd 20000

If you now look at your browser you will see the file written via NFSv4.1 and the directory kpwd; you can access both. If file names should not be visible to others, do not write directly under the WebDAV root /data but create one directory per user. Let's create a directory that cannot be accessed by the user dcacheuser:

[root]# chimera-cli mkdir /data/noAccess

and restrict access to it:

[root]# chimera-cli chmod /data/noAccess 744

Now refresh your browser. You might need to re-enter username and password; do so. Try to access the noAccess directory. WebDAV will ask you for your username and password again. You can try to enter the user dcacheuser and the password test, but you will not gain access to this directory.

Let's write a file to our /kpwd directory. Go to your client machine, where we will write a file into dCache using HTTP/WebDAV with username and password.

(CLIENT) (CLIENT) (CLIENT) (CLIENT)

On the client execute:

[root]# curl -u dcacheuser:test -T /etc/hosts http://<your server address>:2880/kpwd/hosts.file0001

If you try the same with the noAccess directory you will fail. When you now look into your browser again and refresh http://<your server address>:2880/kpwd you will see the file you just wrote. Click on the file; its content is visible to you.

So, what did we do in this section? We started the dCache WebDAV service, created and protected directories, created a user entry in the kpwd file and, from our client, wrote a file into a protected directory we were allowed to access while being denied access to one we were not.

Certificates and dCache

Switch back to the

(SERVER) (SERVER) (SERVER) (SERVER)

This part will have a look at the security mechanism that is used within the Grid: X.509 certificates.

For certificate-based access we need one certificate that identifies the server, one that identifies the client, and we need to know whether we can trust the authority that issued the certificates. The server certificate was installed on your machine in /etc/grid-security/. The client certificate is already present in /home/dcacheuser/.globus/ on the CLIENT. The root and signing CA certificates were also installed on your machines.
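
If you are curious, you can inspect the server certificate with openssl - assuming the conventional file name hostcert.pem in /etc/grid-security/:

[root]# openssl x509 -in /etc/grid-security/hostcert.pem -noout -subject -issuer -dates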

To use WebDAV with certificates you change the entry in /etc/dcache/layouts/isgc2013_ws.conf from the entries that were used for kpwd to the entries below:

[webdavDomain]
[webdavDomain/webdav]
webdavProtocol=https
webdavRootPath=/data
webdavAnonymousAccess=NONE
webdavWantClientAuth=true
webdavNeedClientAuth=false
webdavReadOnly=false
#webdavBasicAuthentication=true
webdav.redirect.on-read=false
webdav.redirect.on-write=false
#webdav.overwrite=false

Then you will need to import the host certificate into the dCache keystore using the command

[root] # dcache import hostcert
The host certifcate has been stored in /etc/dcache/hostcert.p12. If dCache
runs as a non-root user, you must change the owner of
/etc/dcache/hostcert.p12.

Here dCache tells you that the key store /etc/dcache/hostcert.p12 needs to have the right ownership. Please execute:

[root] # chown dcache:dcache /etc/dcache/hostcert.p12

Then initialise your trust store by executing:

[root] # dcache import cacerts

We will use a new directory to write and read files. Let's create it:

[root]# chimera-cli mkdir /data/certBased

We will also need to adapt the /etc/dcache/gplazma.conf file so it looks as follows:

auth     optional x509
#auth     optional kpwd
#map      optional kpwd
map      optional gridmap
map    sufficient  nsswitch
session  optional authzdb
#session  optional kpwd
identity optional nsswitch

We will also need to create two additional files. The first is /etc/grid-security/grid-mapfile, which contains this one line mapping our user's certificate subject (DN) to a username:

"/C=DE/ST=Hamburg/O=dCache.ORG/CN=dcacheuserCert" dcacheuser

The other file that is needed is /etc/grid-security/storage-authzdb:

authorize dcacheuser read-write 20000 20000 / /data/certBased /

This file is needed to give the user permission to read from and write to the directory (/data/certBased) that we just created. Note that the UID and GID you see here are not used for the mapping; that is done by nsswitch.
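
If you like, both files can be created directly from the shell; this sketch writes exactly the two lines shown above:

[root]# echo '"/C=DE/ST=Hamburg/O=dCache.ORG/CN=dcacheuserCert" dcacheuser' > /etc/grid-security/grid-mapfile
[root]# echo 'authorize dcacheuser read-write 20000 20000 / /data/certBased /' > /etc/grid-security/storage-authzdb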

There is one thing left to be done. We created the mapping from the certificate DN to a username above, but the user dcacheuser does not exist yet on our machine. Let's create this user and look up its UID and GID:

[root] # useradd dcacheuser
[root] # id dcacheuser
uid=<UID>(dcacheuser) gid=<GID>(dcacheuser) groups=<GID>(dcacheuser)

Instead of <UID> and <GID> you will see actual numbers; you will need these in the next step. Now we need to give the directory we created the right owner and group:

[root] # chimera-cli chown /data/certBased <UID>
[root] # chimera-cli chgrp /data/certBased <GID>

Now you also see why we needed to disable the kpwd plug-in above: it uses a different UID and GID for this user.

Now you need to restart dCache, as we have made changes to several components.

[root] # dcache restart

If everything went ok, we should now see the right permissions:

[root] # ls -la /nfs4/certBased/
total 2
drwxr-xr-x   3 dcacheuser dcacheuser  512 Mar  9 14:35 .
drwxr-xr-x 206 root       root        512 Mar  8 21:35 ..

Now go to your

(CLIENT) (CLIENT) (CLIENT) (CLIENT)

Change to the directory /home/dcacheuser/.globus:

cd /home/dcacheuser/.globus

List the directory; there you will find the user certificate and key that we will use in the next step:

[root] # ls /home/dcacheuser/.globus

Now we can start writing to the WebDAV door:

[root] # curl  --key ./dcacheuserCert_key.pem  --cert ./dcacheuserCert_cert.pem -T /etc/passwd  https://<your server address>:2880/certBased/testFileWebDAV0001 -k

Let's also read the file that we just wrote:

[root] # wget  --private-key=dcacheuserCert_key.pem  --certificate=dcacheuserCert_cert.pem   https://<your server address>:2880/certBased/testFileWebDAV0001 --no-check-certificate

Run less on the file you just got from dCache:

[root] # less testFileWebDAV0001

which shows you the content of the file. Congratulations, you have written and read a file using certificate-based authentication. There is one peculiarity we would like to mention: for the UID and GID mapping in this hands-on we have used the /etc/passwd file. This is just for the purpose of this hands-on; usually you would use an LDAP or NIS server for this, which dCache also supports.
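
Back on the server, you can check what the nsswitch lookup resolves for our user with the standard getent tool:

[root]# getent passwd dcacheuser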

What have we achieved in this part? We have configured WebDAV in dCache in such a way that we can use X.509 certificates to read and write files. As a nice side effect of our configuration we can also see the ownership of the directory that we created in our NFSv4.1 mount.

Hardware Lifecycle (adding storage, migrating)

Back to our

(SERVER) (SERVER) (SERVER) (SERVER)

Here you will see part of how to maintain a dCache that is in production. This part includes adding storage, balancing data between different pools and migrating data. It happens quite often that new hardware comes in to increase storage, or that old hardware needs to be decommissioned. In such cases dCache behaves very nicely, as all of this can be done without disturbing the functionality of the system. The component that takes care of this is the so-called migration module. What we will do in this part is simulate such an everyday process by adding a pool, migrating data and decommissioning the old pool.

As a prerequisite for this part of the hands-on we will execute a script that copies 100 files into dCache. We will explain later why.

[root] # cd
[root] # ./create100Files.sh
Createing file: 1
...

Wait until the files are created.
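
In case you are curious what the script does: it simply writes 100 small files into the mounted dCache. Below is a minimal sketch; the real create100Files.sh was provided in /root for you, so the file names and sizes here are only an illustration, assuming the /nfs4 mount from the previous section.

#!/bin/bash
# Illustrative sketch only - the provided create100Files.sh is what we actually run.
for i in $(seq 1 100); do
    echo "Creating file: $i"
    # write a small file of up to a few hundred bytes into the NFSv4.1-mounted dCache
    dd if=/dev/urandom of=/nfs4/file$i bs=1 count=$((RANDOM % 400 + 1)) 2>/dev/null
done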

Now we would like to create a new pool. You already know how this is done; the only difference is that this time we will create it in a new domain:

[root]# dcache pool create --size=419430400 --meta=db --lfs=precious  /pools/newPool newPool newPoolDomain
Created a pool in /pools/newPool. The pool was added to newPoolDomain in
file:/etc/dcache/layouts/isgc2013_ws.conf.

Then check for the status:

[root] # dcache status
DOMAIN        STATUS  PID   USER
dCacheDomain  running 13280 dcache
nfs-Domain    running 13331 dcache
poolDomain    running 13390 dcache
webdavDomain  running 13440 dcache
newPoolDomain stopped       dcache

and start the newPoolDomain:

[root] # dcache start newPoolDomain

Now we also have to set the gap for this pool again:

[root] # ssh -l admin -p 22223 -c blowfish localhost -1
admin@localhost's password:

Enter the password that you will get from us during the hands-on.

Then go to the newPool, set the gap, save:

[...] (local) admin > cd newPool
[...] (newPool) admin > set gap 10m
[...] (newPool) admin > save

Now let's have a look at the new pool and how many files it has:

[...] (newPool) admin > rep ls

As you see, rep ls does not produce any output, because the pool does not have any files yet. We also need to tweak the pool a bit, because it is so unusually small that the rebalancing would not work otherwise.

[...] (newPool) admin > set heartbeat 1
[...] (newPool) admin > save

But where are the files that we already created in dCache? Yes, they are on the other pool. Let's have a look. We leave the newPool with "..", go to the nfsPool and have a look at how many files are there:

[...] (newPool) admin > ..
[...] (local) admin > cd nfsPool
[...] (nfsPool) admin > rep ls
00001800A57468AB4089BE143EACFE6638BE <-P---------L(0)[0]> 5 si={<Unknown>:<Unknown>}
00002743423F612B4812878765800179A0D6 <-P---------L(0)[0]> 187 si={<Unknown>:<Unknown>}
0000AAB5BB5D3BED442C96C148D5EB06B9C9 <-P---------L(0)[0]> 187 si={<Unknown>:<Unknown>}
000044B52AA348794896B7BBE502CC27A3B0 <-P---------L(0)[0]> 187 si={<Unknown>:<Unknown>}
000091D1AD387C2B4F909746F04BC4B74870 <-P---------L(0)[0]> 0 si={<Unknown>:<Unknown>}
...

We also need to tweak the nfsPool:

[...] (nfsPool) admin > set heartbeat 1
[...] (nfsPool) admin > save

The rep ls shows you that there are many files on the nfsPool. Having such an imbalance in the number of files is undesirable, as one pool does all the work while the other sits around waiting for files to arrive. Let's take care of this by balancing the files. To do so you need to leave the pool and go to a service called the PoolManager:

[...] (nfsPool) admin > ..
[...] (local) admin > cd PoolManager
[...] (PoolManager) admin >

Pools are always organized in so-called pool groups. To see which pool groups exist and which pools they contain, run:

[...] (PoolManager) admin > psu ls pgroup -l
default
 linkList :
   default-link  (pref=10/10/-1/10;;ugroups=3;pools=1)
 poolList :
   nfsPool  (enabled=true;active=17;rdOnly=false;links=0;pgroups=1;hsm=[];mode=enabled)
   newPool  (enabled=true;active=12;rdOnly=false;links=0;pgroups=1;hsm=[];mode=enabled)

This shows you the pool groups we have, and we see that this one pool group (named default) contains the two pools nfsPool and newPool. We will trigger a rebalance of the files in the pool group, which has the effect that the files are distributed amongst the two pools.

Let's trigger the replication of files between pools:

[...] (PoolManager) admin > rebalance pgroup -refresh=1 default

The rebalance command distributes the files evenly amongst the pools in the pool group. After this step the files should be distributed between nfsPool and newPool. If you check the pools, however, you will see that what we expected did not happen: instead all the files are on the newPool. What happened? The problem in our hands-on scenario is that the pools are simply too small; rebalancing does not work properly on such small amounts of data. Usually such a rebalance takes hours on huge pools. We nevertheless wanted to show it here, so you know this functionality exists.

Back on track: remember, our goal was to decommission the nfsPool. Therefore we need to migrate the files away from it. First we set the pool we want to empty to read-only, so that no new files are written to it:

[...] (PoolManager) admin > ..
[...] (local) admin > cd nfsPool
[...] (nfsPool) admin > pool disable -rdonly
Pool nfsPool disabled(store,stage,p2p-client)

After this we want to migrate the data completely to the new pool (newPool).

[...] (nfsPool) admin > migration move newPool

We can watch the progress of the migration process by:

[...] (nfsPool) admin > migration info 1
Command    : null
State      : FINISHED
Queued     : 0
Attempts   : 177
Targets    : newPool
Completed  : 177 files; 36462 bytes
Total      : 36462 bytes
Concurrency: 1
Running tasks:

In our case, of course, the nfsPool does not contain any files anymore, but if it did, all files would have been moved to newPool. When the state is FINISHED, as can be seen from the output above, the migration process is complete and we can disable the pool completely.

[...] (nfsPool) admin > ..
[...] (local) admin > cd PoolManager
[...] (PoolManager) admin > psu set pool nfsPool disabled
[...] (PoolManager) admin > psu removefrom pgroup default nfsPool
[...] (PoolManager) admin > save
[...] (PoolManager) admin > ..
[...] (local) admin > logoff

What have we done? We created a new pool, rebalanced the files across the pools in a pool group, then moved the files away from the pool (nfsPool) we wanted to decommission and finally disabled this pool. We did all this without interrupting the availability of any files to the users. Isn't this cool?

Interfacing with Tertiary Storage

Finally, we will look at how dCache can be used to access archive media (tape archives). In our scenario we have all dCache services running on one machine. Detailed information on how dCache interacts with archive media can be found in the dCache Book, The dCache Tertiary Storage System Interface.

For this part we prepared a script which emulates an archive medium mounted on the local directory /hsmTape/data on our server.

[root]# mkdir -p /hsmTape/data
[root]# chown -R dcache /hsmTape/data

This script supports three types of operations: PUT, GET and REMOVE. In /usr/share/dcache/lib/ you can also find additional examples of HSM scripts; have a look at them later if you like.

[root]# mv /root/hsmscript.sh /usr/share/dcache/lib/hsmscript.sh
[root]# cat /usr/share/dcache/lib/hsmscript.sh
.....
lot of lines
.....
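
To give you an idea of what such a script looks like, here is a heavily simplified sketch. The real hsmscript.sh provided for the hands-on is what we actually use; the argument pattern (put/get/remove with -si=/-uri= options) follows the dCache Book's Tertiary Storage System Interface, and the URI and paths below are illustrative assumptions only.

#!/bin/bash
# Illustrative sketch only - use the provided hsmscript.sh for the hands-on.
# The pool calls the script roughly as:
#   <script> put    <pnfsid> <localpath> -si=<storageinfo> ... [-hsmBase=<base>]
#   <script> get    <pnfsid> <localpath> -uri=<uri> ...
#   <script> remove -uri=<uri> ...
command=$1
hsmBase=/hsmTape                     # assumption: matches "hsm set osm -hsmBase=/hsmTape" below

case "$command" in
  put)
    pnfsid=$2; file=$3
    cp "$file" "$hsmBase/data/$pnfsid" || exit 1
    # on success, print the URI under which the tape copy can be found later
    echo "osm://osm/?store=store&group=sGroup&bfid=$pnfsid"
    ;;
  get)
    pnfsid=$2; file=$3
    cp "$hsmBase/data/$pnfsid" "$file" || exit 1
    ;;
  remove)
    # extract the bfid from the -uri=... option and delete the emulated tape copy
    uri=$(echo "$@" | sed 's/.*-uri=//; s/ .*//')
    rm -f "$hsmBase/data/${uri##*bfid=}"
    ;;
  *)
    exit 1
    ;;
esac
exit 0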

We link this script to the pool newPool, which means that each file arriving on this pool will be archived to our tape mount point. Log in to the admin console:

[root] # ssh -l admin -p 22223 -c blowfish localhost -1
admin@localhost's password:

and change the "hsm" settings:

[...] (local) admin > cd newPool
[...] (newPool) admin > hsm set osm osm
[...] (newPool) admin > hsm set osm -command=/usr/share/dcache/lib/hsmscript.sh
[...] (newPool) admin > hsm set osm -hsmBase=/hsmTape

Set the maximum number of simultaneously active "GET" operations:

[...] (newPool) admin > rh set max active 5

Set the maximum number of simultaneously active "PUT" operations:

[...] (newPool) admin > st set max active 5

Save the pool configuration:

[...] (newPool) admin > save

Allow stage operations in the PoolManager:

[...] (newPool) admin > ..
[...] (local) admin > cd PoolManager
[...] (PoolManager) admin > pm set -stage-allowed=yes 
[...] (PoolManager) admin > save
[...] (PoolManager) admin > ..
[...] (local) admin > logoff

Modify the layout file /etc/dcache/layouts/isgc2013_ws.conf to add the cleaner service to the nfs-Domain, changing:

...
[nfs-Domain]
[nfs-Domain/nfsv41]
....

to

....
[nfs-Domain]
[nfs-Domain/nfsv41]
[nfs-Domain/cleaner]
cleaner.hsm = enabled
....

Then change the lfs parameter of the pool from precious to none. The pool section in the layout file currently looks like this:

[newPoolDomain]
[newPoolDomain/pool]
name=newPool
path=/pools/newPool
maxDiskSpace=419430400
metaDataRepository=org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository
waitForFiles=${path}/data:${path}/meta
lfs=precious

Change it so that it reads:

[newPoolDomain]
[newPoolDomain/pool]
name=newPool
path=/pools/newPool
maxDiskSpace=419430400
metaDataRepository=org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository
waitForFiles=${path}/data:${path}/meta
lfs=none

Define the storage class by assigning tags to the mounted namespace:

[root]# echo "StoreName store" | /usr/bin/chimera-cli writetag /data OSMTemplate
[root]# echo "sGroup" | /usr/bin/chimera-cli writetag /data sGroup

We will use a small trick to let dCache regenerate the storage class information. Therefore we stop dCache:

[root]# dcache stop

In order to renew the storage class information we have to delete all files in the pool's metadata directory /pools/newPool/meta/. Don't worry: after the restart this information will be regenerated automatically, with the right storage class information added to it:

[root]# rm -f /pools/newPool/meta/*

All preparations are done. We just have to start dCache and check how the tape backend works:

[root]# dcache start

If you monitor the directory /hsmTape/data, after a couple of minutes you will see new files appearing in it:

[root]# ls /hsmTape/data

If you don't have time to wait, you can always trigger the "put" (flush) manually from the pool in the admin console:

[root] # ssh -l admin -p 22223 -c blowfish localhost -1
admin@localhost's password:
[...] (local) admin > cd newPool
[...] (newPool) admin > rep ls 
0000D3C011014D3A4369A5B2C56F38C857A3 <-P---------L(0)[0]> 410 si={ourStore:sGroup}
0000CAC412AA52724788AD7458610C8109C3 <-P---------L(0)[0]> 351 si={ourStore:sGroup}
00009205E1A10DC8438491703A89E0C5F431 <-P---------L(0)[0]> 424 si={ourStore:sGroup}
0000EE4558BF729E48F7A7C6FC561674C385 <-P---------L(0)[0]> 826 si={ourStore:sGroup}
.....
some lines here 
.......

In order to flush a file to tape, run the command flush pnfsid <pnfsid>:

[...] (newPool) admin >  rep ls 0000EE4558BF729E48F7A7C6FC561674C385
0000EE4558BF729E48F7A7C6FC561674C385 <-P---------L(0)[0]> 826 si={ourStore:sGroup}
[...] (newPool) admin > flush pnfsid 0000EE4558BF729E48F7A7C6FC561674C385
Flush Initiated
[...] (newPool) admin >  rep ls 0000EE4558BF729E48F7A7C6FC561674C385
0000EE4558BF729E48F7A7C6FC561674C385 <C----------L(0)[0]> 826 si={ourStore:sGroup}
[...] (newPool) admin >  pf 0000EE4558BF729E48F7A7C6FC561674C385
/data/bigFile42

As you see, the mode bits have changed from <-P---------L> (the file is precious, i.e. it exists only on disk) to <C----------L> (the file is on tape and only cached on disk). Now, in order to see how dCache actually restores files from tape, we will remove all cached files from newPool:

[...] (newPool) admin > sweeper purge
Reclaiming 79680000 bytes
[...] (newPool) admin >  rep ls
[...] (newPool) admin > 

Now use your favourite method to read a file from dCache, for example:

[root]# md5sum /nfs4/bigFile42
ba966d50095309f930a489f7b2551d0d  /nfs4/bigFile42

Check your pool again:

[root] # ssh -l admin -p 22223 -c blowfish localhost -1
admin@localhost's password:
[...] (local) admin > cd newPool
[...] (newPool) admin > rep ls 
0000EE4558BF729E48F7A7C6FC561674C385 <C-------X--L(0)[0]> 826
[...] (newPool) admin >

The last piece is to see how dCache removes data from tape. Let's clean up /nfs4/ and, after some time (40-120 seconds), have a look at your fake tape:

[root]# rm -rf /nfs4

... Let some time pass ...

[root]# ls  -R /hsmTape/data

So what have we achieved in this part? We created an HSM script and assigned it to a pool, so that whenever the pool receives files they are flushed to tape. We checked what happens when we try to read a file that is only on tape, by deleting the cached files from the pool and then reading the file from the NFSv4.1 mount. We saw that dCache fetched the file back from tape onto the pool; this happens every time someone requests a file that is on tape but not on any pool in dCache.

Try for yourself to write some files to dCache (overfill the tape pool), then read and delete some data. Play around with the knowledge you just gained.

We really hope you enjoyed experiencing dCache. Feel free to fire away with any questions you have.

Thank you very much.