
dCache FAQ

Trouble Shooting

All srmcp commands fail with the error "Subject and issuer names do not match" when debugging is enabled.

This is because your host certificate was requested using Internet Explorer.

Inside hostcert.pem two different certificates may be listed: one for the host and one for a person. This format can be produced when Internet Explorer is used to request the certificate. After the second certificate is removed, the "Subject and issuer names do not match" error goes away.
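
To check whether hostcert.pem contains more than one certificate, something like the following can be used (the path is the usual grid default and may differ on your host):

# count the certificates in the file; more than one indicates the problem above
grep -c "BEGIN CERTIFICATE" /etc/grid-security/hostcert.pem

# show subject and issuer of the first certificate in the file
openssl x509 -in /etc/grid-security/hostcert.pem -noout -subject -issuer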

I keep getting kernel errors from the PNFS NFS server on mounted clients

Examples include

 nfs_update_inode: inode number mismatch
expected (0:15/0x10bc35f), got (0:15/0x10bc358)

These messages are harmless and can be ignored.

dCache gridftp transfers to a globus-gridftp server (and vice versa) fail as third-party transfers

dCache does not support data channel authentication, so you need to disable it on the client with the following option:

-nodcau | -no-data-channel-authentication
           turn off data channel authentication for ftp transfers
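
For example, with globus-url-copy (host names and paths are placeholders):

globus-url-copy -nodcau \
    gsiftp://dcache.example.org:2811/pnfs/example.org/data/file1 \
    gsiftp://gridftp.example.org:2811/data/file1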

Low Level Exc : java.io.IOException: Too many open files

The usageInfo webpage will sometimes display this message:

pool1_05  pool1Domain  [204]  Low Level Exc : java.io.IOException: Too many open files

or the domain log files will contain messages like:

04/03 11:04:26 Cell(GFTP-se3-goegrid@gridftp-se3-goegridDomain) : Got an IO Exception ( closing server ) : java.net.SocketException: Too many open files
04/03 11:04:26,948 Socket adapter /0.0.0.0:24161: Too many open files

It should be possible to fix this issue by increasing the maximum number of open file descriptors for the host. This is done by creating a file that is sourced when each domain starts:

$cat /opt/d-cache/jobs/dcache.local.sh
ulimit -n 32768
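
To verify that a running domain actually picked up the new limit, the process limits can be inspected on Linux (replace <pid> with the PID of the domain's java process):

# "Max open files" should now show 32768
grep "open files" /proc/<pid>/limits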

I found that the dCache domain cannot accept new connections because we already have many connections

The OS typically has a default limit of 1024 file descriptors. Since you need a file descriptor per connection, this is a likely source of your problem.

The command

[root@baden ~]# ulimit -a 
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 4160
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4160
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Shows you the system limits. It is possible to set the limits in /etc/security/limits.conf:

echo "* soft nofile 32000" >> /etc/security/limits.conf 
echo "* hard nofile 42000" >> /etc/security/limits.conf 

This will then be applied after the next login or reboot. To set it interactively for the current shell:

 ulimit -n 42000

pnfsDomain's Kernel emits "nfs_update_inode: inode number mismatch" errors

Errors of the type

nfs_update_inode: inode number mismatch
expected (0:15/0x10bc35f), got (0:15/0x10bc358)

are harmless and can be ignored.

/pnfs/fs: device is busy problem

Try to debug and fix the pnfs daemon problem. If you are sure it is not related to postgres, try the following steps:

  • Stop dCache and pnfs
  • Back up the postgres databases (admin, data1):
     pg_dump -U pnfsserver admin > admin.dump
     pg_dump -U pnfsserver data1 > data1.dump
  • Reinstall pnfs:
     dropdb -U pnfsserver admin
     dropdb -U pnfsserver data1
     change PNFS_OVERWRITE = yes in /opt/pnfs/etc/pnfs_config
     run /opt/pnfs/install/pnfs-install.sh
     change PNFS_OVERWRITE = no in /opt/pnfs/etc/pnfs_config (very important)
  • Restore the postgres databases:
     dropdb -U pnfsserver admin
     dropdb -U pnfsserver data1
     createdb -U pnfsserver admin
     createdb -U pnfsserver data1
     psql -U postgres admin < admin.dump
     psql -U postgres data1 < data1.dump
  • Start your pnfs daemon

PNFS has stopped logging to pnfsd.log.

PNFS cannot log to files bigger than 2 GB. You should configure logrotate to rotate the log files periodically so that they never grow this big.
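
A minimal logrotate sketch, assuming pnfsd.log is written to /var/log/pnfsd.log (adjust the path to wherever your pnfs installation keeps its logs):

# the log path below is an assumption; adjust it for your installation
cat > /etc/logrotate.d/pnfsd <<'EOF'
/var/log/pnfsd.log {
    weekly
    rotate 4
    compress
    missingok
    copytruncate
}
EOF

copytruncate rotates the file without requiring a pnfsd restart; check that this is acceptable for your setup.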

After an update SRM does not work any more.

Since our RPMs do not contain 'active' scripts, you have to run install.sh to finish the update.
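
For example, assuming the standard /opt/d-cache layout:

/opt/d-cache/install/install.sh    # path assumes the standard layout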

I can write but cannot get the files back

The main difference between read and write is that a write is a passive transfer: on write, the client connects to the door for the data transfer, while on read the pool connects to the client. Please check your firewall configuration; it might help to drop the firewall while testing this.

If this is not the answer:

Please check /var/log/pnfsDomain.log for errors. Check that the 'companion' database is populated by logging in to the database:

companion=# SELECT * FROM cacheinfo WHERE pnfsid='003C00000000000001895498';
          pnfsid          |    pool    |          ctime
--------------------------+------------+-------------------------
 003C00000000000001895498 | dcache21_1 | 2006-02-22 01:35:21.839
(1 row)

TURL returns internal interface.

By default, doors bind to all available interfaces and SRM chooses one of them. Please check 'Complex Network Configuration' chapter of 'The Book'.

TURL returns 127.0.0.x address.

Please check that /etc/hosts does not contain something like:

# /etc/hosts
127.0.0.1 localhost
127.0.0.2 myhost # bad entry

I am unable to force srmcp to use dcap; nevertheless, plain dcap works fine.

To avoid using dcap for wide-area transfers, dcap doors are not published to SRM.

I am getting the error "No storageInfo available" when writing to a directory

Please check that the tags 'sGroup' and 'OSMTemplate' exist in the destination directory and are not empty:

# cat '.(tag)(sGroup)'
group
# cat '.(tag)(OSMTemplate)'
StoreName store

which corresponds to the storage class

store:group@osm
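
If the tags are missing, they can be created with the usual PNFS tag mechanism; the directory below is hypothetical and the values are the example storage class from above:

cd /pnfs/example.org/data/mydir        # hypothetical directory
echo "StoreName store" > '.(tag)(OSMTemplate)'
echo "group" > '.(tag)(sGroup)'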

Some transfers hang in the 'Tape Restore Queue' but there is no tape/HSM system

The file is located on a pool which is offline or was offline at request time. Check that the file is re-registered in the companion database. Use retry in the PoolManager after the pool is back.

Since our upgrade from 1.7.0-38 to 1.8.0 no transfers are working; the doors are up and running, but I get the following error:

java.lang.IllegalArgumentException: Protocol not found : GFtp/1
        at diskCacheV111.poolManager.RequestContainerV5.addRequest(RequestContainerV5.java:604)
        at diskCacheV111.poolManager.PoolManagerV5.messageArrived(PoolManagerV5.java:621)
        at dmg.cells.nucleus.CellAdapter.messageArrived(CellAdapter.java:940)
        at dmg.cells.nucleus.CellNucleus.run(CellNucleus.java:450)
        at java.lang.Thread.run(Thread.java:595)
....
java.lang.IllegalArgumentException: Protocol not found : DCap/3
        at diskCacheV111.poolManager.RequestContainerV5.addRequest(RequestContainerV5.java:604)
        at diskCacheV111.poolManager.PoolManagerV5.messageArrived(PoolManagerV5.java:621)
        at dmg.cells.nucleus.CellAdapter.messageArrived(CellAdapter.java:940)
        at dmg.cells.nucleus.CellNucleus.run(CellNucleus.java:450)
        at java.lang.Thread.run(Thread.java:595)

The solution is to add the following line to PoolManager.conf:

psu create unit -protocol */*

Tuning

Is there a way to control log file size?

logrotate is a popular utility for managing log files; in addition, dCache is implementing more modules with log4j.

I want to move a data disk from one pool to another.

After attaching the disk to the new host:

  • adjust the `hostname`.poollist file;
  • start the pools;
  • run 'pnfs register' in the new pool, or update the companion database directly:
    UPDATE cacheinfo SET pool='<new pool name>' WHERE pool='<old pool name>';

One of my data disks has died and I want to know what files I lost

In the companion database:

# SELECT * FROM cacheinfo WHERE pool='<dead pool>';
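
Wrapped in a psql call (the database user and name follow the companion setup described below and may differ at your site):

# user and database names are assumptions; adjust to your setup
psql -U srmdcache companion -c \
  "SELECT pnfsid FROM cacheinfo WHERE pool='<dead pool>';"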

I want to switch the 'companion' database on

  • In the dCacheSetup file set the following field
    cacheinfo=companion
    
  • create a database called companion:
    createdb -U postgres -O srmdcache companion
    
  • initialize companion:
    psql -U srmdcache companion < /opt/d-cache/etc/psql_install_companion.sql
    
  • run pnfs register on all pools (see the sketch after this list);
  • restart pnfsManager
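
A sketch of running 'pnfs register' on a pool from the admin interface (the host and cell names are examples; 22223 is the default admin ssh port listed further down in this FAQ):

ssh -c blowfish -p 22223 admin@adminhost     # hypothetical admin node
(local) admin > cd pool1_05                  # repeat for every pool cell
(pool1_05) admin > pnfs register
(pool1_05) admin > ..
(local) admin > logoff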

I want to migrate PNFS to a different host

For pnfs-postgres

  • install the pnfs on the new host
  • shutdown old pnfs
  • dump postgres databases with
    pg_dumpall > pnfs.dump
    
  • copy /opt/pnfsdb to the new host
  • restore postgres database with
    psql -U postgres template1 < pnfs.dump
    
  • start the new pnfs

For pnfs-gdbm

  • install the pnfs on the new host
  • shutdown old pnfs
  • copy /opt/pnfsdb to the new host
  • start the new pnfs
  • Consider moving to pnfs-postgres

I want to move a dCache component to a different host

dCache supports the node type 'custom'. Please be sure to list explicitly the set of all services you want to run on each host.

Typical services are lm, dCache, adminDoor, pnfs, httpd and utility. The service locator should point to the lm host.

Do I need to mount pnfs on all hosts?

pnfsDomain, dirDomain and gridftp doors require pnfs to be mounted. In addition, all HSM-backed write pools have to mount pnfs as well. Recent releases no longer require the HSM-backed write pools to mount pnfs, and we are working on removing the requirement for gridftp doors.
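
A sketch of mounting pnfs on such a host, assuming the pnfs server exports /fs (mount options vary between sites and kernel versions):

mkdir -p /pnfs/fs
# <pnfs-server> is a placeholder for your pnfs host
mount -o intr,hard,rw <pnfs-server>:/fs /pnfs/fs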

Why is my file in SUSPENDED mode?

A file may be in SUSPENDED mode for more than one reason.

  • Pool is down
  • File not in dCache

Due to the internal logic of dCache, it is difficult to distinguish between these two cases; please check regularly that all your pools are available.

Which TCP ports are used by dCache?

You can configure them; the defaults are:

50000:52000 => gridftp on pools
33115:33145 => passive dcap & xrootd
8443        => SRM
22223       => ssh (admin interface)
22125       => dcap
22128       => gsidcap
1049        => xrootd
2811        => gridftp
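
If a host firewall is used, the corresponding ports have to be opened; a sketch with iptables for a pool node's gridftp data range and the gridftp door:

iptables -A INPUT -p tcp --dport 50000:52000 -j ACCEPT   # gridftp data channels on pools
iptables -A INPUT -p tcp --dport 2811 -j ACCEPT          # gridftp control channel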

A lot of "Can't read setup file: exists? false can read? false" in dCache pool log files.

dCache pools that show many repetitions of these messages in the log file:

11/15 09:59:04 Cell(${0}@uct2-c001Domain) : New Pool Mode : disabled(fetch,store,stage,p2p-client,p2p-server)
11/15 09:59:04 Cell(${0}@uct2-c001Domain) : Can't read setup file: exists? false can read? false

but otherwise have a pool running OK are usually affected by a trailing blank line in the

/opt/d-cache/config/`hostname`.poollist file

If there are blank lines in this file, instead of being ignored, they cause dCache to start a Java cell with the name ${0}.

This is not a fatal error, but it is a bit confusing, and very non-Unix-like not to simply ignore blank lines and #comments.
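
A quick way to check a poollist file for such blank lines:

grep -n '^[[:space:]]*$' /opt/d-cache/config/`hostname`.poollist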

Thanks to Charles Waldman for pointing this out.

Basics of postgresql

Please use a minimum postgresql version of 8.1.0; we generally recommend running the latest version. PostgreSQL performs a lot of I/O operations, so it is best to use as many disk spindles as possible; RAID 5+1 or RAID 6 is recommended. For best performance keep <PGDATA>/data and <PGDATA>/data/pg_xlog on different disks, for example (with PostgreSQL stopped, from within the data directory):

mv pg_xlog /xLog/pg_xlog
ln -s /xLog/pg_xlog pg_xlog

in postgresql.conf

stats_start_collector = on
stats_command_string = on
stats_row_level = on
autovacuum = on
autovacuum_naptime = 600
autovacuum_vacuum_threshold = 1000
autovacuum_analyze_threshold = 500

Some tuning is host specific; an example for a Linux PostgreSQL server with 4 GB of RAM:

shared_buffers = 16384
sort_mem = 4096
vacuum_mem = 32168
work_mem = 8192
max_fsm_pages = 1000000
effective_cache_size = 262144
random_page_cost = 2

Change the kernel parameter for maximum shared memory segment size in /etc/sysctl.conf

kernel.shmmax = 288435456
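
To apply the new value without rebooting:

sysctl -p                                  # reload /etc/sysctl.conf
# or set it directly:
sysctl -w kernel.shmmax=288435456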

What do the 'rep ls' file flags cached, sticky and precious mean?

What has changed between 1.7 and 1.8 is that previously both tape and disk-only files were marked precious; in 1.8, tape files are precious while they are not yet on tape, and REPLICA+ONLINE files are marked cached+sticky (the C+X combination). If you have not configured space management (or rather, have not configured the proper tags for this in PNFS), disk-only files are still precious. Files that were migrated to tape, recalled from tape, or copied to other pools due to load are cached (i.e. without the X).

The sticky flag (the X in rep ls) is really a list of sticky flags. As soon as the list is non-empty, the file is considered sticky (i.e. the X is set) and will not be garbage collected. Each sticky flag in the list has an ID (aka owner) and a lifetime. The lifetime may be infinite (that's when you don't specify it). For REPLICA+ONLINE files the owner is 'system' and the lifetime is infinite (i.e. -1).

Passive gridftp: Passive transfer is the default value on the clients, but passive isn't allowed on the server by default

The option -allowPassivePool=false disallows the pool to be passive, i.e., incoming connections to the pool are proxied through the door. Transfers where the pool is the active party, i.e. the pool creates an outgoing connection, are still created directly. At many sites outgoing connections will work just fine, but ingoing connections are blocked.

-allow-relay is the option that also forces outgoing connections to be proxied through the door (apologies for the inconsistent naming).

Be aware that this obviously makes your FTP doors a bottleneck and you should have plenty of doors. Depending on your network topology it also creates extra load on your network.

There may be good reasons to also alter the passive pool setting, but those are completely unrelated to whether you want to perform a passive FTP transfer. If you decide to change the passive pool setting, then please make sure that you are aware of all the implications and ensure that your routing and firewall setup does not interfere.

To use passive ftp for reads, you have to limit the number of streams to one (due to a limitation of the GridFTP protocol). To do this, use:

srmcp -streams_num=1 -server_mode=passive
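
A complete invocation might look like this (host and paths are placeholders):

# dcache.example.org and the paths are placeholders
srmcp -streams_num=1 -server_mode=passive \
    srm://dcache.example.org:8443/pnfs/example.org/data/testfile \
    file:////tmp/testfile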

srmcp fails with "Invalid certificate chain at 'XYZ' certificate. Subject and issuer names do not match"

One possible reason was reported by our users. It turned out that inside hostcert.pem two different certificates were listed: one for the host and one for a person. This format can be produced when Internet Explorer is used to request the certificate. After the second certificate was removed, the "Subject and issuer names do not match" error was gone.

What is PNFS request folding?

PnfsManager assigns requests to the queue by hashing the path or PNFS ID. Thus requests for the same file system object are guaranteed to go to the same queue. This ensures that no two operations modify the same name space entry at the same time.

One consequence is that if many requests are submitted for the same file system entry, they all go to the same queue.

Since the queues are not meant for load balancing, but to ensure serialized access to name space entries, a slow query (for what ever reason) will block other requests in the same queue. In the worst case those requests time out and are resubmitted by the door or user, thus causing the queue to grow.

There is an essential feature in PnfsManager that, when enabled, kicks in in these situations: request folding. It makes sure that two or more identical requests in the queue are collapsed into a single request. This greatly improves how quickly the system recovers from such problems.

The feature is disabled by default in 1.9.5, so unless you have explicitly enabled it, it is off.

Please set pnfsFolding=true in dCacheSetup and restart the pnfs manager. That should allow the system to recover much more quickly.

What exactly is the definition of an 'active' space token? Why is space token (1234567 | AB-CD) set inactive, whereas space token (7654321 | EF-GH) is active? They are in the same situation: both contain data that should be accessed only for reading (not for writing). The only difference is that AB-CD was T1D1 whereas EF-GH was T0D1. Is this the reason?

Space reservations are not "active" or "inactive". From the point of view of a user (and the SRM specification) a reservation either is or is not.

Space reservations do, however, not live forever. They may be released by the user or the admin, and they may expire. The state field you found in the database indicates such cases: state 1 means that the space was released (either by an admin or a user) and state 2 means the space had a limited lifetime and has now expired.

From the point of view of the user/client, a released or expired reservation doesn't exist anymore. It is simply a dCache-internal detail that we retain the information about the space even after it was released or expired. From the point of view of the user those files *are not* in a space reservation. They used to be, but they are no longer.

The fact that the reservations are flagged as released also gives you a hint as to what happened: either you or one of your colleagues ran the 'release' CLI command in SrmSpaceManager, or one of your users released the reservation through SRM (with the dCache clients this would be the srm-release-space tool).

Regarding having multiple reservations with the same space token description: this is simply how the SRM spec is defined. Space tokens are identified by an identifier generated by the server (in the case of dCache this is an integer). The space token description is a user-defined string that helps describe what the space token is for. The description (as you would expect from a description) has no obligation to be unique. Some clients allow the user to specify the space token description and then try to find a space token that has that description; this is, however, a client feature. At the SRM level the client has to specify a space token identifier, not a description.

I upgraded our production dCache instance from 1.9.5-16 to 1.9.12-10. The old one was using Postgres 8.3.7, the new one Postgres 9.0. I used the Postgres 9.0 client to dump the old databases and then restore them. After upgrading the head nodes and checking that 'ls' in SrmSpaceManager shows correct information, I upgraded one pool and started it. Then I tried to read one file from that pool and it fails with:

srmcp -2 -globus_tcp_port_range=20000,25000 file:////tmp/test srm://abcdef:8443/xyz/path/testfile

Tue Sep AB XX:YY:ZZ CEST 2011: java.io.IOException: srmPrepareToPut submission failed, unexpected or failed status : SRM_INTERNAL_ERROR explanation= internal error: Exception [TOPLINK-4002] (Oracle TopLink Essentials - 2.1 (Build 60f (01/07/2009))): oracle.toplink.essentials.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "authgrouplist_pkey"
Error Code: 0
Call: INSERT INTO AUTHGROUPLIST (id, ATTRIBUTE, authrecord_id) VALUES (?, ?, ?)
bind => [644, /xyz/test, 7968263235394437071]
Query: InsertObjectQuery(GL:/xyz/test 0 groups : [])
srm client error:
java.lang.Exception: stopped
Exception in thread "main" java.lang.Exception:  stopped 
        at gov.fnal.srm.util.Copier.run(Copier.java:244)
        at java.lang.Thread.run(Thread.java:636)

This is a common problem that several users have reported. We have no idea why this happens, but it is caused by a PostgreSQL sequence object being reset.

If you moved the DB while upgrading then maybe the sequence object wasn't moved. That was the case for some users. Other users had not moved the DB and still suffered from the problem.

The solution is to either manually increase the sequence object, or to delete the content of those tables. The latter is the simpler solution.

Delete the content of authgrouplist and authgroup or simply drop the tables:

drop table AUTHGROUPLIST cascade;
drop table AUTHGROUP cascade;
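
Alternatively, the sequence can be advanced by hand. The database, user and sequence names below are assumptions and should be checked against your schema (e.g. with \d authgrouplist in psql):

# advance the sequence past the highest id already in the table
# database, user and sequence names are assumptions
psql -U srmdcache dcache -c \
  "SELECT setval('authgrouplist_id_seq', (SELECT COALESCE(MAX(id), 1) FROM authgrouplist));"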

There's been some confusion over ACLs, Chimera and migrating from PNFS. In particular, enabling ACL support for an instance that migrated from PNFS can lead to a degradation of Chimera performance.

Symptom: poor Chimera performance, with slow queries involving 't_acl' table.

Preconditions: the namespace was migrated from PNFS to Chimera and ACL support is enabled (if either is not true then your instance is not affected).

Cause: The 't_acl' table was not updated during PNFS migration process.

How to check if an instance is affected:

Run the following command:

	psql -U chimera -A -c "SELECT \
		pg_catalog.format_type(a.atttypid, a.atttypmod) \
		FROM pg_catalog.pg_attribute a, pg_catalog.pg_class c \
		WHERE ((a.attname = 'rs_id' AND c.relname = 't_acl') OR \
		(a.attname = 'ipnfsid' AND c.relname = 't_inodes')) \
		AND a.attrelid = c.oid AND a.attnum > 0 AND NOT a.attisdropped \
		ORDER BY c.relname;"

You should see one of three possible outputs:

a) output for non-migrated Chimera:

		format_type
		character(36)
		character(36)
		(2 rows)

If you see this output then do NOT take any action. Altering the table (as described below) will CREATE the problem, rather than fixing it.

b) output for PNFS-migrated Chimera:

		format_type
		character(36)
		character varying(36)
		(2 rows)

This output indicates that you have a PNFS-migrated Chimera. Enabling ACL support will result in degraded performance.

c) output for PNFS-migrated Chimera with fix:

		format_type
		character varying(36)
		character varying(36)
		(2 rows)

This output indicates that you have a PNFS-migrated Chimera with the fix for ACL support. Enabling ACL support should not greatly affect performance[*].

Recommended action:

Run the above command.

If you see output a) or output c) above then you are not affected. DO NOTHING. This is especially important for output a).

If you see output b) then your choice of action depends on whether ACL support is enabled and, if not, whether you are likely to enable ACL support in the future.

o if you are using ACLs then you should fix the problem. The problem may be fixed by running the following command:

		psql -U chimera -c "ALTER TABLE t_acl ALTER rs_id TYPE VARCHAR(36);"

This command may be run on a live system, without taking the site into down-time.

o if you are not using ACLs then you are not suffering from the problem. You may still choose to fix the problem since doing so has little impact and will allow you to use ACLs in the future with minimum of performance impact[*].

[*] - supporting ACLs requires additional database lookups. Therefore, it is inevitable that there is some (however tiny) performance penalty for doing this. With the connection pooling and pre-computed SQL statements, the overhead should be too small to measure reliably.

About

This FAQ was compiled from e-mails sent to:

support@… user-forum@…

It also draws on a presentation by Tigran for the dCache workshop, and on the GridPP WIKI.

There are two major categories of questions: Troubleshooting and Tuning.

Extra FAQ

GridPP FAQ