Last modified on 03/09/12 11:29:25

Troubleshooting Guide for dCache

dCache version Error Problem Solution

dCache version: all
Error: ERROR: role "chimera" already exists
Problem: Something went wrong while setting up the PostgreSQL database with the commands described in the installation documentation. Reissuing a command such as 'createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt chimera' then results in 'createuser: creation of new role failed: ERROR: role "chimera" already exists'.
Solution: Drop the database with 'dropdb -U postgres chimera', then drop the role with 'dropuser chimera' as the postgres user (su postgres). Afterwards you can start over creating the database, the user, and so on.
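
The cleanup described above can be sketched as a small script. This is only a dry-run sketch, not something taken from the installation documentation: the names `run`, `reset_chimera` and `DRY_RUN` are made up for illustration, and it assumes that `dropdb` and `dropuser` can connect with `-U postgres` in the same way as the `createuser` call quoted above.

```shell
#!/bin/sh
# Hypothetical helper: commands are only printed by default (DRY_RUN=1).
# Set DRY_RUN=0 to actually execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove the half-created database and role, then recreate the role
# with the createuser call quoted in the entry above.
reset_chimera() {
    run dropdb -U postgres chimera
    run dropuser -U postgres chimera
    run createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt chimera
}
```

Calling `reset_chimera` with the default settings only lists the three commands, so the order can be reviewed before anything is dropped.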

dCache version: >1.9.12
Error: In the log files you will find: (666) file:/usr/share/dcache/services/pool.batch: line 100: (3) java.util.concurrent.ExecutionException: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'lock' defined in class path resource [org/dcache/pool/classic/pool.xml]: Invocation of init method failed; nested exception is /pools/pool1/lock (Permission denied) from ac_create_$_2_3
Problem: The permissions on the pool directory are not set properly.
Solution: Changing the ownership of the pool directory to the dcache user solves the problem (chown dcache /pools/pool1/ in this case).
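
Before changing anything, it can help to check who actually owns the pool directory. The following is a minimal sketch; the function names `owner_uid` and `owned_by` are hypothetical, and the path and user in the usage comment are the ones from the entry above.

```shell
#!/bin/sh
# Print the numeric owner (uid) of a file or directory.
owner_uid() {
    ls -ldn "$1" | awk '{print $3}'
}

# Succeed if path $1 is owned by the user named $2.
owned_by() {
    [ "$(owner_uid "$1")" = "$(id -u "$2")" ]
}

# Usage (run as root on the pool node):
#   owned_by /pools/pool1 dcache || chown dcache /pools/pool1/
```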

dCache version: 2.1.0 on SL 6
Error: (666) file:/usr/share/dcache/services/nfsv3.batch: line 2: (666) file:/usr/share/dcache/services/nfsv3-chimera.batch: line 21: (3) java.lang.RuntimeException: (3) Failed to create bean 'door' : Permission denied from ac_create_$_2_3
Problem: dCache keeps restarting with "Permission denied" while starting up.
Solution: What is missing is rpcbind and a line in /etc/sysconfig/rpcbind. If the file does not exist, create it and insert RPCBIND_ARGS="-i".
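
The file change can be made repeatable with a small helper. This is a sketch under the assumption that a single `RPCBIND_ARGS` line is all that is needed; the function name `ensure_rpcbind_args` is hypothetical, and an existing `RPCBIND_ARGS` line with a different value is deliberately left untouched so that it can be reviewed by hand.

```shell
#!/bin/sh
# Append RPCBIND_ARGS="-i" to the given file unless it already has an
# RPCBIND_ARGS line. The file is created if it does not exist.
ensure_rpcbind_args() {
    file=${1:-/etc/sysconfig/rpcbind}
    if [ -f "$file" ] && grep -q '^RPCBIND_ARGS=' "$file"; then
        return 0
    fi
    echo 'RPCBIND_ARGS="-i"' >> "$file"
}
```

Running `ensure_rpcbind_args` as root without arguments acts on /etc/sysconfig/rpcbind; running it a second time changes nothing.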

dCache version: all, at occurrence 1.9.12
Error: 26 Apr 2011 11:40:36 (SRM-xen-ep-emi-tb-se-3) [] org.globus.common.ChainedIOException: Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Bad certificate (The signature of 'C=CH,O=CERN,OU=GD,CN=Test user 100' certificate does not match its issuer)
Problem: Bad certificate (The signature of 'C=CH,O=CERN,OU=GD,CN=Test user 100' certificate does not match its issuer)
Solution: The problem was solved by putting the issuing CA's certificates in /etc/grid-security/certificates.

dCache version: all, especially 1.9.5
Error: SRMClientV2 : srmPrepareToPut: try # 0 failed with error SRMClientV2 : ; nested exception is: Tue Apr 26 14:43:40 CEST 2011: Tue Apr 26 14:43:40 CEST 2011: stopping copier going to stop....
Problem: srmcp causes an EOF exception. The actual problem can lie in a CA's certificate revocation list (CRL) being out of date.
Solution: Check whether the CRL of the CA that issued the user certificate is out of date. Find out the issuing CA with 'openssl x509 -in ~/.globus/<usercert>.pem -noout -issuer', then grep for the relevant CRL on the storage element (SE). This gives you a file name such as 1cda0759.r0. You can then check whether the CRL is still valid with 'openssl crl -in 1cda0759.r0 -noout -text'.
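
The two openssl steps above can be combined into one check. The sketch below makes several assumptions that are not stated in this guide: that the CRLs sit in /etc/grid-security/certificates on the SE, that they are named after the issuer hash with an `.r0` suffix (as in the 1cda0759.r0 example), and that the installed OpenSSL computes the same hash that was used to name the files (older and newer OpenSSL releases use different hash algorithms). The names `CERT_DIR`, `crl_path` and `check_crl` are hypothetical.

```shell
#!/bin/sh
# Assumption: CA certificates and CRLs live in this directory on the SE.
CERT_DIR=${CERT_DIR:-/etc/grid-security/certificates}

# Build the CRL file name for an issuer hash, e.g. 1cda0759 -> $CERT_DIR/1cda0759.r0
crl_path() {
    printf '%s/%s.r0\n' "$CERT_DIR" "$1"
}

# Print the validity window of the CRL that belongs to the issuer of certificate $1.
check_crl() {
    hash=$(openssl x509 -in "$1" -noout -issuer_hash) || return 2
    crl=$(crl_path "$hash")
    if [ ! -f "$crl" ]; then
        echo "no CRL found at $crl" >&2
        return 1
    fi
    # A nextUpdate date in the past means the CRL is out of date.
    openssl crl -in "$crl" -noout -lastupdate -nextupdate
}
```

For example, `check_crl ~/.globus/<usercert>.pem` prints the lastUpdate and nextUpdate dates of the matching CRL, if one is found.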

dCache version: at occurrence 1.9.12
Error: 11 May 2011 12:28:56 (alm) [] Auth (knownUsers) : Ssh knownUsers unavailable for request from User admin Host SshStreamEngine : SSH_CMSG_AUTH_RSA : Key not found

dCache version: all
Error: $ ssh -c blowfish -p 22223 -l admin <host> --> respond_to_rsa_challenge: server_key 768 < host_key 768 + SSH_KEY_BITS_RESERVED 128
Solution: Generate a new host_key and server_key and clean up your known_hosts file, then retry after restarting the admin door.
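
The error message itself states the rule that is violated: both keys are 768 bits long, but one of them has to be at least SSH_KEY_BITS_RESERVED (128) bits longer than the other. The sketch below only illustrates that size rule as read from the message. The function name `ssh1_key_sizes_ok` is hypothetical, and treating the rule as symmetric (the larger key must exceed the smaller by 128 bits) is an assumption.

```shell
#!/bin/sh
# Value shown in the error message above.
SSH_KEY_BITS_RESERVED=128

# Succeed if the two key sizes (in bits) differ by at least SSH_KEY_BITS_RESERVED.
ssh1_key_sizes_ok() {
    if [ "$1" -ge "$2" ]; then
        big=$1; small=$2
    else
        big=$2; small=$1
    fi
    [ "$big" -ge $((small + SSH_KEY_BITS_RESERVED)) ]
}
```

With this rule, `ssh1_key_sizes_ok 768 768` fails, as in the error above, while a 768-bit server_key combined with a 1024-bit host_key passes.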

dCache version: all
Error: In /var/log/dCacheDomain.log: (poolCollector) [] Failed to update topology map: {uoid=<1313051194606:125>;path=[>PoolManager@local];msg=Missing routing entry for PoolManager@local}
Problem: Messages cannot get to the cell.
Solution: This can happen at startup when a cell is expected to be running but has not started yet. Wait a bit and see whether messages still fail to reach the desired cell. Check in the web interface whether the cells are there, check the log files of the domains these cells should run in, or check the admin interface (ps in the System cell and the topo cell are good places to start).

dCache version: all
Error: (SRM-sligo) [] Failed to accept connection: Failed to load credentials

dCache version: all
Error: [door:GFTP-xen-ep-emi-tb-se-4-<unknown>-42733@dCacheDomain:1313399408104 GFTP-xen-ep-emi-tb-se-4-<unknown>-42733 PoolDeliverFile 0000238286A68CBA49E2A34F1BEB61DAE3F8] No route to host

dCache version: all
Error: (DCap-gsi-xen-ep-emi-tb-se-4) [] Exception in secure protocol : dmg.protocols.telnet.TelnetAuthenticationException: Host Tunnel verification failed!
dmg.protocols.telnet.TelnetAuthenticationException: Host Tunnel verification failed!
at dmg.protocols.telnet.TelnetStreamEngine.<init>(TelnetStreamEngine.java:107) ~[cells.jar:na]
at ~[cells.jar:na]
at$RunEngineThread.run(LoginManager.java:894) ~[cells.jar:na]
at dmg.cells.nucleus.CellNucleus$ [cells.jar:na]
at [na:1.6.0_26]
Problem: Not 100% sure, but this error seems to happen when CRLs are not up to date.

dCache version: all
Error: On a worker node (WN): lcg-cp -v -U srmv2 -T srmv2 -v file://///etc/group srm:// -b
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name:
Checksum type: None
Destination SE type: SRMv2
SRM_AUTHORIZATION_FAILURE] httpg:// srm:// srm:// : Permission denied
Destination SRM Request Token: (null)
Source URL: file:/etc/group
File size: 583
Source URL for copy: file:/etc/group
Destination URL: gsi
# streams: 1
GSIFTP: default set up URL mode
GSIFTP: dest: set up FTP mode. DCAU disabled. Streams = 1, Tcp BS = 0
0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
583 bytes 1.24 KB/sec avg 1.24 KB/sec inst
Transfer took 1010 ms
[SE][PutDone][SRM_FAILURE] httpg:// request contains no request token
Problem: The lcg-cp command returns with: request contains no request token
Solution: Turning off asynchronous mode in SRM suppresses this symptom; a real solution is still under investigation. Asynchronous mode is turned off by:

dCache version: all
Error: org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Operation failed: Best pool <sligo.desy.de_pool-path4> too high : Infinity (error code 1)
Problem: No pool is available to write to.
Solution: This most likely happens when you have pools that are smaller than 4 GB and the gap (the value at which the pool cost rises to infinity) equals 4G. Setting the gap to a lower value in the admin interface solves the problem.

dCache version: trunk
Error: 29 Aug 2011 14:48:46 (cleaner) [] Failed to acquire connection. Sleeping for 7000ms. Attempts left: 1
org.postgresql.util.PSQLException: FATAL: connection limit exceeded for non-superusers
at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:469) ~[postgresql-9.0-801.jdbc4.jar:na]
Problem: The number of allowed connections to PostgreSQL is too low.
Solution: Increase max_connections in /var/lib/pgsql/data/postgresql.conf. This might cause a shared memory exception.
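
To see which limit is currently configured before raising it, the active setting can be read from the configuration file. This is a minimal sketch: the function name `pg_max_connections` is hypothetical, the default path is the one given above, and the sketch assumes the setting is written in the usual `max_connections = N` form.

```shell
#!/bin/sh
# Print the last uncommented max_connections value found in postgresql.conf.
pg_max_connections() {
    conf=${1:-/var/lib/pgsql/data/postgresql.conf}
    sed -n 's/^[[:space:]]*max_connections[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1
}
```

A change to max_connections only takes effect after PostgreSQL has been restarted. If the server then refuses to start with a shared memory error, the shared memory limits of the operating system most likely need to be raised as well.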

dCache version: 2.1.0 on SL6
Error: mount localhost:/ /mnt/ mount: wrong fs type, bad option, bad superblock on localhost:/, missing codepage or helper program, or other error
Problem: Mounting the /pnfs or /data directory does not work.
Solution: Install nfs-utils. With nfs3 started, 'mount -o vers=3,nolock localhost:/data /mnt/' solved the problem, while /etc/exports looked like this: '/ localhost(rw)' and a second line '/data'.

dCache version: 2.1.0
Error: Tue Nov 01 17:23:02 CET 2011: srmPrepareToPut submission failed, unexpected or failed status : SRM_FAILURE explanation= at Tue Nov 01 17:23:02 CET 2011 state Failed : at Tue Nov 01 17:23:02 CET 2011 state Pending : created RequestFileStatus#-2147483646 failed with error:[ at Tue Nov 01 17:23:02 CET 2011 state Failed : ERROR: function path2inodes(character varying, character varying) does not exist; Hint: No function matches the given name and argument types. You might need to add explicit type casts.; Position: 77]
Problem: There is a stored procedure missing.