Version 33 (modified by bernardt, 9 years ago)

--

Troubleshooting Guide for dCache

dCache version | Error | Problem | Solution

dCache version: all
Error: ERROR: role "chimera" already exists
Problem: Something went wrong while setting up the PostgreSQL database with the commands described in the installation documentation. Reissuing commands such as 'createuser -U postgres --no-superuser --no-createrole --createdb --pwprompt chimera' then results in 'createuser: creation of new role failed: ERROR: role "chimera" already exists'.
Solution: dropdb -U postgres chimera && dropuser -U postgres chimera. Afterwards you can start over with creating the database, the user, etc.
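The cleanup for the chimera role can be sketched as a small shell helper. It assumes the PostgreSQL client tools dropdb and dropuser are on the PATH and that the postgres superuser can connect. The names run and reset_chimera and the DRY_RUN switch are hypothetical conveniences of this sketch, not part of dCache or PostgreSQL.

```shell
# Sketch only: drop the half-created chimera database and role.
# DRY_RUN defaults to 1 here (an assumption of this sketch), so nothing
# is deleted until you explicitly set DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_chimera() {
    # The database goes first: a role that still owns a database
    # cannot be dropped.
    run dropdb -U postgres chimera &&
    run dropuser -U postgres chimera
}
```

Calling reset_chimera with the default settings only prints the two commands. Setting DRY_RUN=0 before the call executes them, after which the createuser and createdb steps from the installation documentation can be repeated.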

dCache version: >1.9.12
Error: In the log files you will find: (666) file:/usr/share/dcache/services/pool.batch: line 100: (3) java.util.concurrent.ExecutionException: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'lock' defined in class path resource [org/dcache/pool/classic/pool.xml]: Invocation of init method failed; nested exception is java.io.FileNotFoundException: /pools/pool1/lock (Permission denied) from ac_create_$_2_3
Problem: The permissions of the pool directory are not set properly.
Solution: Changing the ownership of the pool directory to the dcache user solves the problem (chown dcache /pools/pool1/ in this case).
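Before restarting the pool, the ownership can be checked with a short helper. This is a minimal sketch: the function names owner_of and check_pool_owner are hypothetical, and the owner is read from the third field of ls -ld output.

```shell
# Print the owning user of a file or directory (third field of `ls -ld`).
owner_of() {
    ls -ld "$1" | awk '{print $3}'
}

# Return 0 if directory $1 is owned by user $2; otherwise print a hint
# on stderr and return 1.
check_pool_owner() {
    dir="$1"
    user="$2"
    actual="$(owner_of "$dir")"
    if [ "$actual" = "$user" ]; then
        return 0
    fi
    echo "$dir is owned by '$actual', expected '$user'; try: chown $user $dir" >&2
    return 1
}
```

For the case in this entry the call would be check_pool_owner /pools/pool1 dcache.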

dCache version: all, observed in 1.9.12
Error: 26 Apr 2011 11:40:36 (SRM-xen-ep-emi-tb-se-3) [] org.globus.common.ChainedIOException: Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Bad certificate (The signature of 'C=CH,O=CERN,OU=GD,CN=Test user 100' certificate does not match its issuer)
Problem: Bad certificate (The signature of 'C=CH,O=CERN,OU=GD,CN=Test user 100' certificate does not match its issuer)
Solution: The problem was solved by putting the issuing CA's certificates in /etc/grid-security/certificates.
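Whether a certificate chains to a CA in that directory can be checked with openssl verify. The sketch below assumes the openssl command-line tool is installed; the function name verify_against_ca_dir is hypothetical, and the default directory is the one named in this entry.

```shell
# Verify a certificate against a directory of hashed CA certificates.
# Returns 2 if the certificate file is unreadable, otherwise the exit
# status of `openssl verify`.
verify_against_ca_dir() {
    cert="$1"
    ca_dir="${2:-/etc/grid-security/certificates}"
    if [ ! -r "$cert" ]; then
        echo "cannot read certificate: $cert" >&2
        return 2
    fi
    openssl verify -CApath "$ca_dir" "$cert"
}
```

Read the message that openssl prints as well as the exit status, since older openssl releases may not report verification failures through the exit status.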

dCache version: all, especially 1.9.5
Error: SRMClientV2 : srmPrepareToPut: try # 0 failed with error SRMClientV2 : ; nested exception is: java.io.EOFException
Tue Apr 26 14:43:40 CEST 2011: java.io.EOFException
Tue Apr 26 14:43:40 CEST 2011: stopping copier going to stop....
Problem: srmcp fails with an EOF exception. The actual cause can be a CA's certificate revocation list (CRL) being out of date.
Solution: Check whether the CRL of the CA that issued the user certificate is out of date. Find the issuing CA with: openssl x509 -in ~/.globus/<usercert>.pem -noout -issuer. Then grep for the relevant CRL on the storage element (SE). This gives a file name such as 1cda0759.r0. Check whether that CRL is still valid with: openssl crl -in 1cda0759.r0 -noout -text
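The CRL lookup can be sketched with three small helpers. They assume that CRL files are stored as <issuer hash>.r0 under /etc/grid-security/certificates, which matches the 1cda0759.r0 example but is an assumption about the local setup. The function names are hypothetical; -issuer_hash, -lastupdate and -nextupdate are standard openssl options.

```shell
# Build the path of the CRL file for a given issuer hash.
# The default directory is an assumption of this sketch.
crl_path_for_hash() {
    issuer="$1"
    crl_dir="${2:-/etc/grid-security/certificates}"
    echo "$crl_dir/$issuer.r0"
}

# Print the issuer hash of a certificate file (a short hex string).
issuer_hash() {
    openssl x509 -in "$1" -noout -issuer_hash
}

# Print the lastUpdate and nextUpdate fields of a CRL file.
show_crl_validity() {
    openssl crl -in "$1" -noout -lastupdate -nextupdate
}
```

With cert set to the path of the user certificate, show_crl_validity "$(crl_path_for_hash "$(issuer_hash "$cert")")" prints the validity window. A nextUpdate date in the past means the CRL is out of date.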

dCache version: observed in 1.9.12
Error: 11 May 2011 12:28:56 (alm) [] Auth (knownUsers) : Ssh knownUsers unavailable for request from User admin Host 131.169.252.35
SshStreamEngine : SSH_CMSG_AUTH_RSA : Key not found

dCache version: all
Error: $ ssh -c blowfish -p 22223 -l admin <host> --> respond_to_rsa_challenge: server_key 768 < host_key 768 + SSH_KEY_BITS_RESERVED 128
Solution: Generate a new host_key and server_key and clean up your known_hosts file, then retry after restarting the admin door.
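One reading of the error message is that the SSH1 client requires the server key and the host key to differ in size by at least SSH_KEY_BITS_RESERVED (128) bits, which two 768-bit keys do not. Under that assumption, candidate key sizes can be checked before generating new keys. The function name ssh1_key_sizes_ok is hypothetical.

```shell
# Return 0 if two key sizes (in bits) differ by at least the reserved
# margin, 1 otherwise. The default margin of 128 is the
# SSH_KEY_BITS_RESERVED value printed in the error message.
ssh1_key_sizes_ok() {
    a="$1"
    b="$2"
    reserved="${3:-128}"
    if [ "$a" -ge "$b" ]; then
        diff=$((a - b))
    else
        diff=$((b - a))
    fi
    [ "$diff" -ge "$reserved" ]
}
```

For example, ssh1_key_sizes_ok 768 768 fails, reproducing the situation in the error, while ssh1_key_sizes_ok 768 1024 succeeds.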

dCache version: all
Error: In /var/log/dCacheDomain.log: (poolCollector) [] Failed to update topology map: {uoid=<1313051194606:125>;path=[>PoolManager@local];msg=Missing routing entry for PoolManager@local}

dCache version: all
Error: (SRM-sligo) [] Failed to accept connection: java.io.IOException: Failed to load credentials

dCache version: all
Error: [door:GFTP-xen-ep-emi-tb-se-4-<unknown>-42733@dCacheDomain:1313399408104 GFTP-xen-ep-emi-tb-se-4-<unknown>-42733 PoolDeliverFile 0000238286A68CBA49E2A34F1BEB61DAE3F8] java.net.NoRouteToHostException: No route to host

dCache version: all
Error: (DCap-gsi-xen-ep-emi-tb-se-4) [] Exception in secure protocol : dmg.protocols.telnet.TelnetAuthenticationException: Host 131.169.4.199: Tunnel verification failed!
dmg.protocols.telnet.TelnetAuthenticationException: Host 131.169.4.199: Tunnel verification failed!
    at dmg.protocols.telnet.TelnetStreamEngine.<init>(TelnetStreamEngine.java:107) ~[cells.jar:na]
    at dmg.cells.services.login.StreamEngineFactory.newStreamEngine(StreamEngineFactory.java:42) ~[cells.jar:na]
    at dmg.cells.services.login.LoginManager$RunEngineThread.run(LoginManager.java:894) ~[cells.jar:na]
    at dmg.cells.nucleus.CellNucleus$1.run(CellNucleus.java:654) [cells.jar:na]
    at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]

dCache version: all
Error: On a worker node (WN):
lcg-cp -v -U srmv2 -T srmv2 -v file://///etc/group srm://xen-ep-emi-tb-se-4.desy.de:8443/srm/managerv2?SFN=/pnfs/desy.de/data/testers.eu-emi.eu/TestFile000011 -b
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: testers.eu-emi.eu
Checksum type: None
Destination SE type: SRMv2
[SE][Mkdir][SRM_AUTHORIZATION_FAILURE] httpg://xen-ep-emi-tb-se-4.desy.de:8443/srm/managerv2: srm://xen-ep-emi-tb-se-4.desy.de:8443/srm/managerv2?SFN=/pnfs/desy.de/data/testers.eu-emi.eu: srm://xen-ep-emi-tb-se-4.desy.de:8443/srm/managerv2?SFN=/pnfs/desy.de/data/testers.eu-emi.eu : Permission denied
Destination SRM Request Token: (null)
Source URL: file:/etc/group
File size: 583
Source URL for copy: file:/etc/group
Destination URL: gsiftp://xen-ep-emi-tb-se-4.desy.de:2811/pnfs/desy.de/data/testers.eu-emi.eu/TestFile000011
# streams: 1
GSIFTP: default set up URL mode
GSIFTP: dest: set up FTP mode. DCAU disabled. Streams = 1, Tcp BS = 0
0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
583 bytes 1.24 KB/sec avg 1.24 KB/sec inst
Transfer took 1010 ms
[SE][PutDone][SRM_FAILURE] httpg://xen-ep-emi-tb-se-4.desy.de:8443/srm/managerv2: request contains no request token
Problem: The lcg-cp command returns with: request contains no request token
Solution: Turning off asynchronous mode in SRM suppresses this symptom; a real solution is still under investigation. Asynchronous mode is turned off by:
overwriteEnabled=true
srmGetReqSwitchToAsynchronousModeDelay=0
srmPutReqSwitchToAsynchronousModeDelay=0
srmLsRequestSwitchToAsynchronousModeDelay=infinity
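Whether these settings are actually present in a configuration file can be checked with a short helper. This is a sketch: the function name check_srm_sync_settings is hypothetical, and the path of the dCache configuration file is deliberately left as a parameter because it depends on the installation.

```shell
# Report which of the four settings listed above are missing from a
# properties file. Returns 0 only if every setting is present verbatim.
check_srm_sync_settings() {
    conf="$1"
    missing=0
    for setting in \
        "overwriteEnabled=true" \
        "srmGetReqSwitchToAsynchronousModeDelay=0" \
        "srmPutReqSwitchToAsynchronousModeDelay=0" \
        "srmLsRequestSwitchToAsynchronousModeDelay=infinity"
    do
        # -x matches whole lines, -F treats the setting as a fixed string
        if ! grep -qxF "$setting" "$conf"; then
            echo "missing: $setting"
            missing=1
        fi
    done
    return "$missing"
}
```

Because whole lines are matched, a setting written with spaces around the equals sign or with a trailing comment is reported as missing. After changing the settings, the SRM service presumably has to be restarted for them to take effect.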