wiki:RALBugs
Last modified on 01/03/13 14:25:21

This page contains a list of problems identified during a visit to RAL.

  1. dCache all-in-one configuration has BoneCP attempting to create too many db connections compared to PostgreSQL's default configuration: dCache is broken out-of-the-box and the install instructions just don't work.
  2. Fix PostgreSQL chapter: it is unclear whether the commands need to be run when installing from the RPM. The user ended up running the 'initdb' command (is this needed?). When this failed they tried to 'chown dcache' the postgres data directory.
  3. Update book and the defaults files to tell admins not to edit defaults file; somehow this isn't understood.
  4. Update book and defaults file to tell users not to copy the annotation when configuring dCache. At least dCache behaves correctly.
  5. RB Change the default namespace to remove confusing 'admin' 'etc' directories. Comment from admin: "oh I wanted to expose my data but dCache is exposing the configuration; this is wrong."
  6. The chimera-cli needs a man page and better help command. As is, the command provides almost no help in understanding how to use it. Also, the 'help' command only works if the command-line also has a path. Accept '-help' and '--help' as synonyms for help.
  7. There is no mechanism to delete files (an 'rm' command) with the chimera-cli.
  8. There's no "rm -rf" like operation in chimera-cli to clear a subdirectory even if it has contents.
  9. When starting the NFS server there is no warning that the exports file is empty. If the user has configured NFS to run but isn't exporting anything then the user is likely to be confused.
  10. Mounting the root partition is broken. The files and directories have only '?' for permissions and ownership. This happens when the root partition is listed in the /etc/exports file.
  11. RB Default Chimera namespace is broken: directories cannot be deleted as the link-count is wrong.
  12. Add a getting-in-touch section that describes how to move a pool off from the single-node instance onto a dedicated node. Suggested procedure:
    • describe how to configure a dedicated pool node
    • describe how to add a node into the dCache cluster
    • describe how to check if the new pool is available
    • describe how to decommission the old pool, moving data from old pool to the new pool; how to remove the old pool.
  13. Somehow get rid of the "available pool size has changed" message if the reduction in available space is due to activity in another pool: one pool should not behave so it triggers a warning in another component.
  14. Add a getting-in-touch section that describes how to move doors from the head/single node to a dedicated node.
  15. Add a getting-in-touch section that describes how to update dCache configuration so that files written into a directory go to one poolgroup while files written into another directory go to another poolgroup:
    • pool defaults to 'default' pgroup: how to change this?
    • where is the pool-manager configuration file?
    • how to create the configuration file, so it may be edited?
    • in what way should the configuration file be altered?
    • what is a directory tag?
    • what tags need to be set?
    • what should their values be?
    • how to go about setting tags?
  16. Add support for globs in the 'psu' command; for example, 'psu addto pgroup'. This allows many pools to be added in one go.
  17. FIXED Show the domain's uptime in "dcache status" command. This helps indicate when a domain is restarting.
  18. Having an unknown host in 'broker.host' causes a stack-trace in the log file.
  19. Allow the client to log-out from any cell.
  20. The 'chimera-cli lstag' command throws an exception (visible on the console) if the path does not exist.
  21. The help for 'psu create unit' command is misleading as it misses the -proto option.
  22. Change the 'psu add link' command to 'psu addto link'. This is more in keeping with the other commands.
  23. The tab completion on psu commands is broken and produces incorrect output.
  24. Add a 'psu' command to allow renaming of pgroup, ugroup, link, etc.
  25. The output from the 'psu ls -a link' command doesn't clearly distinguish between pools and poolgroups: it's only if you happen to know how a pool's info looks that you would realise when a pool was added by mistake.
  26. Require pools and poolgroups to have distinct names. Fail to create a pool or poolgroup if the other already exists.
  27. The "psux ls pgroup" command triggers an exception (in domain log file) if argument is a non-existing poolgroup.
  28. If a directory has an OSMTemplate tag but no sGroup and no HSMType tag then writing into this directory kills the PoolManager's RequestHandler thread. This kills dCache.
  29. Mounting an NFS v4.1 door as 'localhost' stops pNFS from working. When copying a file into dCache, the door successfully creates a namespace entry, the pool receives the message to accept data, an empty file exists in the pool (with the correct PNFS-ID), but no data is ever transferred.
    It is unclear whether or not the client connects to the pool.
    Remounting using the FQDN "fixes" the problem.
  30. FIXED Fix chimera-cli's 'ls' command so that the output is the same as 'ls -l'
  31. Fix chimera-cli so commands more closely follow the standard POSIX filesystem commands.
  32. RB Fix chimera-cli 'chown' command so the group-ownership may be altered.
  33. Mounting as NFSv4 and running 'ls -l' shows all file and directory ownership as nobody:nobody. There are no diagnostics explaining why this is happening and (seemingly) no way of investigating other than wireshark.
  34. Installing 2.2 RPM for the first time results in this error message:
    chown: cannot access `/var/lib/dcache/plots': No such file or directory
  35. Fix SQL injection parts of installation so that harmless notifications are not shown.
  36. dCache downloads page redirects to a dCache pool, which uses a non-standard HTTP port. When trying to download an RPM from RAL hostel, this TCP connection was blocked.
  37. In the 2.2 Book, the install chapter, "Creating users and databases for dCache" instructions for creating user 'srmdcache' is wrong.
  38. FIXED The single.conf example layout includes 'broker.scheme=none' for no good reason: it doesn't fix anything and makes moving to a multi-domain setup harder. Remove it.
  39. Book introduction is missing an overview picture, showing how all the services fit together. The sysadmin cannot know which component does what otherwise.
  40. Book is missing any guidelines on how to deploy services: how many head nodes, how many doors, how many pools. Without any guidelines, an admin has simply no idea.
  41. REVIEW Currently dCache cannot register with the system rpcbind using default configuration. Either this is fixed or the Book is updated to say that rpcbind must be in insecure mode, probably best to give explicit instructions how to achieve this: create the file /etc/sysconfig/rpcbind with the single line:

    RPCBIND_ARGS=-i

    [Note: it turns out this option is defined in the book; so why did the admin not find it?]
  42. When dCache shuts down, especially if it is due to a cell dying during initialisation, there is a lot of noise. The noise hides the actual problem: the admin does not know to ignore these messages, so is confused and cannot identify the problem. This noise should be removed.
  43. Likewise, there is a lot of noise when dCache starts up that is of no interest. This, too, should be removed as it makes diagnosing any problem during startup harder.
  44. The Book mentions the "ro" option in the exports file, saying this is to support read-only exports. However, dCache doesn't honour this. The documentation should be updated and dCache should issue a warning (or better, refuse to start) if the read-only flag is configured in /etc/exports.
  45. RB Several obsolete annotated properties do not have error messages as their values but the old default values. These need to be updated.
  46. SRM client output is terrible (help is a stack-trace, too verbose, etc)
  47. srmLs is using SRM v1 endpoint instead of SRM v2 endpoint. Specifying -webservice_path=srm/managerv2 option forces it to use SRM v2 endpoint, so the 'ls' command can succeed. This was with srmclient v1.9.5-3
  48. gridftpcopy fails with 'Exception in thread "main" java.lang.NoClassDefFoundError' for Bouncy Castle. It looks like BouncyCastle is missing from the classpath.
  49. gridftpcopy cmd line is stupid.
  50. gridftpcopy 'file' protocol: an extra slash is required, as file:///etc/group maps to etc/group
  51. Malformed gsiftp URLs are not rejected; instead an empty hostname is used; e.g. gsiftp:////ftp.example.org:2281/path/to/something
  52. Output from gridftpcopy command is confusing: too much information that is unnecessary.
  53. grep command in admin interface for the pinboard?
  54. 'pool disable' message is not logged in PoolManager pinboard
  55. Cannot specify a message with 'pool enable' command.