Last modified on 03/02/10 11:30:05

How-To Get A List Of All Files On A Pool

The following describes how to create a list of all files (either with their pathnames or their PNFS-IDs) on a given pool.

Requirements, Constraints And Conventions

  • Chimera is expected as the file-hierarchy-provider and PostgreSQL as the DBMS for the chimera-database (adapting the commands to another DBMS should however be easy).
  • Several POSIX-utilities are required.
  • Space management is not covered (for example questions like “Which files does the space management system suppose to exist on a pool?”).
  • Files are not checked for their integrity here, which means that a file found to “exist” is not necessarily correct in its content.
  • Lines starting with “$” are entered within a POSIX-sh-compatible shell.
  • Lines starting with “#” are entered within a POSIX-sh-compatible shell, with the effective user-ID and group-ID being 0 (“root-rights”).
  • Lines starting with “(location) >” are entered within dCache’s administration interface with location as the current location.
  • Standard input is written bold, standard output normal and standard error underlined.

Introduction

When talking about “files on a pool” one has to distinguish between:

  • Files which Chimera (and thus dCache) thinks should be on a pool.
  • Files which are actually on a pool.

The two sets can be different (inconsistent) in several cases:

  • In case of errors within dCache itself.
  • When files got deleted in the file-hierarchy (for example Chimera) but for some reason not on the pools.
  • When files got deleted or were somehow lost (for example in case of problems with the underlying file-system or the hardware) on the pools. In such cases, the file-hierarchy does of course not automatically remove those files.

Please note the following:

  • The terms “pool” and “pool node” mean two different things. A pool (for example “pool1_1”) is an abstract component within a dCache cluster, whereas a pool node (for example “pool1”) is a host that contains one or more pools.
  • In the following chapters, dCache’s databases will be directly accessed, so take care!

Creating A List Of Files Which Chimera (And Thus dCache) Thinks Should Be On A Pool

This information can be easily queried from the chimera-database via the following SQL-command:
SELECT ipnfsid FROM t_locationinfo WHERE ilocation='pool-name';
where pool-name has to be replaced with the name of the desired pool.

In order to create a file containing such a list one could use the PostgreSQL-client “psql” for example like this:
# psql --no-align --tuples-only --username postgres --dbname chimera --command "SELECT ipnfsid FROM t_locationinfo WHERE ilocation='pool_1';" --output list-of-files-that-should-be-on-pool_1-according-to-Chimera


Instead of querying the PNFS-IDs of the files, one might also need to query their pathnames. This is achieved by using the “inode2path”-function in the above SQL-command, which then reads like this:
SELECT inode2path(ipnfsid) FROM t_locationinfo WHERE ilocation='pool-name';
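
As with the PNFS-IDs, this query can be passed to “psql” in order to write the pathnames to a file. The following is only a sketch: the function name “list_expected_pathnames” and its two arguments are illustrative and not part of dCache or Chimera, and the pool name is assumed to contain no single quotes. The psql options are the same ones used in the example above.

```shell
# Sketch: write the pathnames of all files that Chimera expects on a pool
# to a file. The function name and its arguments are illustrative only.
list_expected_pathnames() {
    pool=$1      # name of the pool, assumed to contain no single quotes
    output=$2    # file that receives one pathname per line

    psql --no-align --tuples-only --username postgres --dbname chimera \
         --command "SELECT inode2path(ipnfsid) FROM t_locationinfo WHERE ilocation='${pool}';" \
         --output "$output"
}
```

It could then be called, with the rights required to access the chimera-database, for example as “list_expected_pathnames pool1_1 list-of-pathnames-that-should-be-on-pool1_1”.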

Creating A List Of Files That Are Actually On A Pool

In principle it is possible to get such a list by issuing the “rep ls”-command to a pool-cell within dCache’s administration interface, for example like this:
(pool1_1) > rep ls
000069E2E50115DE403F9CD0823FBB02BBDA <C-------X--L(0)[0]> 1527553729 si={atlas:DataDisk}
00004E2448EAF268470DB6644E4B2CE0D1C1 <C-------X--L(0)[0]> 1536444009 si={atlas:ScratchDisk}
0000203FC32662F640539BDC49F02B7FAA53 <C-------X--L(0)[0]> 1524450658 si={ops:STATIC}
000200000000000005302750 <C-------X--L(0)[0]> 1420357071 si={atlas:MCDisk}
0002000000000000040267A0 <C-------X--L(0)[0]> 35228007 si={dgrid:STATIC}
(This could be easily “cleaned up” via a sed-command like “'s/ .*$//'”.)
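
The effect of that sed-command can be illustrated with a short sketch. It assumes that the output of rep ls has already been captured somehow (here it is simply stored in a shell variable holding two of the sample lines from above); how the output is obtained from the administration interface is not shown.

```shell
# Two sample lines of "rep ls" output, copied from the example above.
rep_ls_output='000069E2E50115DE403F9CD0823FBB02BBDA <C-------X--L(0)[0]> 1527553729 si={atlas:DataDisk}
000200000000000005302750 <C-------X--L(0)[0]> 1420357071 si={atlas:MCDisk}'

# Delete everything from the first space up to the end of each line,
# which leaves only the PNFS-ID in the first column.
pnfs_ids=$(printf '%s\n' "$rep_ls_output" | sed 's/ .*$//')

printf '%s\n' "$pnfs_ids"
```

This prints one PNFS-ID per line, which is the same format as the lists created from the chimera-database.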

The problem with rep ls, however, is that its information might already be out of sync with the files that actually exist on the disk. Apart from that, a running dCache is required on the specific pool, which might not be available in case of problems.


Another (more manual but therefore more reliable) way is to directly list the actually available files using the underlying file-system (for example btrfs, ext4 or XFS).
This could be done using the “print_actual_pool_files.sh”-program attached to this how-to, for example:
# print_actual_pool_files.sh pool1_1
which prints all the file names of a pool’s “/data”-directory to standard output. As each file is named after its PNFS-ID, a list of PNFS-IDs is actually returned.
The script must be invoked on the respective pool node.
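
For illustration, the core of such a listing can be sketched with POSIX-utilities alone. This is not the attached script: the function name “print_pool_files” is hypothetical, and it takes the base directory of the pool as argument instead of the pool name, because the mapping from pool names to directories depends on the local setup.

```shell
# Sketch only, not the attached print_actual_pool_files.sh.
# Prints the names of all regular files below the "data"-directory of a pool.
print_pool_files() {
    pool_dir=$1    # base directory of the pool (site-specific assumption)

    # List all regular files and strip the leading directory components,
    # so that only the file names (the PNFS-IDs) remain.
    find "${pool_dir}/data" -type f | sed 's|^.*/||'
}
```

It would be called on the pool node with the pool’s base directory as the only argument; the order of the printed names is whatever find returns, so the output should be sorted before it is compared with other lists.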


In order to convert this list of PNFS-IDs into the pathnames used in the file-hierarchy one should utilise the chimera-database again as described in “Translating Large File-Lists”.
For example:
# print_actual_pool_files.sh pool1_1 > list-of-PNFS-IDs-that-are-actually-on-pool_1
# sed -i "s/\(.*\)/VALUES (inode2path('\1'));/" list-of-PNFS-IDs-that-are-actually-on-pool_1
# psql --no-align --tuples-only --username postgres --dbname chimera --file list-of-PNFS-IDs-that-are-actually-on-pool_1 --output list-of-pathnames-that-are-actually-on-pool_1
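
Once both kinds of lists exist, the inconsistencies described in the introduction can be found by comparing them. The following sketch uses the POSIX-utilities sort and comm; the function name “compare_pool_lists” and the labels it prints are illustrative. Both input files are assumed to contain one plain PNFS-ID per line, that is, the lists as they are before any conversion into SQL-statements.

```shell
# Sketch: compare the list of PNFS-IDs expected by Chimera with the list of
# PNFS-IDs actually found on a pool. Names and labels are illustrative only.
compare_pool_lists() {
    expected=$1    # PNFS-IDs according to Chimera, one per line
    actual=$2      # PNFS-IDs actually found on the pool, one per line

    # comm requires sorted input; the C locale keeps the ordering consistent.
    LC_ALL=C sort -u "$expected" > "${expected}.sorted"
    LC_ALL=C sort -u "$actual" > "${actual}.sorted"

    # Lines only in the first file: known to Chimera but not on the pool.
    LC_ALL=C comm -23 "${expected}.sorted" "${actual}.sorted" | sed 's/^/missing-on-pool /'
    # Lines only in the second file: on the pool but not known to Chimera.
    LC_ALL=C comm -13 "${expected}.sorted" "${actual}.sorted" | sed 's/^/unknown-to-chimera /'
}
```

The function leaves the sorted copies next to the input files (with the suffix “.sorted”) and prints one labelled line per inconsistent PNFS-ID. An empty output means that the two lists agree.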

License Of This Document

Copyright © 2009–2010, Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>.
All rights reserved.


This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Germany License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/de/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

Attachments