Last modified on 10/29/09 13:57:56

How-To Find File Discrepancies Between Chimera And A Single Pool

The following describes how to find file discrepancies between Chimera and a single pool.

Requirements, Constraints And Conventions

  • Chimera is expected as the file-hierarchy-provider and PostgreSQL as the DBMS for the chimera-database (it should, however, be easy to adapt this to another DBMS).
  • Several POSIX-utilities are required.
  • Discrepancies with the space management databases are not covered.
  • Files are not checked for their integrity here: a file that is found to “exist” is not necessarily correct.
  • Only discrepancies between Chimera and a single pool are determined. Thus it is usually not directly possible to use this information to find files which exist in Chimera but not on any pool.
  • Lines starting with “$” are entered within a POSIX-sh-compatible shell.
  • Lines starting with “#” are entered within a POSIX-sh-compatible shell, with the effective user-ID and group-ID being 0 (“root-rights”).
  • Lines starting with “(location) >” are entered within dCache’s administration interface with location as the current location.
  • Standard input is written bold, standard output normal and standard error underlined.

Introduction

A file, or information about it, can “appear” in several different places within dCache:

  • The file-hierarchy-provider Chimera, which stores all kinds of meta-data including its pathname-to-PNFS-ID mapping, its file properties and the locations of all its replicas.
  • One or more pools, each holding a replica (the actual data of the file) and some meta-data of it.
  • The space management system, which keeps track of which pathname belongs to which space reservation.

Discrepancies between these can arise for many reasons. Typical examples include:

  • Partial (single files) or complete (all files) loss of a pool due to human errors, software problems (for example a damaged file-system) or damaged hardware. (This usually means that a file is “in” Chimera, but no longer on a given pool.)
  • Other problems or bugs that lead to files which appear either in Chimera or on a given pool, but not in both.

Note: Discrepancies with the space management system are not covered here.


It is often helpful to create a list of such discrepancies, for example in order to eventually delete those files.

Note: A list of discrepancies between Chimera and a single pool might not be useful on its own, especially when files have multiple replicas. In that case a discrepancy may well exist between Chimera and one pool, yet deleting such a file might be a bad idea, as its replicas on other pools would be deleted, too.

Creating “Delta-Lists” Of Files Between Chimera And A Single Pool

  1. Create a list of files that should be on the respective pool according to Chimera, by following the instructions described in “Creating A List Of Files Which Chimera (And Thus dCache) Thinks Should Be On A Pool”.
  2. Create a list of files that are actually on the respective pool, by following the instructions described in “Creating A List Of Files That Are Actually On A Pool”.
  3. Sort the lists via:
    # sort list-of-files-that-should-be-on-the-pool-according-to-Chimera > sorted-list-of-files-that-should-be-on-the-pool-according-to-Chimera
    # sort list-of-files-that-are-actually-on-the-pool > sorted-list-of-files-that-are-actually-on-the-pool
  4. Create the delta-lists via:
    # diff sorted-list-of-files-that-should-be-on-the-pool-according-to-Chimera sorted-list-of-files-that-are-actually-on-the-pool > deltas
    (Note: The order of the arguments to diff is important, because it determines which list the “<” and “>” prefixes refer to.)
    # grep '^< ' deltas | sed 's/^< //' > files-that-should-be-on-the-pool-according-to-Chimera-but-that-are-not
    # grep '^> ' deltas | sed 's/^> //' > files-that-are-actually-on-the-pool-but-not-according-to-Chimera

The lists created in steps 1 and 2 must both use the same type of entry, either pathnames or PNFS-IDs.
Depending on the chosen type, the delta-lists will also use either pathnames or PNFS-IDs.
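
Steps 3 and 4 can also be combined into a small shell function. The following is only a sketch under stated assumptions, not part of dCache: the function name make_delta_lists and its argument order are hypothetical, and it uses the POSIX utility comm instead of the diff/grep/sed pipeline shown above. Both input lists are expected to contain one entry per line, all of the same type (either pathnames or PNFS-IDs).

```shell
#!/bin/sh
# Sketch: derive both delta-lists from two unsorted input lists.
# make_delta_lists is a hypothetical helper name, not a dCache command.
# Arguments:
#   $1  list of files that should be on the pool according to Chimera
#   $2  list of files that are actually on the pool
#   $3  output: entries only in $1 (in Chimera, but not on the pool)
#   $4  output: entries only in $2 (on the pool, but not in Chimera)
make_delta_lists() {
    # Private scratch directory for the sorted intermediate lists.
    mdl_tmp="${TMPDIR:-/tmp}/delta-lists.$$"
    mkdir "$mdl_tmp" || return 1

    # sort and comm must agree on the collation order, so both run
    # in the C locale.
    LC_ALL=C sort "$1" > "$mdl_tmp/chimera" &&
    LC_ALL=C sort "$2" > "$mdl_tmp/pool" &&
    # comm -23 keeps the lines unique to the first file,
    # comm -13 keeps the lines unique to the second file.
    LC_ALL=C comm -23 "$mdl_tmp/chimera" "$mdl_tmp/pool" > "$3" &&
    LC_ALL=C comm -13 "$mdl_tmp/chimera" "$mdl_tmp/pool" > "$4"
    mdl_status=$?

    rm -rf "$mdl_tmp"
    return "$mdl_status"
}
```

It could then be called, for example, as follows (after the function has been defined in the current shell):

    # make_delta_lists list-of-files-that-should-be-on-the-pool-according-to-Chimera list-of-files-that-are-actually-on-the-pool files-that-should-be-on-the-pool-according-to-Chimera-but-that-are-not files-that-are-actually-on-the-pool-but-not-according-to-Chimera

As with diff, the order of the two input lists matters: swapping them swaps the meaning of the two output lists.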

License Of This Document

Copyright © 2009, Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>.
All rights reserved.


This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Germany License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/de/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".