Last modified on 03/05/11 00:14:43

How-To Convert Pool Metadata

The following describes how to convert the pool metadata between the different repository formats.

Requirements, Constraints And Conventions

  • Lines starting with “$” are entered within a POSIX-sh-compatible shell.
  • Lines starting with “#” are entered within a POSIX-sh-compatible shell, with the effective user-ID and group-ID being 0 (“root-rights”).
  • Lines starting with “(location) >” are entered within dCache’s administration interface with location as the current location.
  • Standard input is written bold, standard output normal and standard error underlined.

Introduction

The pools within a dCache cluster contain some metadata for each file that is stored on the respective pool.
Currently there are the following repository “formats” for these metadata:

  • File-based Metadata Repository
  • Berkeley-DB-based Metadata Repository

The default format is the file-based metadata repository.

File-based Metadata Repository

This is the default metadata repository format, in which the metadata for a given file on the respective pool is stored in two files:

  • pool-directory/control/000069E2E50115DE403F9CD0823FBB02BBDA
  • pool-directory/control/SI-000069E2E50115DE403F9CD0823FBB02BBDA
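As a sanity check before a conversion, one might verify that every file on the pool has both of its control files. The following is only a minimal sketch under stated assumptions: it assumes that the data files live in pool-directory/data (as the log excerpt in the conversion steps suggests) and that each data file is named by its ID. The function name check_control_files is hypothetical and not part of dCache.

```shell
#!/bin/sh
# Hypothetical helper (not part of dCache): report data files lacking
# one of the two control files of the file-based metadata repository.
# Assumption: data files are stored as pool-directory/data/<ID>.
check_control_files() {
    pool_dir=$1
    missing=0
    for data_file in "$pool_dir"/data/*; do
        # Skip the unexpanded glob when the data directory is empty.
        [ -e "$data_file" ] || continue
        id=$(basename "$data_file")
        for control_file in "$pool_dir/control/$id" "$pool_dir/control/SI-$id"; do
            if [ ! -f "$control_file" ]; then
                echo "missing: $control_file"
                missing=$((missing + 1))
            fi
        done
    done
    # Succeed only if no control file is missing.
    [ "$missing" -eq 0 ]
}
```

Called as check_control_files /pool-directory, the function prints one line per missing control file and returns a non-zero exit status if any is missing.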

Berkeley-DB-based Metadata Repository

With this format, the metadata is stored in a Berkeley DB database in the directory pool-directory/meta.

Conversion

There are several reasons why it might be worthwhile to change from one format to the other.

For example, the start-up time of the pools can be reduced dramatically by using Berkeley-DB-based metadata repositories.
To give a rough estimate: on a given test system, a pool with about 300000 files took about 120 minutes to start up when using the file-based metadata repository. After the conversion to the Berkeley-DB-based metadata repository, it took only about 2 minutes.

The conversion typically consists of the following steps:

  1. Configure the desired metadata repository format. The format is set via the “metaDataRepository” option in dCache’s configuration file “dcache.conf”, using one of the following values:
    • org.dcache.pool.repository.meta.file.FileMetaDataRepository
      File-based metadata repository format. This is the default.
    • org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository
      Berkeley-DB-based metadata repository format.
  2. Configure the source of the (current) metadata used during conversion. The source from which the current metadata is read during conversion is set via the “metaDataRepositoryImport” option in dCache’s configuration file “dcache.conf”, using one of the following values:
    • org.dcache.pool.repository.meta.EmptyMetaDataStore
      The metadata is obtained from the PnfsManager-cell, which is typically very slow. This is the default.
    • org.dcache.pool.repository.meta.file.FileMetaDataRepository
      dCache tries to obtain the metadata from an existing file-based metadata repository. Using this only makes sense when converting to the Berkeley-DB-based metadata repository format.
    • org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository
      dCache tries to obtain the metadata from an existing Berkeley-DB-based metadata repository. Using this only makes sense when converting to the file-based metadata repository format.
  3. (Re-)start the respective pool. In the log file you will now see messages like these:
    28 Feb 2011 20:37:34 (pool-name) [] NOTICE: Importing any missing meta data from /pool-directory/control. This should only be used to convert an existing repository and never as a permanent setup.
    28 Feb 2011 20:37:34 (pool-name) [] Pool mode changed to disabled(fetch,store,stage,p2p-client,p2p-server)
    Feb 28, 2011 8:37:34 PM com.sun.grizzly.Controller start
    INFO: Starting Grizzly Framework 1.9.19-SNAPSHOT - Mon Feb 28 20:37:34 CET 2011
    28 Feb 2011 20:37:34 (pool-name) [] Reading inventory from [data=/pool-directory/data;meta=/pool-directory/meta]
    28 Feb 2011 20:37:35 (pool-name) [] Imported meta data for 000000A46ACA6E2B4F35B52A25ABEF33977C from /pool-directory/control
    ...
    28 Feb 2011 20:37:56 (pool-name) [] Imported meta data for 0000FFE59834EA5140EDB34AEFD2CBCE24B1 from /pool-directory/control
    28 Feb 2011 20:37:57 (pool-name) [] Pool mode changed to enabled

    Here, the format was changed from a file-based metadata repository to a Berkeley-DB-based metadata repository.

    Note that the conversion is not finished until the pool is enabled again. This is indicated, for example, by the “Pool mode changed to enabled” log message.
  4. Unset the “metaDataRepositoryImport” option in dCache’s configuration file “dcache.conf” once the conversion has finished, so that its default becomes active again. This is strongly recommended to avoid future mistakes and to prevent dCache from trying to import any missing metadata, which should no longer be necessary.
  5. Delete the directory where the “old” metadata repository is stored.
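
The configuration changes in steps 1, 2 and 4 and the check in step 3 could be scripted roughly as follows. This is only a sketch under stated assumptions, not a dCache tool: it assumes that “dcache.conf” holds one option per line in the form name=value, and that mktemp is available. The function names set_option, unset_option and conversion_finished are hypothetical, as are the file locations in the usage comments.

```shell
#!/bin/sh
# Hypothetical helpers (not part of dCache) for editing dcache.conf.
# Assumption: options are written one per line as "name=value".

# Remove every assignment of the given option from the file.
unset_option() {
    conf=$1 name=$2
    tmp=$(mktemp) || return 1
    # Copy all lines except assignments of exactly this option name.
    sed "/^[[:space:]]*$name[[:space:]]*=/d" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Set an option, replacing any existing assignment of the same name.
set_option() {
    unset_option "$1" "$2" && printf '%s=%s\n' "$2" "$3" >> "$1"
}

# Succeed if the given log file contains the message that marks
# the pool as enabled again.
conversion_finished() {
    grep -q "Pool mode changed to enabled" "$1"
}

# Example for a conversion to the Berkeley-DB-based format
# (the paths to dcache.conf and to the pool's log file are placeholders):
#   set_option dcache.conf metaDataRepository \
#       org.dcache.pool.repository.meta.db.BerkeleyDBMetaDataRepository
#   set_option dcache.conf metaDataRepositoryImport \
#       org.dcache.pool.repository.meta.file.FileMetaDataRepository
#   ... restart the pool, then wait until this succeeds:
#   conversion_finished pool.log
#   unset_option dcache.conf metaDataRepositoryImport
```

Note that conversion_finished matches any occurrence of the message, so a log file that still contains entries from earlier pool starts would need to be checked from the restart onwards.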

License Of This Document

Copyright © 2011, Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>.
All rights reserved.


This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Germany License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/de/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".