
Pool behaviour with incomplete files

Release 1.9.1-3 changes what happens on pools when uploads, pool to pool transfers, or stage operations fail. In previous releases, including 1.8, the behaviour was inconsistent and difficult to predict. With the changes introduced in 1.9.1-3 we hope to make the behaviour easier to understand.

Notice that the following description concerns only pools and what happens to the copy of the file stored on the pool. Semantics implemented by various doors are not discussed, nor does the procedure described below have any effect on the name space entry. Below, the term replica refers to the instance of a file stored on a pool; use of this word does not imply that more than one copy exists.

Pool to Pool Transfers

Summary

In case of transfer failures, partially copied files are removed.

Detailed description

  1. If the target replica already exists, then the transfer is rejected and the existing replica is left in place.
  2. Failure to complete a pool to pool transfer will cause the replica on the target pool to be deleted (see the sketch after this list).
  3. If the pool is shut down before the transfer is completed, then the pool will delete the replica on startup.
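
These rules amount to a single invariant: a replica created by a pool to pool transfer survives only if the transfer ran to completion. The following sketch models that logic; every class and method name in it is hypothetical, and the actual dCache code differs.

{{{
#!java
import java.io.IOException;

/* Sketch only: all types below are hypothetical models of the rules
 * above, not the real dCache classes. */
interface Replica { void markComplete(); }

interface Repository {
    boolean contains(String pnfsId);
    Replica createIncomplete(String pnfsId);
    void remove(Replica replica);
}

interface SourcePool { void copyTo(Replica target) throws IOException; }

class TransferRejectedException extends Exception {
    TransferRejectedException(String message) { super(message); }
    TransferRejectedException(String message, Throwable cause) { super(message, cause); }
}

class PoolToPoolTarget {
    private final Repository repository;

    PoolToPoolTarget(Repository repository) { this.repository = repository; }

    void receive(String pnfsId, SourcePool source) throws TransferRejectedException {
        // Rule 1: an existing replica causes the transfer to be
        // rejected; the existing copy is left in place.
        if (repository.contains(pnfsId)) {
            throw new TransferRejectedException("replica already exists: " + pnfsId);
        }
        Replica replica = repository.createIncomplete(pnfsId);
        try {
            source.copyTo(replica);
            replica.markComplete(); // only a completed copy survives
        } catch (IOException e) {
            // Rule 2: any failure removes the partial copy.
            repository.remove(replica);
            throw new TransferRejectedException("transfer failed: " + pnfsId, e);
        }
        // Rule 3 (pool shut down mid-transfer) is handled by a startup
        // scan; see the sketch in the next section.
    }
}
}}}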

Stage from Tape

Summary

In case of staging failures, partially staged files are removed.

Detailed description

  1. If the target replica already exists, then the stage is rejected and the existing replica is left in place.
  2. Failure to complete a stage from tape will cause the replica on the stage pool to be deleted.
  3. If the pool is shut down before the stage is completed, then the pool will delete the replica on startup (the startup scan is sketched after this list).
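
Item 3 here and in the pool to pool list above is the same rule: at startup, replicas that never reached the complete state are removed before the pool goes online. A minimal sketch of that scan, again with hypothetical names; note that partial uploads are exempt, as described in the next section.

{{{
#!java
import java.util.List;

/* Sketch only: hypothetical types modelling the startup rule shared by
 * pool to pool transfers and stages from tape. */
interface StoredReplica {
    boolean isComplete();       // did the transfer or stage finish?
    boolean isPartialUpload();  // was it created by a client upload?
}

interface ReplicaStore {
    List<StoredReplica> scan();
    void remove(StoredReplica replica);
}

class PoolStartup {
    void recover(ReplicaStore store) {
        for (StoredReplica replica : store.scan()) {
            if (!replica.isComplete() && !replica.isPartialUpload()) {
                // An interrupted pool to pool transfer or stage is
                // simply deleted; the data can be fetched again.
                store.remove(replica);
            }
            // Partially uploaded replicas are kept and handled as
            // described under 'Upload from Client' below.
        }
    }
}
}}}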

Upload from Client

Summary

If a checksum or file length was provided by the client, then a mismatch will cause the file to be marked broken. Otherwise the resulting file is considered intact.

Detailed description

  1. The pool does not distinguish between various types of transfer interruptions. Whether a transfer was interrupted because the client crashed, because the network connection was lost, because the mover was killed or because the pool node crashed is not important. The pool reacts in the same way in any of these cases.
  2. If the client specified a checksum, the file size, or some other verifiable property of the file prior to upload (regardless of the mechanism used to specify such properties), then the pool verifies these properties after the transfer. In case of a mismatch, the file is flagged as BAD/BROKEN (the decision logic is sketched after this list).
  3. A BAD/BROKEN replica is not flushed to tape and cannot be read by clients or transferred to other pools. It is deleted if the name space entry is removed. Should the user wish to recover a broken replica, then an administrator has to manually change the state of the replica.
  4. If the client did not specify a verifiable property about the file prior to upload, then the file is considered complete no matter how the transfer ended: whatever made it to the pool is taken to be the complete file. The checksum is computed and stored in PNFS together with the file size, and CUSTODIAL+NEARLINE files go to tape. This behaviour is consistent with local file system semantics: if a program creates a new file and crashes, the file still exists. We emulate this behaviour in dCache. Notice that the error is still reported back to the door, which will inform the client (if possible).
  5. If dCache at a later point discovers that a replica has become corrupted (the checksum or length no longer matches the checksum or length stored in PNFS), then the replica is marked BAD/BROKEN.
  6. During pool startup, the pool will recognise partially uploaded replicas and will treat the restart like any other transfer failure. This implies that pool startup may trigger a lengthy checksum calculation.
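
Items 2 to 4 reduce to a single decision once the transfer has ended, however it ended: check whatever the client promised, or accept whatever arrived. The sketch below models that decision; the type names and the string representation of checksums are assumptions, not the actual dCache API.

{{{
#!java
import java.util.Optional;
import java.util.OptionalLong;

/* Sketch only: hypothetical names modelling the post-upload decision. */
enum ReplicaState { COMPLETE, BROKEN }

class UploadedFile {
    final long actualSize;
    final String actualChecksum; // computed by the pool after the transfer

    UploadedFile(long size, String checksum) {
        this.actualSize = size;
        this.actualChecksum = checksum;
    }
}

class UploadFinalizer {
    /* Why the transfer ended (client crash, lost connection, killed
     * mover, pool crash) plays no role; only promised properties do. */
    ReplicaState decide(UploadedFile file,
                        OptionalLong expectedSize,
                        Optional<String> expectedChecksum) {
        if (expectedSize.isPresent()
                && expectedSize.getAsLong() != file.actualSize) {
            return ReplicaState.BROKEN; // length mismatch (item 2)
        }
        if (expectedChecksum.isPresent()
                && !expectedChecksum.get().equalsIgnoreCase(file.actualChecksum)) {
            return ReplicaState.BROKEN; // checksum mismatch (item 2)
        }
        // Nothing verifiable was promised, or everything matched: the
        // replica is complete (item 4). The pool then records the
        // computed checksum and size in PNFS and, for CUSTODIAL+NEARLINE
        // files, queues a flush to tape.
        return ReplicaState.COMPLETE;
    }
}
}}}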

Some further comments

There are several observations to be made from the above points:

  1. Zero length replicas are not special. Under the local file system semantics described in item 4, a zero length replica is no different from a non-empty replica: if a client creates a file and crashes before writing any data to it, the file will still exist.
  2. These semantics say nothing about how the door behaves. A door is free to delete the name space entry after a failure (the FTP door does this by default).
  3. From the point of view of the client, there is no difference between a mover getting killed while it is still queued and a mover getting killed just after it was started, but before a connection to the client was established. Thus the result should be the same. The logical conclusion is that killing a mover while it is still queued will result in a zero length file being created on the pool. If a checksum or length was specified, then the file will be marked BAD/BROKEN (see the worked example after this list).
  4. If a mover is administratively killed using the -force option, then these semantics are not guaranteed.
  5. Broken replicas are unreadable, but currently the pool manager, doors, and SRM do not understand this state; a transfer attempt will simply hang. In a later release we will update the pool manager and the various doors to recognise broken files and provide appropriate responses to users about the situation.
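
Item 3 can be made concrete with a worked example. A mover killed while still queued leaves an empty replica behind, and whether that replica is broken depends only on what the client promised: the MD5 digest of empty input is d41d8cd98f00b204e9800998ecf8427e, so any checksum the client supplied for its actual data will fail to match.

{{{
#!java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/* Worked example for item 3: an empty replica is broken only if the
 * client promised a verifiable property that the empty file fails. */
class ZeroLengthExample {
    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(new byte[0]); // no bytes ever arrived

        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        // Prints d41d8cd98f00b204e9800998ecf8427e, the checksum of the
        // empty file. A checksum supplied for the data the client meant
        // to send will differ, so the replica is marked BAD/BROKEN. If
        // no checksum or length was promised, the empty replica counts
        // as complete, just like a freshly created local file.
        System.out.println(hex);
    }
}
}}}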