wiki:GLUE/capacity
Last modified on 06/19/08 17:19:18

Background

Glue v1.3 defines a number of capacity properties, such as TotalOnlineSize and UsedOnlineSize. This page attempts to define these properties.

The definitions are in the form of assertions and textual descriptions. These should be sufficient to describe what can be deduced from a site's storage capacity based on the advertised values. Put simply, if you deduce something that isn't explicitly stated here then you may end up with garbage results.

When reading this page, please take into account a number of things:

  • The GLUE documentation (see GLUE-resources for links) is authoritative. In particular, any disagreement between this document and either the GLUE specification or EGEE's usage of GLUE should be resolved by considering the other document's statement as definitive.
  • The document adopts the nomenclature of the GLUE/LDAP binding, so a Glue "Storage Area" is referred to as a GlueSA.
  • Some points are still (at time of writing) unclear. These are posed as open questions and are prefixed by Q/.
  • This document is a best-effort explanation; it may be incomplete or contain inaccuracies.
  • At various points, key words like MAY and SHOULD appear in capital letters. These key words should be interpreted as described in RFC 2119.

What's available


GlueSE

The Glue "Storage Element" (GlueSE) object represents a complete storage entity, which manages disk and may (optionally) manage tape. A site will typically have a single GlueSE object.

The GlueSE object has several properties, which represent information about the storage entity as a whole. These include the following properties that describe the Storage Element's capacity:

GlueSE.SizeTotal

the size of the storage capacity managed by this service

deprecated

Assertions:

  • A site MAY advertise the SizeTotal property.
  • SizeTotal MUST be strictly greater than zero (0 < SizeTotal).
  • SizeTotal MUST be measured in GigaBytes.
  • SizeTotal MUST be larger than or equal to any other advertised capacity.
  • SizeTotal MAY include storage capacity that is not immediately available, such as storage on machines that are broken or are being drained.

A site that advertises SizeTotal always advertises a non-zero, positive value. The value may be the sum of other advertised objects (the GlueSA objects), or the site may be sufficiently complicated that this number is not a simple sum of other presented numbers.

GlueSE.SizeFree

the size of the storage capacity that is free for new files for any VO/user.

deprecated

Assertions:

  • A site MAY advertise the SizeFree property.
  • SizeFree MUST be greater than or equal to zero (0 ≤ SizeFree).
  • The SizeFree metric SHOULD consider all storage managed by the SE.
  • If a site advertises both SizeFree and SizeTotal then SizeFree MUST be less than or equal to SizeTotal (SizeFree ≤ SizeTotal).
  • SizeFree MUST be measured in GigaBytes.
  • If file_size > SizeFree then uploading a file of size file_size is likely to fail.

SizeFree gives an indication of whether a site is "full": if a file larger than SizeFree is to be uploaded, the upload will likely fail. The converse does not hold: a value of SizeFree greater than, or approximately equal to, a file's size gives no indication that uploading that file will succeed.
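The one-directional nature of this check can be sketched in Python. This is only an illustration: the function name and the use of None for "nothing can be deduced" are assumptions made here and are not part of GLUE.

```python
from typing import Optional


def upload_likely_to_fail(file_size_gb: float,
                          size_free_gb: Optional[float]) -> Optional[bool]:
    """Apply the SizeFree assertion: file_size > SizeFree => likely failure.

    Returns True when the upload is likely to fail and None when nothing
    can be deduced. It never returns False, because SizeFree gives no
    indication that an upload will succeed.
    """
    if size_free_gb is None:
        # SizeFree is optional (MAY), so a site might not advertise it.
        return None
    if file_size_gb > size_free_gb:
        return True
    return None
```

A caller should treat None as "unknown" rather than as "the upload will succeed".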

GlueSE.TotalOnlineSize

Total size of online storage space

Assertions:

  • A site MAY advertise the TotalOnlineSize property.
  • TotalOnlineSize MUST be measured in GigaBytes.
  • TotalOnlineSize MUST be strictly greater than zero (0 < TotalOnlineSize).
  • The TotalOnlineSize metric SHOULD consider all storage managed by the SE that has online access latency.
  • If a site advertises both TotalOnlineSize and SizeTotal then TotalOnlineSize MUST be less than or equal to SizeTotal (TotalOnlineSize ≤ SizeTotal).

The TotalOnlineSize describes all Online storage a site's Storage Element has at its disposal. This may include storage capacity that is not explicitly mentioned elsewhere ("hidden" storage) that is utilised for caching shared between different end-user groups.

GlueSE.TotalNearlineSize

Total size of nearline storage

Assertions:

  • A site MAY advertise the TotalNearlineSize property.
  • TotalNearlineSize MUST be measured in GigaBytes.
  • TotalNearlineSize MUST be greater than or equal to zero (0 ≤ TotalNearlineSize).
  • The TotalNearlineSize metric SHOULD consider all storage managed by the SE that has nearline access latency.
  • If a site advertises both TotalNearlineSize and SizeTotal then TotalNearlineSize MUST be less than or equal to SizeTotal (TotalNearlineSize ≤ SizeTotal).

The TotalNearlineSize describes all Nearline capacity that a site's Storage Element has at its disposal.

GlueSE.UsedOnlineSize

Used online storage

Assertions:

  • A site MAY advertise the UsedOnlineSize property.
  • UsedOnlineSize MUST be measured in GigaBytes.
  • UsedOnlineSize MUST be greater than or equal to zero (0 ≤ UsedOnlineSize).
  • If a site advertises both UsedOnlineSize and TotalOnlineSize then UsedOnlineSize MUST be less than or equal to TotalOnlineSize (UsedOnlineSize ≤ TotalOnlineSize).
  • The UsedOnlineSize metric SHOULD consider all storage managed by the SE that has online access latency.
  • The UsedOnlineSize MUST increase sometime after a file is successfully uploaded that uses some previously unused online capacity considered by UsedOnlineSize.
  • If UsedOnlineSize increases due to a file being uploaded, the increase SHOULD represent the additional online capacity usage due to that upload.
  • The UsedOnlineSize MUST decrease sometime after a file using online capacity considered by UsedOnlineSize is deleted such that additional online capacity can then be used for file uploads.
  • If the UsedOnlineSize decreases due to a file being deleted, then it SHOULD decrease by the online capacity made available by that file being deleted.
  • If a file of size file_size is to be uploaded so that it will use online capacity considered by UsedOnlineSize, and TotalOnlineSize - UsedOnlineSize ≤ file_size, then the upload will likely fail. Informally, this indicates that the Storage Element is "full". However, clients are encouraged to identify the most appropriate GlueSA and use the properties of that object when checking if an activity is likely to fail.

Q/ Should free space fragmentation be taken into consideration? If a file is deleted on a medium that cannot rearrange files and cannot (or chooses not to) fragment files, then the free space is no longer logically contiguous. If this is so, then deleting a file might not decrease UsedOnlineSize, and uploading a file that fits within a free-space fragment might result in UsedOnlineSize not increasing. The wording above is chosen to include this possibility.

GlueSE.UsedNearlineSize

Used nearline storage

Assertions:

  • A site MAY publish UsedNearlineSize.
  • UsedNearlineSize MUST be measured in GigaBytes.
  • UsedNearlineSize MUST be greater than or equal to zero (0 ≤ UsedNearlineSize).
  • The UsedNearlineSize metric SHOULD consider all storage managed by the SE that has nearline access latency.
  • If a site publishes both UsedNearlineSize and TotalNearlineSize then UsedNearlineSize MUST be less than or equal to TotalNearlineSize (UsedNearlineSize ≤ TotalNearlineSize).
  • Sometime after a successful upload of a file such that it uses nearline capacity, the UsedNearlineSize MUST increase.
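The static assertions on the GlueSE capacity properties listed so far can be collected into a simple consistency check. The following Python sketch is illustrative only: the function name, the dictionary layout and the message strings are assumptions, and it treats any mention of "TotalSize" in the assertions as the SizeTotal property. It does not cover the time-dependent assertions (values changing "sometime after" an upload or deletion).

```python
from typing import Dict, List


def check_gluese_capacity(attrs: Dict[str, float]) -> List[str]:
    """Return the violated assertions; an empty list means none were found.

    `attrs` maps GlueSE property names to advertised values in GigaBytes.
    Properties that a site does not advertise are absent from the mapping.
    """
    problems: List[str] = []

    # Lower bounds: strictly positive for two properties, non-negative otherwise.
    strictly_positive = {"SizeTotal", "TotalOnlineSize"}
    for name, value in attrs.items():
        if name in strictly_positive and not value > 0:
            problems.append(f"{name} must be > 0")
        elif value < 0:
            problems.append(f"{name} must be >= 0")

    # SizeTotal must be at least as large as any other advertised capacity.
    if "SizeTotal" in attrs:
        for name, value in attrs.items():
            if name != "SizeTotal" and value > attrs["SizeTotal"]:
                problems.append(f"{name} must be <= SizeTotal")

    # Used capacity must not exceed the matching total, when both are advertised.
    pairs = [("UsedOnlineSize", "TotalOnlineSize"),
             ("UsedNearlineSize", "TotalNearlineSize")]
    for used, total in pairs:
        if used in attrs and total in attrs and attrs[used] > attrs[total]:
            problems.append(f"{used} must be <= {total}")

    return problems
```

For example, check_gluese_capacity({"SizeTotal": 100, "SizeFree": 150}) reports that SizeFree exceeds SizeTotal, while a mapping that omits a property is never penalised for the omission, since every property is optional.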

Q/ Is the size of the increase in UsedNearlineSize for a successful file upload necessarily equal to the size of the file? Another way of phrasing this question is: does UsedNearlineSize include factors that might alter the size the file occupies on the storage medium, such as tape compression?

Q/ Is the value of TotalNearlineSize - UsedNearlineSize meaningful? In particular, can one conclude that, if TotalNearlineSize - UsedNearlineSize ≤ file_size, then attempting to upload a file of size file_size will likely fail? This is related to the previous question.


GlueSA

A GlueSE may have one or more GlueSA objects placed hierarchically as immediate children of the GlueSE. These GlueSA objects describe subsets of the GlueSE's storage capacity. These subsets may be covering (that is, they may describe all available disk space) and they may be partitioning (that is, one can add together the GlueSA Total*Size metrics to derive the GlueSE.Total*Size metric). However, the presence of caching pools and any GlueSA objects that describe overlapping storage will make such calculations complicated.
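As a rough illustration of why such sums need care, the Python sketch below compares a GlueSE's TotalOnlineSize with the sum of its GlueSAs' TotalOnlineSize values. The function name and its interpretation of the result are assumptions made for this example; the result is a hint at best, not something GLUE guarantees.

```python
from typing import Sequence


def unaccounted_online_capacity(se_total_gb: float,
                                sa_totals_gb: Sequence[float]) -> float:
    """Return GlueSE.TotalOnlineSize minus the sum of GlueSA.TotalOnlineSize.

    Zero is what a covering and partitioning set of GlueSAs would give.
    A positive result hints at storage outside every GlueSA (for example,
    caching pools); a negative result hints at overlapping GlueSAs.
    Neither hint is conclusive, because the two effects can cancel out.
    """
    return se_total_gb - sum(sa_totals_gb)
```

For instance, a GlueSE advertising 100 GigaBytes with two GlueSAs of 50 and 30 GigaBytes leaves 20 GigaBytes unaccounted for, which may or may not be shared cache.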

The GlueSA has several properties. Those involving the GlueSA's storage capacity are:

GlueSA.State.UsedSpace

GlueSA.State.AvailableSpace

GlueSA.TotalOnlineSize

GlueSA.UsedOnlineSize

GlueSA.ReservedOnlineSize

GlueSA.FreeOnlineSize

GlueSA.TotalNearlineSize

GlueSA.UsedNearlineSize

GlueSA.ReservedNearlineSize

GlueSA.FreeNearlineSize

AccessLatency

Glue classifies the storage that a Storage Element provides into two groups, based on the access latency of the files: storage with online access latency and storage with nearline access latency. There is a third access latency: offline. Although not described within Glue, it is included here for completeness.

Online

A file with online access latency is available with short latency. Assuming a system is operating under normal conditions and not suffering heavy load, a file with online access latency is made available for reading within much less than a minute.

In practice, online means the file is stored on magnetic hard disks. However, as new technologies are made available, other storage media may appear that can store files with online access latency.

Nearline

A file with nearline access latency is typically available with a longer access latency than online files. Assuming a system is operating under normal conditions and not suffering heavy load, a file with nearline access latency may take longer than a minute to become available for reading.

In general, there is no strong upper limit on how long a file with nearline access-latency will take to become available for reading; however, under normal conditions, the process of making the file available should:

  • complete without human intervention, and
  • complete in much less time than a day.

Storage with nearline access latency is typically provided by an HSM (Hierarchical Storage Management) system: storage that uses tape and has a robotic mechanism for fetching tapes automatically, as needed, from a library.

Q/ What is the access latency of a tape drive with a single tape?

Offline

Offline storage is where files require human intervention to access. An example of offline storage is off-site backups that are stored in a fire-proof vault. Although there are no strong upper or lower limits on how quickly a file will be made available for reading, a request for data may take in excess of a day.
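The three access latencies can be summarised with a small Python sketch. The names used here and the hard 60-second boundary are assumptions made for illustration: the descriptions above only say "much less than a minute" and "may take longer than a minute", so real systems will not divide this cleanly.

```python
from enum import Enum


class AccessLatency(Enum):
    ONLINE = "online"
    NEARLINE = "nearline"
    OFFLINE = "offline"


# Assumption: "within much less than a minute" is approximated as under 60 s.
ONLINE_LIMIT_SECONDS = 60


def classify(needs_human: bool, typical_seconds: float) -> AccessLatency:
    """Classify storage by how a file becomes available for reading.

    `typical_seconds` is the time to make a file readable under normal
    conditions, without heavy load.
    """
    if needs_human:
        # Human intervention is what distinguishes offline storage.
        return AccessLatency.OFFLINE
    if typical_seconds < ONLINE_LIMIT_SECONDS:
        return AccessLatency.ONLINE
    return AccessLatency.NEARLINE
```

Under these assumptions, a disk pool that serves files in under a second is classified as online, a tape robot that needs ten minutes as nearline, and a vault that requires an operator as offline regardless of timing.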


Thanks are due to the following contributors:

Stephen Burke.

Please send comments to Paul Millar.