Information Life Cycle management in EFS
Publication Date: 2014-May-23
The IP.com Prior Art Database
This disclosure describes a mechanism for configuring file migration between multiple filesystem storage tiers, based on the heat of the content, that is applicable when the underlying storage of those tiers is thin provisioned. Specifically, it details how to configure such a mechanism on the GPFS file system.
Background on GPFS & ILM
ILM in GPFS provides a mechanism for organising files within a fileset or file system onto specified GPFS storage pools. This is done by creating a set of rules that define the placement and migration of files.
ILM rules are defined with an SQL-like syntax, and allow the administrator to specify statements such as:
Which files this rule should apply to, such as:
Files which match a given name pattern
Files created before or after a specified time period
Files greater or smaller than a given size
Files accessed or not accessed within a specified time period
Which filesystems or filesets this rule should apply to
Whether this rule should apply to new files (for placement) or existing files (for migration)
Which GPFS storage pool these files should be placed on or migrated to
A second (or third, fourth,... ) pool, with associated thresholds, to allow placement to target a different pool if the intended pool is too full
ACE integration: rules can select specific files to pre-populate a remote Panache mirror fileset
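The rule elements listed above can be illustrated with a minimal Python sketch. It is not GPFS policy syntax and does not use any GPFS API: `FileInfo`, `PlacementRule`, `choose_pool`, the pool names and the percentage limits are all hypothetical names and values chosen for illustration. The sketch only shows the shape of a rule: a predicate selecting files, plus an ordered list of target pools with thresholds, so that placement falls through to the next pool when the intended one is too full.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FileInfo:
    """Hypothetical file metadata record; not a GPFS structure."""
    name: str
    size_bytes: int
    days_since_access: int


@dataclass
class PlacementRule:
    """Illustrative rule: a file predicate plus an ordered list of
    (pool name, occupancy limit in percent) candidates."""
    matches: Callable[[FileInfo], bool]
    pools: List[Tuple[str, float]]


def choose_pool(rule: PlacementRule, f: FileInfo,
                occupancy_pct: Dict[str, float]) -> Optional[str]:
    """Return the first candidate pool below its limit, or None if the
    rule does not apply or every candidate pool is too full."""
    if not rule.matches(f):
        return None
    for pool, limit_pct in rule.pools:
        if occupancy_pct[pool] < limit_pct:
            return pool
    return None


# Assumed example rule: log files over 1 MiB go to 'fast', falling back
# to 'slow' once 'fast' is at or above 80% occupancy.
log_rule = PlacementRule(
    matches=lambda f: fnmatch(f.name, "*.log") and f.size_bytes > 1 << 20,
    pools=[("fast", 80.0), ("slow", 95.0)],
)
```

Calling `choose_pool(log_rule, FileInfo("app.log", 2 << 20, 0), {"fast": 85.0, "slow": 10.0})` would select the second pool, because the first is over its limit. A predicate on `days_since_access` could be written in the same way to model the access-time criterion.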
A simple ILM rule set, using file temperature
Whilst powerful, these rules can be complex to configure and maintain. A recent release of GPFS ILM provides a new temperature-based migration rule, whereby GPFS tracks the heat of files and allows migration to be described in terms of hot and cold files. This permits a much simpler rule definition, independent of file content, from which a sensible default rule set can be constructed that applies well to most types of data.
One such rule set, for a two-tier system, looks like this:
One tier should be faster than the other. We shall call this the faster tier, and the other one the slower tier.
... faster tier used capacity at or around this level. This design does not guarantee this goal will be met.
Let us define a schedule to automatically trigger migrations when the system is quiet (e.g. 1am every night). This makes it less likely that migrations will occur during periods of heavy load, which can negatively impact the performance of the system.
Place all new files on the faster tier.
When the trigger capacity is met or exceeded, or on the defined schedule, perform a migration:
Use the GPFS ILM temperature tracking to understand which files are hot and which are cold.
Migrate hot files onto the faster tier, and cold files to the slower tier, such that the used capacity on the faster tier is no more than the goal capacity (again, this is not strongly guaranteed: if, for example, a large amount of new file I/O occurs whilst the migration is in progress, the goal will not be met).
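The migration step can be sketched in Python as a simple model of the decision logic. This is an illustration under stated assumptions, not the algorithm GPFS itself uses: `TrackedFile`, `should_migrate`, `plan_migration`, the pool names `'fast'` and `'slow'`, and the numeric heat values are all hypothetical. In a real deployment the heat is tracked by GPFS and the migration is carried out by the policy engine.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrackedFile:
    """Hypothetical record of a file and its tracked heat."""
    name: str
    size: int      # arbitrary capacity units
    heat: float    # higher means hotter (assumed scale)
    pool: str      # 'fast' or 'slow'


def should_migrate(fast_used: int, fast_capacity: int,
                   trigger_pct: float, scheduled: bool) -> bool:
    """Migrate on the schedule, or when the used capacity of the
    faster tier meets or exceeds the trigger capacity."""
    return scheduled or 100.0 * fast_used / fast_capacity >= trigger_pct


def plan_migration(files: List[TrackedFile], fast_capacity: int,
                   goal_pct: float) -> Dict[str, str]:
    """Keep the hottest files on 'fast' while they fit within the goal
    capacity; send the rest to 'slow'. Returns only files that move."""
    budget = fast_capacity * goal_pct / 100.0
    used = 0
    moves: Dict[str, str] = {}
    # Walk files from hottest to coldest, filling the faster tier first.
    for f in sorted(files, key=lambda f: f.heat, reverse=True):
        if used + f.size <= budget:
            target = "fast"
            used += f.size
        else:
            target = "slow"
        if target != f.pool:
            moves[f.name] = target
    return moves
```

The plan is computed from a snapshot of the file list, which mirrors the caveat above: files written to the faster tier after the snapshot is taken are not accounted for, so the goal capacity can be exceeded by the time the moves complete. Note also that this greedy walk lets a smaller, colder file occupy space that a larger, hotter file could not fit into; a stricter variant could stop filling the faster tier at the first file that does not fit.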
In practice, this rule set should mean that frequently accessed files tend to stay on the faster tier, and so have short access latencies, whilst less frequently used files settle on the cheaper, slower tier.
It is possible to implement such a rule set quite straightforwardly using existing GPFS technology.
Difficulties with space effi...