Minimizing the costly update of persistent metadata in a multi-tier environment by using a metadata filter
Publication Date: 2015-Nov-24
The IP.com Prior Art Database
Avoiding unnecessary updates of persistent metadata helps keep stored data more robust and minimizes the performance impact associated with frequently updating the same metadata. We claim a metadata update filter that identifies updates destined for persistent non-volatile memory and, based on a rule set, allows them to be persisted less frequently than is done today.
Multi-tier data management systems often expend considerable effort in keeping their metadata up to date and correct. One of the most common reasons for moving data among the various tiers is the frequency with which the respective user data is accessed. To recognize the usage pattern, the system uses metadata to track I/O activity and the access profile. Metadata failures can occur from time to time due to a variety of factors, including loss or corruption of the stored metadata, failures in the circuitry used to access the metadata, incomplete updates of the metadata during a power interruption, etc. In some cases, a metadata failure may render portions of the device, or the entire device, incapable of correctly returning previously stored data.
In some storage systems, certain types of metadata relating to the state of the system may be updated on a highly frequent basis. For example, various counters and time-stamps, indicative of the most recent access to users' files and data blocks, may be incremented during each read/write operation. In high performance environments, this may result in several tens of thousands, or hundreds of thousands (or more), of metadata state changes per second. Other types of metadata are relatively stable and do not change frequently, such as logical addresses, forward pointers, reverse directories, etc.
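The distinction above between frequently changing metadata (counters, time-stamps) and stable metadata (addresses, pointers) can be sketched as a rule-driven update filter. The following Python is a minimal illustration, not the disclosed implementation: the field names and the per-field minimum-interval rule set are hypothetical, chosen only to show how frequent updates could be held in volatile memory and coalesced while stable metadata is persisted immediately.

```python
import time

# Hypothetical rule set: metadata field -> minimum interval (seconds)
# between persisted writes. Fields absent from the rule set are treated
# as stable metadata and persisted on every update.
RULES = {
    "access_count": 60.0,    # per-block access counter: at most once a minute
    "last_access_ts": 60.0,  # last-access time-stamp: likewise
}

class MetadataUpdateFilter:
    """Decides whether a metadata update must be written to persistent
    storage now, or may be kept in volatile RAM and coalesced later."""

    def __init__(self, rules, clock=time.monotonic):
        self.rules = rules
        self.clock = clock
        self._last_persisted = {}  # field -> time of the last persisted write

    def should_persist(self, field):
        interval = self.rules.get(field)
        if interval is None:
            # Stable metadata (logical addresses, forward pointers, ...):
            # always write through to the persistent copy.
            return True
        now = self.clock()
        last = self._last_persisted.get(field)
        if last is None or now - last >= interval:
            self._last_persisted[field] = now
            return True
        # Within the throttling interval: hold the update in RAM.
        return False
```

With a filter of this shape, a stream of tens of thousands of counter increments per second collapses into a handful of persistent writes, while structural metadata still reaches non-volatile storage on every change.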
One significant differentiator among multi-tier solutions is data granularity. Granularity drives efficiency, and efficiency usually means better savings: chunk sizes on the market range from 512 KB to 1 GB, and some vendors are already talking about going down to 32 KB blocks. Granularity is important because the finer it is, the less data is moved in the backend, and small chunks can be moved up and down relatively often for better data placement and fast, fine-grained tuning. Big chunks carry two risks: before moving a large amount of data, the algorithm needs to wait for enough activity to accumulate, possibly moving the data when it is already too late; and a small amount of active data, such as a few megabytes in a large LUN, can cause gigabytes of data to be moved to the upper, costlier tier. However, the finer the granularity, the more metadata has to be kept up to date. It is therefore important to minimize, as much as possible, the need to update the persistent metadata, in order to avoid the overhead involved and the exposure to metadata loss or corruption. The more stable the metadata is, the less it is exposed to corruption and the smaller its negative impact on performance.
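The granularity trade-off can be made concrete with some rough arithmetic. The figures below are illustrative assumptions, not from the disclosure: a 100 TiB pool and 64 bytes of tracking metadata per chunk are picked only to show how the metadata footprint scales inversely with chunk size.

```python
# Rough arithmetic: number of metadata entries needed to track a pool
# at a given chunk (extent) granularity.
def metadata_entries(capacity_bytes, chunk_bytes):
    return capacity_bytes // chunk_bytes

TIB = 1024 ** 4
ENTRY_BYTES = 64  # assumed size of one tracking entry

for chunk in (512 * 1024, 32 * 1024 * 1024, 1024 ** 3):
    n = metadata_entries(100 * TIB, chunk)
    print(f"chunk {chunk // 1024} KiB: {n} entries, "
          f"{n * ENTRY_BYTES / 1024**3:.2f} GiB of metadata")
```

At 512 KiB chunks the 100 TiB pool needs about 210 million entries (roughly 12.5 GiB of metadata under the assumed entry size), versus only about 100 thousand entries at 1 GiB chunks, which is why finer granularity sharply increases the volume of metadata that must be kept current.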
Fig. 1 outlines the prior-art solution for metadata management. The metadata is stored in volatile, very fast RAM; in fast non-volatile flash; or in slower non-volatile hard disk drive (HDD) storage. The destage controller manages the optimum placement of the...