Browse Prior Art Database

Reducing Data Movement in Log Structure File System Garbage Collection

IP.com Disclosure Number: IPCOM000114901D
Original Publication Date: 1995-Feb-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 4 page(s) / 117K

Publishing Venue

IBM

Related People

Cohn, O: AUTHOR [+4]

Abstract

Novel Advance: A technique is disclosed for garbage collecting live data in a segmented Log Structure File System (LSFS) such that during the garbage collection process only part of the live data is moved. When the distribution of live data across the disk is uniform or unknown, this technique yields significantly less data movements without decreasing the rate in which segments are emptied.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 51% of the total text.

Reducing Data Movement in Log Structure File System Garbage Collection

      Novel Advance: A technique is disclosed for garbage collecting
live data in a segmented Log Structure File System (LSFS) such that
during the garbage collection process only part of the live data is
moved.  When the distribution of live data across the disk is uniform
or unknown, this technique yields significantly less data movements
without decreasing the rate in which segments are emptied.

      Technical Description:  LSFS is a new approach for disk storage
management.  Its main idea is to serve all modifications to disk
(write operations) sequentially in a log-like structure.  An
essential part of any LSFS is a process that collects live data and
thereby frees enough sequential space for new data.  This process is
called the Garbage Collection process.

      When the distribution of live data across the disk is uniform
(e.g., most of the data have a long life span) or unknown, existing
Garbage Collection methods require the movement of a large amount of
data over a given amount of time.  Thus, the garbage collection
process consumes system resources heavily and thereby degrades the
performance of the LSFS.

      New Garbage Collection Approach - The disclosed technique moves
all live data from segments with a relatively low percentage of live
data (hereafter - source segments) to fill up segments with
relatively high percentage of live data (hereafter - target
segments).  Thus, during the garbage collection process, only part of
the live data is moved.  Although this technique always reduces the
amount of data moved, the reduction is particularly significant in
these cases where the distribution of live data across the disk is
uniform or unknown.  The implementation of this technique requires
the following Control Data Structures:
  Empty
     An ordered list of empty segments.
  Non_empty
    An ordered list of Non_empty segments.
  Live_percentage
    An array stating for each segment the amount of live data it
    contains.
  Fragmentation_array
    An array stating for each segment the amount of fragmentation it
     contains.  See Fragmentation_threshold below for more details.

The implementation of this technique requires the following
constants:
  Full_threshold
     The amount of data beyond which a segment is considered full,
     and thus does not participate in the garbage collection process.
  Source_threshold
     The amount of data below which a segment is considered a source
     segment.  Segments with more then Source_threshold but less than
     Full_threshold of data are designated target segments.
  Fragmentation_threshold
     A number below which a segment is considered fragmented.  This
     number must reflect the average size of the holes and their
     distribution as well as the amount of live data in the segment.

      Write Process: The w...