Using the Overhead Costs of Network Messages to Choose Between Cache Coherency Strategies in a Distributed File System

IP.com Disclosure Number: IPCOM000121299D
Original Publication Date: 1991-Aug-01
Included in the Prior Art Database: 2005-Apr-03
Document File: 3 page(s) / 113K

Publishing Venue

IBM

Related People

Johnson, DW: AUTHOR [+3]

Abstract

A distributed file system allows a client machine to access files that reside on a remote server machine. In order to achieve high performance many distributed file systems allow clients to cache portions of server files so that not all accesses to a file's data require network transmissions. Caches must be kept consistent with the real data at the server.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      A distributed file system allows a client machine to
access files that reside on a remote server machine. In order to
achieve high performance many distributed file systems allow clients
to cache portions of server files so that not all accesses to a
file's data require network transmissions. Caches must be kept
consistent with the real data at the server.

      One strategy for maintaining cache coherency is to use
synchronization modes.  Three sync_modes are sufficient:
   READ_ONLY  This sync_mode is used when all opens of the file are
read-only opens.  There are no writers; there are, therefore, no
cache coherency problems.  In this sync_mode all client caches are
allowed to contain data from the file.
   ASYNC      This sync_mode is used when only one node has the file
open.  Since only one node has the file open, there is no cache
coherency problem; data can be read or written to the local cache,
with modified data sent to the server when the file is closed or when
the sync_mode changes.
   FULL_SYNC  This sync_mode is used when more than one node has the
file open and at least one of the opens is for writing.  In this
sync_mode no client caches are used - all reads and writes are sent,
synchronously, to the server.
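
      As an illustration only, the selection among the three sync_modes above can be sketched in Python. The function name, the SyncMode enumeration, and the representation of opens as (node_id, is_writer) pairs are assumptions made for this sketch and do not come from the disclosure. The text leaves open which mode applies when a single node holds only read-only opens; the sketch picks READ_ONLY in that case.

```python
from enum import Enum


class SyncMode(Enum):
    READ_ONLY = "READ_ONLY"
    ASYNC = "ASYNC"
    FULL_SYNC = "FULL_SYNC"


def select_sync_mode(opens):
    """Pick a sync_mode from the current opens of one file.

    `opens` is a list of (node_id, is_writer) pairs, one per open.
    This representation is hypothetical, chosen for the sketch.
    """
    if not opens:
        raise ValueError("file is not open on any node")
    nodes = {node for node, _ in opens}
    any_writer = any(is_writer for _, is_writer in opens)
    if not any_writer:
        # All opens are read-only: every client may cache.
        return SyncMode.READ_ONLY
    if len(nodes) == 1:
        # Only one node has the file open: it may cache reads and writes.
        return SyncMode.ASYNC
    # Several nodes, at least one writer: no client caching.
    return SyncMode.FULL_SYNC
```

      For example, select_sync_mode([("a", True), ("b", False)]) returns SyncMode.FULL_SYNC, because two nodes have the file open and one of them is writing.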

      In the FULL_SYNC sync_mode no caches are used, so performance
suffers and network traffic increases.  A fourth sync_mode can
improve performance in some cases:
   SYNC_WRITE This sync_mode is used when more than one node has the
file open and at least one of the opens is for writing.  Clients are
allowed to use local caches for reading, but all writes are sent to
the server.  When the server receives a write message from a client,
it sends Cache Invalidation Messages (CIMs) to the clients with
cached data.  When a client receives a CIM, it discards its cached
data and replies to the server with a CIM_ACK.  When the server has
received all CIM_ACKs, it updates the file.
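
      The SYNC_WRITE exchange described above can be sketched as a small in-process simulation. This is a minimal sketch under stated assumptions, not the disclosed implementation: the Client and Server classes and their method names are hypothetical, network messages are modeled as direct method calls, and a write invalidates every client that has fetched data, without distinguishing the writing client.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # block number -> cached data

    def read(self, server, block):
        # Reads may be satisfied from the local cache in SYNC_WRITE.
        if block not in self.cache:
            self.cache[block] = server.fetch(self, block)
        return self.cache[block]

    def receive_cim(self):
        # On a CIM, discard cached data and reply with a CIM_ACK.
        self.cache.clear()
        return "CIM_ACK"


class Server:
    def __init__(self):
        self.blocks = {}           # block number -> file data
        self.caching_clients = []  # clients that may hold cached data

    def fetch(self, client, block):
        # Remember which clients now hold cached data.
        if client not in self.caching_clients:
            self.caching_clients.append(client)
        return self.blocks.get(block)

    def write(self, block, data):
        # Send a CIM to every client with cached data and collect replies.
        acks = [c.receive_cim() for c in self.caching_clients]
        # Update the file only after all CIM_ACKs have arrived.
        assert all(a == "CIM_ACK" for a in acks)
        self.caching_clients.clear()
        self.blocks[block] = data
        return len(acks)  # number of CIM/CIM_ACK pairs exchanged
```

      The value returned by write() shows the overhead a single write incurs in this mode: one CIM and one CIM_ACK per client holding cached data.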

      File usage patterns vary.  For some usage patterns FULL_SYNC
will provide better system performance, while SYNC_WRITE will perform
better for other usage patterns.  An ideal system would dynamically
monitor usage patterns and costs and then choose the sync_mode
(either SYNC_WRITE or FULL_SYNC) that will give the best system
performance.
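
      To see why neither mode wins for every usage pattern, a rough message-count comparison may help. The sketch below is not the cost function of this disclosure; it is a simplified illustration that assumes every network message costs the same, that each read or write sent to the server is one request plus one reply, and that a known fraction of reads (hit_ratio) is served from the client cache under SYNC_WRITE. All function and parameter names are hypothetical.

```python
def full_sync_messages(n_reads, n_writes):
    # Every read and every write is a request/reply exchange with the server.
    return 2 * (n_reads + n_writes)


def sync_write_messages(n_reads, n_writes, n_caching_clients, hit_ratio):
    # Reads that miss the cache still need a request/reply exchange.
    read_msgs = 2 * n_reads * (1.0 - hit_ratio)
    # Each write is a request/reply plus a CIM and a CIM_ACK per caching client.
    write_msgs = n_writes * (2 + 2 * n_caching_clients)
    return read_msgs + write_msgs


def cheaper_sync_mode(n_reads, n_writes, n_caching_clients, hit_ratio):
    # Compare the two estimates and name the mode with fewer messages.
    full = full_sync_messages(n_reads, n_writes)
    sync_write = sync_write_messages(
        n_reads, n_writes, n_caching_clients, hit_ratio
    )
    return "SYNC_WRITE" if sync_write < full else "FULL_SYNC"
```

      Under these assumptions a read-heavy pattern such as cheaper_sync_mode(1000, 10, 3, 0.9) favors SYNC_WRITE, while a write-heavy pattern such as cheaper_sync_mode(10, 1000, 3, 0.9) favors FULL_SYNC, because each write then pays for invalidating three client caches.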

      The following describes how a cost function can be calculated
with the result indicating which sync_mode will be the best choice.
      The following values are measured at client i:
      n_blk_cache(i)  Number of data blocks read from the client's
cache since the last CIM was received.
      n_read_cache(i) Number of read requests totally satisfied from
the client's cache since the last CIM was received.
      c(data_blk(i))  Cost of tra...