Dealing With Communications Failures in a Distributed File System

IP.com Disclosure Number: IPCOM000121303D
Original Publication Date: 1991-Aug-01
Included in the Prior Art Database: 2005-Apr-03
Document File: 2 page(s) / 94K

Publishing Venue

IBM

Related People

Johnson, DW: AUTHOR [+2]

Abstract

Disclosed is a method of dealing with possibly temporary communications failures in the kind of distributed file system that allows client nodes to cache portions of files and also uses a protocol to guarantee cache coherence.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a method of dealing with possibly temporary
communications failures in the kind of distributed file system that
allows client nodes to cache portions of files and also uses a
protocol to guarantee cache coherence.

      A distributed file system allows a client machine to access
files that reside on a remote server machine. To achieve high
performance, many distributed file systems allow clients to cache
portions of server files so that not all accesses to a file's data
require network transmissions.  Caches must be kept consistent with
the real data at the server.  A client can modify cached data.  If
Client_A holds a data range in its cache when Client_B requests data
from the same range, then the server sends a revoke message to
Client_A.  Upon receiving the revoke message, Client_A discards all
unmodified data from its cache and writes modified data back to the
server.  The server incorporates the modified data into the file
before satisfying Client_B's access request.
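
The revoke exchange described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
disclosed implementation: the class names Server and Client, the
per-block granularity, and the method names are all hypothetical, and
the network is replaced by direct method calls.

```python
class Server:
    def __init__(self, data):
        self.data = dict(data)   # block number -> authoritative contents
        self.holders = {}        # block number -> set of clients caching it

    def read(self, client, block):
        # Revoke the block from every other holder before satisfying the request.
        for other in list(self.holders.get(block, set())):
            if other is not client:
                dirty = other.revoke(block)
                if dirty is not None:
                    self.data[block] = dirty   # incorporate modified data first
                self.holders[block].discard(other)
        self.holders.setdefault(block, set()).add(client)
        return self.data[block]


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}          # block number -> (contents, modified flag)

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = (self.server.read(self, block), False)
        return self.cache[block][0]

    def write(self, block, contents):
        self.read(block)                       # make sure the block is cached
        self.cache[block] = (contents, True)   # modify the cached copy only

    def revoke(self, block):
        # Discard the cached block; hand back its contents only if modified.
        contents, modified = self.cache.pop(block)
        return contents if modified else None
```

If Client_A writes to a cached block and Client_B then reads the same
block, the server's read method revokes Client_A's copy, stores the
written-back data, and only then returns it to Client_B.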

      Communications failures can prevent the timely delivery of
messages sent between client and server machines. A temporary
interruption can cause some messages to be lost while later messages
are delivered successfully, and one of the communicating machines may
never learn that a message was lost.  For instance, the following
sequence could occur:
      -    A temporary communication failure prevents the
           delivery of a server-to-client cache revoke
           message.
      -    The server is informed by the underlying
           communications protocol that the message was not
           delivered, but the client remains ignorant both
           of the message and of the fact that a delivery
           attempt failed.  The client now holds stale data
           in its cache but is not aware that the data is
           not correct.
      -    The temporary communications interruption ends,
           and subsequent messages between client and server
           are successfully delivered. The client is unaware
           that anything is wrong.
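
The sequence above can be reproduced with a small simulation. It is a
sketch under assumptions: Link, DeliveryFailed, and server_write are
hypothetical names, the transport is modeled as a flag that is either
up or down, and the server is assumed to go ahead and update its own
copy even though the revoke was not delivered.

```python
class DeliveryFailed(Exception):
    """Raised to the sender when the transport could not deliver a message."""


class Link:
    def __init__(self):
        self.up = True

    def send(self, handler, *args):
        if not self.up:
            raise DeliveryFailed()   # only the sender learns of the failure
        return handler(*args)


class Client:
    def __init__(self):
        self.cache = {}

    def on_revoke(self, key):
        self.cache.pop(key, None)    # discard the cached copy


def server_write(store, link, client, key, value):
    """Try to revoke the client's copy, then update the server's copy.

    Returns True if the revoke message was delivered.
    """
    try:
        link.send(client.on_revoke, key)
        delivered = True
    except DeliveryFailed:
        delivered = False            # the server knows; the client does not
    store[key] = value               # assumption: the server proceeds anyway
    return delivered
```

Calling server_write while link.up is False leaves the old value in
the client's cache and the new value in the server's store, and
setting link.up back to True does nothing to repair the mismatch.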

      This problem is solved by implementing the following rules:
      -    When a server is informed by the underlying
           communications protocol that a revoke message
           could not be delivered, then the server node
           remembers this fact.
      -    Clients send periodic "heart beat" messages to the
           server.  These messages may fail, may return an
           indication that all is well, or may return a reply
           from the server indicating that some previous
           revoke me...
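
The two rules visible in this excerpt can be sketched as follows. This
is a hedged illustration only: the names Heartbeat, Server,
note_revoke_failure, on_heartbeat, and send_heartbeat are
hypothetical, the transport is reduced to a link_up flag, and the
sketch stops at reporting the heart beat outcome because the excerpt
is cut off before it says what the client or server does next.

```python
from enum import Enum


class Heartbeat(Enum):
    FAILED = "heart beat could not be delivered"
    OK = "all is well"
    REVOKE_MISSED = "an earlier revoke was not delivered"


class Server:
    def __init__(self):
        self.missed_revokes = set()   # ids of clients with an undelivered revoke

    def note_revoke_failure(self, client_id):
        # First rule: remember that a revoke to this client was not delivered.
        self.missed_revokes.add(client_id)

    def on_heartbeat(self, client_id):
        # Second rule, server side: report any remembered failure in the reply.
        if client_id in self.missed_revokes:
            return Heartbeat.REVOKE_MISSED
        return Heartbeat.OK


def send_heartbeat(server, client_id, link_up=True):
    # Second rule, client side: the heart beat message itself may fail.
    if not link_up:
        return Heartbeat.FAILED
    return server.on_heartbeat(client_id)
```

Each call to send_heartbeat yields exactly one of the three outcomes
listed in the second rule, so a client can branch on the result.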