FTP data compression (RFC0468)
Original Publication Date: 1973-Mar-08
Included in the Prior Art Database: 2000-Sep-13
Network Working Group                                          R. Braden
Request for Comment: 468                                        UCLA/CCN
NIC: 14742                                                 March 8, 1973
FTP DATA COMPRESSION
Major design objectives of the proposed File Transfer Protocol (FTP)
are reliability and efficiency for transmission of large files.
Efficiency has two faces: efficiency of the host CPUs, and efficient
use of the Network bandwidth. Block mode is intended to minimize CPU
overhead; for bandwidth efficiency, there is a mode called "HASP" in
RFC 454. The "HASP" mode of FTP is really transmission with data
compression, i.e., an encoding scheme to reduce the information
redundancy in the messages.
RFC 454 contains no explicit definition of the "HASP" or compressed
mode, but instead notes that a future RFC by yours truly will define
the mode. Students of FTP may find this scarcely credible, but you
are now reading the promised RFC. It turned out to be much farther
in the future than any of us expected. Mea Culpa.
In the early years of the Network, its major uses have been remote
terminal interactions and the small-to-medium-sized file transmission
typical of remote job entry. As facilities such as the Illiac IV and
the Data Machine become operational on the Network, and the Network
community begins to include users with heavy data transmission
requirements, large file transmission will become a major mode of
Network use. For example, one user of CCN expects to send 2 x 10**8
bits of data _each_ _day_ over the Network.
Local byte compression of the type proposed here is particularly
effective for reducing the size of "printer" files such as those
transmitted under the Network RJE protocol. Experience with CCN's
RJS service has shown a typical compression of print files by a
factor of between two and three. Since FTP was intended to contain
the data transfer part of Network RJE protocol as a subset, it is
appropriate to include a print file compression mechanism in FTP.
These considerations led the FTP committee to include a compressed
mode within FTP.
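
To illustrate why print files compress so well, the following is a
minimal sketch of repeated-character detection applied to one print
line. It is not the encoding defined by this RFC: the token layout,
the names compress_runs, expand_runs and encoded_size, the min_run
threshold, and the one-header-byte cost model are all assumptions
made for illustration only.

```python
def compress_runs(data: bytes, min_run: int = 3) -> list:
    """Split data into literal and run tokens in a single pass.

    Hypothetical token shapes (not the RFC's wire format):
      ("lit", bytes)            uncompressed bytes
      ("run", count, one_byte)  'count' copies of a single byte
    """
    tokens = []
    literal = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Find the end of the run of identical bytes starting at i.
        j = i
        while j < n and data[j] == data[i]:
            j += 1
        run_len = j - i
        if run_len >= min_run:
            # Flush any pending literal bytes before emitting the run.
            if literal:
                tokens.append(("lit", bytes(literal)))
                literal = bytearray()
            tokens.append(("run", run_len, data[i:i + 1]))
        else:
            # Short runs are cheaper to send as literal bytes.
            literal += data[i:j]
        i = j
    if literal:
        tokens.append(("lit", bytes(literal)))
    return tokens


def expand_runs(tokens) -> bytes:
    """Inverse of compress_runs: rebuild the original bytes."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            out += tok[2] * tok[1]
    return bytes(out)


def encoded_size(tokens) -> int:
    """Rough size estimate under an assumed cost model: one header
    byte per token, plus the literal bytes or the single repeated
    byte. Limits on the length field are ignored."""
    return sum(1 + (len(t[1]) if t[0] == "lit" else 1) for t in tokens)
```

For example, a 132-column print line built as
b"TOTAL" + b" " * 20 + b"42" + b" " * 105 yields four tokens under
this sketch, and expand_runs restores the original line exactly. Most
of the saving comes from the blank padding, which is why "printer"
files are such good candidates.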
The two main arguments for data compression are economics and
convenience (usability). Consider first economics, which is
essentially a trade-off between CPU time and transmission costs. Of
course, as long as Network use is a free commodity, the economics of
data compression are all bad. That happy state won't last forever.
What does data compression cost?
Let us consider only simple linear compression schemes, such as the
one proposed here. By linear, I mean that the CPU time to examine a
source record is proportional to the number of bytes in the record. A
simple linear scheme could detect repeated single characters, for
example. One could imagine quadratic schemes, which detected
repeated substrings; but excep...