Arbitrary Character Sets (RFC0373)
Original Publication Date: 1972-Jul-01
Included in the Prior Art Database: 2019-Feb-12
Internet Society Requests For Comment (RFCs)
NWG/RFC #373 14 July 1972 NIC 11058 SU-AI
ARBITRARY CHARACTER SETS
by John McCarthy
It would be nice to be able to have documents stored in computers that could include arbitrary characters and to be able to display them on any CRT screen, edit them using any keyboard, and print them on any printer. The object of this memorandum is to suggest how to get there from here with special reference to the ARPA network.
Where are we now?
(1) At present, there is 96 character ASCII, and everyone agrees that it should be included in any larger set.
(2) Many installations are dependent on 64 character sets which do not even include the lowercase Latin alphabet.
(3) At the Stanford Artificial Intelligence Laboratory, we have a 114 character set that includes 96 character ASCII and is implemented in our keyboards, displays, and line printer.
(4) Printers are becoming available that get their character designs out of memory, for example, the Xerox XGP printer, one of which we are getting.
(5) The IMLAC type display has the character designs in main memory so that changing the displayed set is just a matter of reloading the memory.
(6) Many display systems share the character generator among many display units. In some of these, e.g. the Datadisc, arbitrary sets are probably feasible (using kludgery to be described later), but in other systems, e.g. our III's, arbitrary sets are not feasible.
One possible approach to communication in expanded character sets is to produce an expanded standard set of characters, perhaps using 8 or 9 bits, and to expect new equipment to implement this set. This approach has the disadvantage that it will be very hard to get agreement on what the next step should be, and even if formal agreement is reached, many groups will find it in their interest to ignore the standard.
Therefore, I would like to suggest that the next step be a move to arbitrary character sets. I suggest implementing this in the following way:
(1) There be established a registry of characters. Anyone can register a new character. Each character has a unique number; 17 bits should be enough even to include Chinese. Besides this, each character has a name in ASCII, usually mnemonic. Finally, the character has a design, which is a picture on a 50 by 50 dot matrix.
(2) Besides the registry of characters, there is a registry of character sets, which different groups are using for different classes of documents. A registered character set has a registry number and a table giving the correspondence between the character codes as bit sequences and the registered character numbers.
(3) Associated with a document is a statement of the character code used therein. This may be one of the registered codes or it may contain in addition modifications described by an auxiliary table giving the code correspondence with registered character numbers. A character code may have an escape character...
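The three registry proposals above can be illustrated with a small sketch. It is not part of the memorandum: the class and function names (RegisteredCharacter, CharacterRegistry, decode) are hypothetical, the sequential assignment of character numbers is an assumption, and the escape character mechanism is left out. Only the 17-bit numbers, the ASCII names, the 50 by 50 dot matrix, and the code-to-number tables come from the text.

```python
from dataclasses import dataclass

MATRIX_SIZE = 50   # design is a 50 by 50 dot matrix (from the memo)
NUMBER_BITS = 17   # 17-bit character numbers (from the memo): 2**17 = 131072


@dataclass(frozen=True)
class RegisteredCharacter:
    number: int    # unique registry number
    name: str      # ASCII name, usually mnemonic
    design: tuple  # 50 rows of 50 dots, each 0 or 1


class CharacterRegistry:
    """Hypothetical registry; numbers are handed out sequentially (assumption)."""

    def __init__(self):
        self._by_number = {}

    def register(self, name, design):
        if not name.isascii():
            raise ValueError("character name must be ASCII")
        if len(design) != MATRIX_SIZE or any(len(row) != MATRIX_SIZE for row in design):
            raise ValueError("design must be a 50 by 50 dot matrix")
        number = len(self._by_number)
        if number >= 2 ** NUMBER_BITS:
            raise OverflowError("no 17-bit character numbers left")
        char = RegisteredCharacter(number, name, tuple(tuple(row) for row in design))
        self._by_number[number] = char
        return char

    def lookup(self, number):
        return self._by_number[number]


def decode(codes, charset_table, overrides=None):
    """Map a document's character codes to registered character numbers.

    charset_table plays the role of a registered character set; overrides
    plays the role of the document's auxiliary table of modifications.
    """
    table = dict(charset_table)
    if overrides:
        table.update(overrides)
    return [table[code] for code in codes]
```

A document would then carry the registry number of its character set plus any auxiliary table, and a receiving site would call something like decode(codes, charset_table, overrides) before looking up each design for display or printing.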