
Improved message architecture derived from ASN/1 and BER
Disclosure Number: IPCOM000016125D
Original Publication Date: 2002-Sep-16
Included in the Prior Art Database: 2003-Jun-21

Publishing Venue



There are numerous ways to encapsulate data into messages, and each new technique has specific characteristics which make it useful in a particular domain. In this case the domain is loosely coupled, high-efficiency communications. The XML specification fits into part of this space but notably does not deliver high efficiency: the encoding is not lightweight, and the decoding is complex and therefore costly in both CPU consumption and code footprint. Ideally, what is required is a way to encapsulate data with all the individual elements tagged, as in XML, but with the efficiency that more 'traditional' approaches deliver.
A very good example of the traditional approach is the standard encoding of ASN.1 (Abstract Syntax Notation One) using BER (Basic Encoding Rules). In essence, ASN.1 allows data to be defined as either primitive or constructed: primitive items have a type and a value associated with them, while constructed items contain some set of primitive and constructed items. Using BER, data is written on the wire using the TLC (tag, length, content) model, more commonly known as TLV (tag, length, value). The first byte defines the 'tag' for the data, essentially an integer in one of four namespaces: universal, application, context-specific or private. A single bit in the byte marks the field as primitive or constructed, and the tag value can optionally expand into the bytes following the initial tag byte. The length is encoded into one or more bytes and is then followed by the data. This format is used by currently adopted protocols such as LDAP.
The key problem with this encoding is that it is not terse, so the temptation is to combine primitive fields into 'packed types' to avoid the overhead of individual tagging. For example, a 'user' type can be defined which holds a string for the name and an integer for the date of birth. This technique works, but it means that the data can only be parsed by readers who know the meaning of the various tags, which are inherently application specific.
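The tag, length, content layout described above can be sketched in a few lines of Python. This is only an illustration of standard BER definite-length encoding, not code from the disclosure; the function and constant names are hypothetical.

```python
# Illustrative BER encoder for one element (definite-length form only).
# All names here are hypothetical; only the byte layout follows BER.

CLASS_UNIVERSAL = 0b00
CLASS_APPLICATION = 0b01
CLASS_CONTEXT = 0b10
CLASS_PRIVATE = 0b11


def encode_identifier(tag_class: int, constructed: bool, tag_number: int) -> bytes:
    """Encode the tag: class in the top two bits, then the constructed bit."""
    first = (tag_class << 6) | (0x20 if constructed else 0x00)
    if tag_number < 31:
        return bytes([first | tag_number])
    # Large tag numbers: low five bits all ones, then base-128 digits,
    # most significant first, with the top bit set on all but the last.
    digits = [tag_number & 0x7F]
    tag_number >>= 7
    while tag_number:
        digits.append(0x80 | (tag_number & 0x7F))
        tag_number >>= 7
    return bytes([first | 0x1F]) + bytes(reversed(digits))


def encode_length(length: int) -> bytes:
    """Short form for lengths below 128, long form otherwise."""
    if length < 0x80:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_tlv(tag_class: int, constructed: bool, tag_number: int, content: bytes) -> bytes:
    """Concatenate tag, length and content for a single element."""
    return (
        encode_identifier(tag_class, constructed, tag_number)
        + encode_length(len(content))
        + content
    )
```

For instance, `encode_tlv(CLASS_UNIVERSAL, False, 2, b"\x05")` produces three bytes to carry one byte of integer content, which is the per-field overhead that makes packed types tempting.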
The proposed modification concentrates on making the encoding more efficient by combining the tag and length elements into a single byte in the common case of short, fixed length primitives. The result is an encoding in which the primitive data items have a universal type associated with them, e.g. UTF8. This allows any reader of the message to correctly format it without prior understanding of the meaning of the tags employed in the message. This achieves the advantage of XML, universal readability, together with the advantage of BER, high efficiency encoding and decoding. This new encoding is referred to as Compressed Encoding Rules (CER).
The encoding technique is as follows. The tag byte is split into two single-bit flags and two separate three-bit fields.
The top bit (bit 7) specifies primitive or constructed data.
For primitive data:
Bit 6 specifies fixed or variable length data.
Bits 5 to 3 specify the tag, which can lie in the range 0 to 6, with all larger values encoded as value 7 (overflow). We will define 'overflow' behaviour later.
Bits 2 to 0 specify the type and, in the fixed length case (see bit 6), by implication the length of the following data. Hence we define 1, 2, 4 or 8 byte
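The tag byte layout for primitive data can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say which bit value means 'constructed' or 'variable', so the polarity chosen here (1 = constructed, 1 = variable length) is a guess, the type codes are treated as opaque numbers, and the bytes that follow an overflow tag are not modelled. All names are hypothetical.

```python
from typing import NamedTuple

# Assumed polarities: the text only names the bit positions, not their values.
CONSTRUCTED_BIT = 0x80  # assumption: 1 = constructed, 0 = primitive
VARIABLE_BIT = 0x40     # assumption: 1 = variable length, 0 = fixed length
OVERFLOW_TAG = 7        # tag field value standing in for any tag above 6


class PrimitiveHeader(NamedTuple):
    variable_length: bool
    tag_field: int   # 0..6 is the literal tag, 7 signals overflow
    type_code: int   # 0..7, meaning left opaque in this sketch


def pack_primitive_header(variable_length: bool, tag: int, type_code: int) -> int:
    """Build the single tag byte for a primitive item."""
    if tag < 0:
        raise ValueError("tag must be non-negative")
    if not 0 <= type_code <= 7:
        raise ValueError("type code must fit in three bits")
    tag_field = tag if tag <= 6 else OVERFLOW_TAG
    flags = VARIABLE_BIT if variable_length else 0
    return flags | (tag_field << 3) | type_code


def unpack_primitive_header(byte: int) -> PrimitiveHeader:
    """Split a primitive tag byte back into its three fields."""
    if byte & CONSTRUCTED_BIT:
        raise ValueError("constructed items are outside the scope of this sketch")
    return PrimitiveHeader(
        variable_length=bool(byte & VARIABLE_BIT),
        tag_field=(byte >> 3) & 0x07,
        type_code=byte & 0x07,
    )
```

Under these assumptions a fixed length primitive costs one header byte where BER spends at least two (tag plus length), so a four-byte value occupies five bytes on the wire instead of six.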