A means of automatically truncating oversize strings when writing fixed length data taking into consideration the unit of length and the code page into which the string is to be written.

IP.com Disclosure Number: IPCOM000166826D
Original Publication Date: 2008-Jan-24
Included in the Prior Art Database: 2008-Jan-24
Document File: 2 page(s) / 43K

Publishing Venue

IBM

Abstract

The length of a string within a system destined for a serialised output format may be longer than the serialised output format permits. This disclosure provides a method and algorithm for automatically truncating such strings based on a model so that the strings conform to the length permitted by the serialised output format.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Prior to this solution, if a string destined for a serialised output format is longer than that format permits, using the string as-is invalidates the output, almost certainly rendering it useless. This solution allows a string to be truncated to the maximum length permitted by the serialised output format, thus keeping the output valid. This is useful when data has been received in a variable-length form such as XML and is to be converted to a fixed-length format such as a C or COBOL structure. The only known alternative is to write bespoke code that truncates the string prior to serialisation, and such code has to be written separately for each string that is to be truncated.

The core idea of the disclosed solution is that truncation of the strings is done automatically, based on a model. The system is a piece of software that writes strings held internally to a serialised output format. The model is deployed to the system, which accesses it to determine the format of the serialised output and truncates the strings accordingly. The advantage of this method over the bespoke-code approach described above is that no bespoke code is required to truncate the strings before serialisation.

The strings are represented internally within the system using a 2-byte Unicode character encoding. The strings are to be serialised to an encoding that can be single-byte, double-byte or multi-byte.
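The gap between the internal representation and the serialised encoding can be illustrated with a short Python sketch. It is not part of the disclosure: the function name measure is hypothetical, and it assumes that the 2-byte internal encoding behaves like UTF-16 and that a "character" is one Unicode code point.

```python
def measure(text: str, encoding: str) -> dict:
    """Return the length of `text` in three units (illustrative helper)."""
    return {
        # Unicode code points: one possible reading of "characters".
        "code_points": len(text),
        # 2-byte units in a UTF-16 internal representation (assumed).
        "utf16_units": len(text.encode("utf-16-be")) // 2,
        # Bytes occupied once serialised to the target encoding.
        "bytes": len(text.encode(encoding)),
    }


sample = "caf\u00e9"  # four characters, the last one accented
print(measure(sample, "latin-1"))  # single-byte target encoding
print(measure(sample, "utf-8"))    # multi-byte target encoding
```

The same four-character string occupies four bytes in a single-byte code page but five in UTF-8, which is why a length limit is only meaningful together with its unit and the target encoding.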

Each fixed-length string in the model has:

- An encoding property. String.

- A length property. Integer.

- A length units property. Enumeration. This controls the units of the length property. The length units property can have values "characters", "character units" or "bytes".

- A padding character property. Char. This specifies the character to use to pad short strings to the required length during serialisation.

- A truncation property. Boolean. This indicates whether to truncate the string during serialisation.
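The five properties listed above could be represented as follows. This is a minimal sketch only: the class and field names (FixedLengthString, LengthUnits and so on) are hypothetical and do not come from the disclosure, and the validation rules are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class LengthUnits(Enum):
    """The three values the disclosure gives for the length units property."""
    CHARACTERS = "characters"
    CHARACTER_UNITS = "character units"
    BYTES = "bytes"


@dataclass(frozen=True)
class FixedLengthString:
    """Model entry for one fixed-length string (hypothetical names)."""
    encoding: str             # target code page, e.g. "cp037"
    length: int               # required length, in length_units
    length_units: LengthUnits
    padding_character: str    # pads short strings during serialisation
    truncate: bool            # whether oversize strings are truncated

    def __post_init__(self):
        # Assumed sanity checks; the disclosure does not specify any.
        if self.length < 0:
            raise ValueError("length must be non-negative")
        if len(self.padding_character) != 1:
            raise ValueError("padding_character must be a single character")


# Example entry: a 10-byte EBCDIC field, space padded, truncation enabled.
field = FixedLengthString("cp037", 10, LengthUnits.BYTES, " ", True)
```

Keeping these properties in the model, rather than in code, is what lets one generic serialiser handle every string without per-field truncation logic.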

If the truncation property is set to true, the following algorithm is used by the system...
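The abbreviated text does not include the algorithm itself, and the following Python sketch should not be read as a reconstruction of it. It only illustrates one plausible way to truncate when the length units are "bytes": whole characters are encoded one at a time so that a multi-byte character is never split. The function names are hypothetical, and the sketch assumes a stateless target encoding with no byte order mark (for example utf-8, latin-1, cp037 or utf-16-be).

```python
def truncate_to_bytes(text: str, encoding: str, max_bytes: int) -> bytes:
    """Encode `text`, stopping before the first character that would
    push the output past `max_bytes` (illustrative only)."""
    out = bytearray()
    for ch in text:
        encoded = ch.encode(encoding)
        if len(out) + len(encoded) > max_bytes:
            break  # stop here rather than emit a partial character
        out += encoded
    return bytes(out)


def serialise_fixed(text: str, encoding: str, length_bytes: int,
                    pad: str = " ") -> bytes:
    """Truncate, then pad with whole copies of `pad` up to `length_bytes`."""
    data = truncate_to_bytes(text, encoding, length_bytes)
    pad_bytes = pad.encode(encoding)
    # If the pad character is multi-byte, the result may fall short of
    # length_bytes; a real system would need a rule for that case.
    while len(data) + len(pad_bytes) <= length_bytes:
        data += pad_bytes
    return data


# The accented character needs two bytes in UTF-8, so it does not fit
# in the last remaining byte of a 4-byte field and is dropped.
print(serialise_fixed("caf\u00e9s", "utf-8", 4))
print(serialise_fixed("ab", "latin-1", 4))
```

Truncating on byte boundaries alone would risk emitting half of a multi-byte character, which is one reason the length units and the code page both have to be taken into account.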