String-Searching Algorithm for Mixed Single-Byte Character Set/Double-Byte Character Set Data Stream

IP.com Disclosure Number: IPCOM000034658D
Original Publication Date: 1989-Mar-01
Included in the Prior Art Database: 2005-Jan-27

Publishing Venue

IBM

Related People

Authors:
Liu, JM

Abstract

This algorithm searches for any occurrence of a character string in a data stream that mixes single- and double-byte character sets. Many algorithms have been designed to search for occurrences of a character string in a text. However, those algorithms assume that the text is encoded in a single character set, e.g., EBCDIC characters, and they may fail when applied to a data stream with mixed single- and double-byte character sets. A mixed SBCS/DBCS (Single-Byte Character Set/Double-Byte Character Set) data stream, as described here, is a data stream that uses SO/SI (shift out/shift in) control characters to separate SBCS from DBCS characters. The SO control character indicates a shift from SBCS to DBCS characters, and the SI control character indicates a shift back to SBCS characters.
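
The abstract does not reproduce the algorithm itself, so the following is only an illustrative sketch of the problem it addresses, not the disclosed method. It assumes the usual EBCDIC code points for the control characters (SO = 0x0E, SI = 0x0F), assumes that every DBCS character occupies exactly two bytes, and uses a deliberately naive comparison loop. The names `tokenize` and `find_mixed` are hypothetical. The idea is to split the stream into whole characters tagged with their shift state, so that a match can never straddle two double-byte characters or confuse SBCS bytes with DBCS bytes.

```python
SO = 0x0E  # shift out: SBCS -> DBCS (assumed EBCDIC code point)
SI = 0x0F  # shift in:  DBCS -> SBCS (assumed EBCDIC code point)


def tokenize(data: bytes):
    """Split a mixed stream into (byte_offset, char_bytes, is_dbcs) tuples.

    SO/SI bytes only switch the mode and produce no token.
    """
    tokens = []
    dbcs = False
    i = 0
    while i < len(data):
        b = data[i]
        if not dbcs and b == SO:
            dbcs = True
            i += 1
        elif dbcs and b == SI:
            dbcs = False
            i += 1
        elif dbcs:
            # One DBCS character is two bytes (a truncated tail stays short).
            tokens.append((i, data[i:i + 2], True))
            i += 2
        else:
            tokens.append((i, data[i:i + 1], False))
            i += 1
    return tokens


def find_mixed(text: bytes, pattern: bytes) -> int:
    """Return the byte offset in `text` of the first character-aligned,
    mode-aware occurrence of `pattern`, or -1 if there is none.

    `pattern` uses the same SO/SI convention as `text`. An empty pattern
    is treated as "not found" in this sketch.
    """
    t = tokenize(text)
    p = [(ch, d) for _, ch, d in tokenize(pattern)]
    if not p:
        return -1
    for k in range(len(t) - len(p) + 1):
        window = [(ch, d) for _, ch, d in t[k:k + len(p)]]
        if window == p:
            return t[k][0]
    return -1


# Two DBCS characters (0x41C1, 0xC241) followed by SBCS bytes 0xC1 0xC2.
stream = b"\x0E\x41\xC1\xC2\x41\x0F\xC1\xC2"
sbcs_pattern = b"\xC1\xC2"

naive_offset = stream.find(sbcs_pattern)        # 2: straddles two DBCS chars
aware_offset = find_mixed(stream, sbcs_pattern)  # 6: the real SBCS occurrence
```

In this example a plain byte search reports offset 2, which lies inside the DBCS run and spans the second byte of one double-byte character and the first byte of the next. The mode-aware search skips that false match and reports offset 6, where the two bytes really are SBCS characters.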