Browse Prior Art Database

# Fast Technique to Identify Repeated Characters and Its Applications

IP.com Disclosure Number: IPCOM000037040D
Original Publication Date: 1989-Nov-01
Included in the Prior Art Database: 2005-Jan-29
Document File: 2 page(s) / 13K

IBM

## Related People

Allsen, JK: AUTHOR [+2]

## Abstract

A fast technique for identifying repeated characters considers every four characters as an unsigned number. The unsigned number is EXCLUSIVE ORed (XORed) with 'CCCCCCCC'X, where 'CC'X is the character to be identified. From the result of XOR, the character that is different from the previous one can be located. The major benefit of this technique is to reduce the time on checking each individual character. Several applications of this technique are also addressed in this article.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 57% of the total text.

Page 1 of 2

Fast Technique to Identify Repeated Characters and Its Applications

A fast technique for identifying repeated characters considers every four characters as an unsigned number. The unsigned number is EXCLUSIVE ORed (XORed) with 'CCCCCCCC'X, where 'CC'X is the character to be identified. From the result of XOR, the character that is different from the previous one can be located. The major benefit of this technique is to reduce the time on checking each individual character. Several applications of this technique are also addressed in this article.

Assume that we are going to identify the length of a repeated character, 'CC'X, in a data stream where 'CC'X is any character.

The following steps show the algorithm of this technique:

1. Consider every four characters as an unsigned bit

string. This unsigned bit string is XORed with

'CCCCCCCC'X to obtain a result.

2. If the result is 0, go to step 1 to process the

next four characters.

Otherwise, go to step 3.

3. If the result is greater than or equal to

'01000000'X (2**24), the first byte is not a character

'CC'X. Go to step 7.

Otherwise, go to step 4.

4. If the result is greater than or equal to

'00010000'X (2**16), the second byte is not a character

'CC'X. Go to step 7.

Otherwise, go to step 5.

5. If the result is greater than or equal to

'00000100'X (2**8), the third byte is not a character

'CC'X. Go to step 7.

Otherwise, go to step 6.

6. If the result is greater than or equal to

'00000001'X (2**0), the fourth byte...