
Method to extend legacy tools and software environment to work with different encoding format

IP.com Disclosure Number: IPCOM000127650D
Original Publication Date: 2005-Sep-07
Included in the Prior Art Database: 2005-Sep-07
Document File: 2 page(s) / 28K

Publishing Venue

IBM

Abstract

To process Unicode data, for example to count word occurrences or to compare two Unicode data files, a legacy tool or software that works only with ASCII input files must be rewritten or recreated to support Unicode-encoded data. To reuse existing tools with input data in a different encoding such as Unicode, the characters are converted into a number representation using a one-to-one mapping. The file that contains Unicode characters is saved in ASCII format with all characters changed into their number representation, so the existing tool that works only with ASCII format is able to process it. With this idea, the existing tools do not have to be rewritten or modified to work with a different file encoding format.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.


To process Unicode data, for example to count word occurrences or to compare two Unicode data files, a legacy tool or software that works only with ASCII input files must be rewritten or recreated to support Unicode-encoded data. Some tools are written in such a way that they are very hard or too complex to modify, or their source code is not available. For example, most tools on Linux® systems are written in Perl, which generally supports only ASCII encoding. In addition, older versions of Perl do not work with Unicode text files: to process Unicode data, a newer version of Perl has to be installed and the Perl code has to be modified to accept the Unicode data. As support for more languages and more data file encodings arrives, it is not practical to keep modifying the existing tools to support every encoding or every language.

The idea is to use a tool that works only with ASCII format (or with data files in one specific character encoding) to process input files encoded in Unicode or in any other character encoding. In this technique, the characters are converted into a number representation using a one-to-one mapping. The file that contains Unicode characters is saved in ASCII format with all characters changed into their number representation, so the existing tool that works only with ASCII format is able to process it. With this idea, the existing tools do not have to be rewritten or modified to work with a different file encoding format. The technique is not limited to tools that support only ASCII data files; it can be applied to any data encoding. For example, a tool that requires UTF-8 encoded input can be used with any encoding type after conversion into a one-to-one number representation.
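
The one-to-one number representation described above might be sketched as follows in Python. This is only an illustration under stated assumptions, not the disclosure's actual format, which is not visible in this abbreviated text. The assumptions are: each character becomes its code point written as six uppercase hexadecimal digits, and space, tab, and line-break characters pass through unchanged so that word and line boundaries survive for the legacy tool. The names encode_to_ascii, decode_from_ascii, and count_words are hypothetical, and collections.Counter merely stands in for an ASCII-only word-counting tool.

```python
from collections import Counter

WIDTH = 6                 # assumption: 6 hex digits cover U+0000..U+10FFFF
SEPARATORS = " \t\r\n"    # assumption: separators are passed through unchanged


def encode_to_ascii(text: str) -> str:
    """Replace every non-separator character with its fixed-width hex code point."""
    out = []
    for ch in text:
        if ch in SEPARATORS:
            out.append(ch)                      # keep word and line boundaries
        else:
            out.append(f"{ord(ch):0{WIDTH}X}")  # one character -> one number
    return "".join(out)


def decode_from_ascii(encoded: str) -> str:
    """Invert encode_to_ascii by reading WIDTH hex digits per character."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] in SEPARATORS:
            out.append(encoded[i])
            i += 1
        else:
            out.append(chr(int(encoded[i:i + WIDTH], 16)))
            i += WIDTH
    return "".join(out)


def count_words(text: str) -> dict:
    """Count words on the ASCII form only, then map the tokens back."""
    encoded = encode_to_ascii(text)
    counts = Counter(encoded.split())  # stand-in for an ASCII-only legacy tool
    return {decode_from_ascii(token): n for token, n in counts.items()}
```

Because the mapping is fixed-width and reversible, two words are equal exactly when their encoded tokens are equal, so counting or comparing the ASCII tokens gives the same answer as counting or comparing the original words. A variable-width or differently delimited representation would serve equally well as long as it remains one-to-one.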

The conversion is a on...