File Encoding Converter

IP.com Disclosure Number: IPCOM000012862D
Original Publication Date: 2003-Jun-04
Included in the Prior Art Database: 2003-Jun-04
Document File: 1 page(s) / 5K

Publishing Venue

IBM

Abstract

Dynamic File Encoding Conversion on Unix

  Disclosed is a new mechanism for ensuring that the encoding of text files on a Unix system always matches the current machine locale. If a file's encoding does not match the locale of the machine, its characters appear corrupted. A Unix system can offer several encodings per language. For example, AIX provides three encodings for Japanese: Ja_JP (IBM-943), ja_JP (IBM-eucJP), and JA_JP (UTF-8). This matters because files such as license files are typically installed as text files, and the user must be able to view them without corruption after the software has been installed. The proposed mechanism ensures that text files are readable in any locale on a Unix system.

Text files (such as license and README files) are usually created when software is installed on a system. At install time, the current locale of the machine is known, so an install program can install text files in the proper encoding. In addition to creating the files in the proper encoding, the install program stores information about each file's encoding. This information is used to change the encoding of the files if the machine's locale changes.
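
The install-time bookkeeping described above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the function names, the `.enc` sidecar file, and its JSON layout are all hypothetical, and the Python codec names (`euc_jp`, `utf-8`) stand in for the AIX code set names, whose exact mapping is not given in the text.

```python
import codecs
import json
import os
import tempfile


def install_text_file(path, text, encoding):
    """Write text in the given encoding and record that encoding.

    The ".enc" sidecar file is a hypothetical storage format; the
    disclosure does not say where the encoding information is kept.
    """
    with open(path, "wb") as f:
        f.write(text.encode(encoding))
    with open(path + ".enc", "w", encoding="ascii") as f:
        json.dump({"encoding": encoding}, f)


def reencode_if_needed(path, target_encoding):
    """Convert the file to target_encoding if its recorded encoding differs.

    Returns True if the file was rewritten, False if it already matched.
    """
    with open(path + ".enc", encoding="ascii") as f:
        recorded = json.load(f)["encoding"]
    # Compare normalized codec names so that "UTF8" and "utf-8" match.
    if codecs.lookup(recorded).name == codecs.lookup(target_encoding).name:
        return False
    with open(path, "rb") as f:
        text = f.read().decode(recorded)
    # Encode before writing so a conversion error leaves the file untouched.
    data = text.encode(target_encoding)
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".enc", "w", encoding="ascii") as f:
        json.dump({"encoding": target_encoding}, f)
    return True


# Example: install a license file as EUC-JP, then convert it after the
# session encoding has changed to UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    license_path = os.path.join(tmp, "LICENSE.txt")
    install_text_file(license_path, "使用許諾契約\n", "euc_jp")
    changed = reencode_if_needed(license_path, "utf-8")
    unchanged = reencode_if_needed(license_path, "utf-8")
    with open(license_path, "rb") as f:
        result = f.read()
```

Recording the encoding next to the file means the converter never has to guess the source encoding, which is unreliable for multi-byte Japanese code sets.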

How a machine's locale can change: As stated earlier, there are many encodings for the same language. Most Unix desktops allow a user to select the session encoding (UTF-8, ISO8859-1, etc.) when logging in, so a user may later log in with a different session encoding.
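
One way a program could discover the session encoding at login is to inspect the locale environment variables. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes locale names in the POSIX form `language[_territory][.codeset][@modifier]` with the usual `LC_ALL`, `LC_CTYPE`, `LANG` precedence. AIX names such as `Ja_JP` imply their code set without spelling it out, so this sketch returns `None` for them.

```python
import os


def session_codeset(env=None):
    """Return the codeset part of the effective locale name, or None.

    Checks LC_ALL, then LC_CTYPE, then LANG; an empty value is treated
    as unset, following POSIX precedence rules.
    """
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            break
    else:
        return None
    # Drop an optional "@modifier" suffix, then take the text after ".".
    name = value.split("@", 1)[0]
    if "." not in name:
        return None
    return name.split(".", 1)[1]


# Example with explicit environments rather than the live session.
first_login = session_codeset({"LANG": "ja_JP.eucJP"})
later_login = session_codeset({"LANG": "ja_JP.eucJP", "LC_ALL": "ja_JP.UTF-8"})
```

Comparing the value from the current login with the encoding recorded at install time shows whether a file needs to be converted.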

How the file reencoding procedure...