Unstructured HTML file conversion to well-formed XHTML using external rule-based parser.

IP.com Disclosure Number: IPCOM000016177D
Original Publication Date: 2002-Aug-16
Included in the Prior Art Database: 2003-Jun-21

Publishing Venue

IBM

Abstract

Disclosed is a rule-based parser that can convert unstructured HTML data (or any SGML data) to well-formed XML. HTML is a standard document format used on the World Wide Web. XHTML is a reformulation of HTML that conforms to the rules of XML, so XHTML documents can be readily viewed, edited, and validated with standard XML tools. Well-formed XML files are much easier to manipulate than unstructured files: with standard XML parser technologies such as SAX and Xerces, converting them to other formats (such as word-processor-specific formats) is greatly simplified.
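The disclosure does not specify its rule format or implementation. As a minimal sketch of the general technique, assuming the external rules are a small JSON document listing empty (void) elements and "implicit close" rules, the Python below uses the standard library's `html.parser` to tokenize the HTML, applies the rules while tracking open elements on a stack, and emits well-formed output. A SAX handler then stands in for the downstream conversion step, producing plain text rather than a word-processor format. The rule names `void` and `implicit_close`, and all class and function names, are hypothetical.

```python
import json
import xml.sax
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Hypothetical external rule file (names are illustrative, not from the disclosure).
# Keeping the rules outside the code lets new clean-up rules be added without
# changing the parser itself.
RULES_JSON = """{
  "void": ["br", "hr", "img", "input", "meta", "link", "area", "col"],
  "implicit_close": {"p": ["p"], "ul": ["p"], "table": ["p"], "li": ["li"],
                     "tr": ["td", "th", "tr"], "td": ["td", "th"], "th": ["td", "th"]}
}"""


class RuleBasedXHTMLConverter(HTMLParser):
    """Re-emits loosely structured HTML as well-formed XHTML, driven by rules."""

    def __init__(self, rules):
        # convert_charrefs=True: entities such as &amp; arrive decoded in handle_data
        super().__init__(convert_charrefs=True)
        self.void = set(rules["void"])
        self.implicit_close = {k: set(v) for k, v in rules["implicit_close"].items()}
        self.stack = []  # names of currently open elements
        self.out = []    # output fragments

    @staticmethod
    def _attrs(attrs):
        # Always quote values; expand minimized attributes (checked -> checked="checked").
        return "".join(f" {name}={quoteattr(name if value is None else value)}"
                       for name, value in attrs)

    def handle_starttag(self, tag, attrs):
        # Rule: some start tags implicitly close open elements (<li> closes <li>).
        closers = self.implicit_close.get(tag, set())
        while self.stack and self.stack[-1] in closers:
            self.out.append(f"</{self.stack.pop()}>")
        if tag in self.void:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")  # empty element
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Drop stray end tags; otherwise close everything opened after `tag`
        # first, so elements are always properly nested.
        if tag not in self.stack:
            return
        while True:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, < and >

    def convert(self, html):
        self.feed(html)
        self.close()
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def html_to_xhtml(html, rules):
    return RuleBasedXHTMLConverter(rules).convert(html)


class PlainTextHandler(xml.sax.ContentHandler):
    """Example downstream SAX step: XHTML -> plain text, one line per block."""
    BREAKS = {"p", "li", "br", "tr"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)

    def endElement(self, name):
        if name in self.BREAKS:
            self.parts.append("\n")


def xhtml_to_text(xhtml):
    handler = PlainTextHandler()
    xml.sax.parseString(xhtml.encode("utf-8"), handler)  # raises if not well-formed
    return "".join(handler.parts)


if __name__ == "__main__":
    soup = ("<html><body><p>Fish & chips<br><ul><li>one<li>two</ul>"
            "<p>Tick <input type=checkbox checked></body>")
    xhtml = html_to_xhtml(soup, json.loads(RULES_JSON))
    print(xhtml)
    print(xhtml_to_text(xhtml))
```

Because the rules live outside the code, handling another kind of tag soup (for example, `<dd>` closing an open `<dt>`) means editing the rule file rather than the parser. The sketch does not implement the full HTML5 tree-construction algorithm; it drops comments and the DOCTYPE and does not add the XHTML namespace declaration, so its output is well-formed XML but not necessarily valid XHTML.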