Fast XML Document Recognition

IP.com Disclosure Number: IPCOM000015262D
Original Publication Date: 2001-Dec-26
Included in the Prior Art Database: 2003-Jun-20
Document File: 4 page(s) / 48K

Publishing Venue

IBM

Abstract

Document recognition is an important function in the XML business model. Development effort in this area has been extensive, and new implementations continue to build on document recognition. For example, document recognition can serve as a means to enable B2B data exchange based on document type

This is the abbreviated version, containing approximately 41% of the total text.

Document recognition is an important function in the XML business model. Development effort in this area has been extensive, and new implementations continue to build on document recognition. For example, document recognition can be used as:

- a means to enable B2B data exchange based on document type

- a way to select preferences

- document routing

- screen customization

- screen selection macros

Because this model is pervasive, and because performance is a concern when operating in the XML environment, any performance improvement is welcome. The model discussed here is a means to create an optimized document recognition engine.

The solution, conceptually, is to recognize the document as early in processing as possible, using internal machine capability, a software event model, and human knowledge of constraints. A problem in many implementations today is that decision logic is applied only after the document has been completely processed. In the XML model, for example, this happens after the document has been parsed into a document object model (DOM) or tree. My solution enables recognition to occur while the document is still being processed. It does so by implementing an event model that can be used during any early processing phase, such as parsing. The high-level logic flow in the parsing example is summarized as follows.

The first requirement is that an event be fired for each node built during parsing. A document recognition object listens for these events; when a node satisfies its recognition criteria, it triggers the next phase. Possible next steps are routing the document, responding to the caller, or performing the next program execution step. The value of this logic is that the remaining processing, for example parsing the rest of the document, can be terminated, which saves processing cycles. Progress to the next programming step is correspondingly faster, and the larger the document, the greater the potential performance improvement. Performance can be optimized further if recognition data is placed as close to the beginning of the document as possible. The coding practice of placing tags, keys, and signatures at the beginning of the document enables a match as early in document processing as possible.
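The flow described above can be sketched in Python with the standard library's event-driven SAX parser. This is only an illustration under stated assumptions, not the disclosed implementation: the names RecognitionHandler, DocumentRecognized, and recognize are hypothetical, and the recognition criterion is assumed to be a simple match on element name.

```python
import xml.sax


class DocumentRecognized(Exception):
    """Raised by the listener to abandon parsing once the document type is known."""

    def __init__(self, doc_type):
        super().__init__(doc_type)
        self.doc_type = doc_type


class RecognitionHandler(xml.sax.ContentHandler):
    """Receives one event per element start and checks it against known signatures."""

    def __init__(self, signatures):
        super().__init__()
        # Assumption: signatures maps an element name to a document type.
        self.signatures = signatures
        self.elements_seen = 0

    def startElement(self, name, attrs):
        self.elements_seen += 1
        if name in self.signatures:
            # Recognition satisfied: terminate the remaining parsing work.
            raise DocumentRecognized(self.signatures[name])


def recognize(xml_bytes, signatures):
    """Return (document type or None, elements parsed before stopping)."""
    handler = RecognitionHandler(signatures)
    try:
        xml.sax.parseString(xml_bytes, handler)
    except DocumentRecognized as hit:
        return hit.doc_type, handler.elements_seen
    return None, handler.elements_seen
```

Calling recognize(document_bytes, {"order": "purchase-order"}) on a document whose root element is order stops at the first element event, however many elements follow. The caller can then route the document or respond without building a tree.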

Another model that stands to benefit greatly from this solution is the processing of data streams: the same type of logic can recognize elements in a standardized, structured data stream.
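For the data stream case, a similar sketch can use Python's incremental xml.etree.ElementTree.XMLPullParser, which accepts data in chunks and reports events as they become available. The function name recognize_stream and the signatures mapping are assumptions made for illustration; the chunks could come from a socket or file in practice.

```python
import xml.etree.ElementTree as ET


def recognize_stream(chunks, signatures):
    """Feed chunks to an incremental parser and stop reading once a signature matches.

    Returns (document type or None, number of chunks consumed).
    """
    parser = ET.XMLPullParser(events=("start",))
    chunks_read = 0
    for chunk in chunks:
        chunks_read += 1
        parser.feed(chunk)
        # Inspect only the element-start events produced by this chunk.
        for _event, element in parser.read_events():
            if element.tag in signatures:
                # Recognized: the rest of the stream is left unread.
                return signatures[element.tag], chunks_read
    return None, chunks_read
```

Because the function returns as soon as a signature matches, the remainder of the stream is never pulled from its source.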

A key point to remember is that this recognition model can be extended far beyond document recognition and applied to any model in which an object (a listener) is waiting for a stimulus.
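The general listener-and-stimulus form might be sketched as below. This is an assumption about how the generalization could look, not something the disclosure specifies; first_response is a hypothetical name, and a listener is assumed to signal recognition by returning any value other than None.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def first_response(
    stimuli: Iterable[T], listener: Callable[[T], Optional[R]]
) -> Optional[R]:
    """Deliver stimuli one at a time and stop at the first one the listener answers."""
    for stimulus in stimuli:
        response = listener(stimulus)
        if response is not None:
            # The listener is satisfied; later stimuli are never produced or inspected.
            return response
    return None
```

When the stimuli come from a lazy source such as a generator, stopping early also avoids the cost of producing the remaining stimuli.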

Further Details:

As mentioned earlier, my idea centers on the ability to understand recognition criteria. Recognition of any type of data usually falls withi...