Fast XML Document Recognition
Original Publication Date: 2001-Dec-26
Included in the Prior Art Database: 2003-Jun-20
Document recognition is an important function in the XML business model. Considerable development effort has already gone into it, and new implementations continue to rely on it. For example, document recognition is used as:
- a means to enable B2B data exchange based on document type
- a way to select preferences
- document routing
- screen customization
- screen selection macros
Because this model is pervasive and XML processing raises performance concerns, any performance improvement would be welcome. The model discussed here is a means of creating an optimized document recognition engine.
The solution, conceptually, is to recognize the document as early in processing as possible, using internal machine capability, a software event model, and human knowledge of constraints. A problem in many implementations today is that decision logic is applied only after the document has been completely processed. In the XML model, for example, this happens after the document has been parsed into a document object model (DOM) or tree. My solution enables recognition to occur while the document is still being processed. It does so by implementing an event model that can be used during any early processing phase, such as parsing. A summary of the high-level logic flow in the parsing example follows.
The first requirement is that an event be fired for each node built during parsing. A document recognition object listens for these events and compares each one with its recognition criteria. When the criteria are satisfied, the next phase takes place. Possible next steps are routing the document, responding to the caller, or performing the next program execution step. The value of this logic is that the remaining processing, for example parsing the rest of the document, can be terminated, which saves processing cycles. Whenever the document is recognized before its end, the next programming step begins sooner than it would after a full parse, and the larger the document, the greater the potential improvement. Performance can be optimized further if recognition data is placed as close to the beginning of the document as possible. The coding practice of placing tags, keys, and signatures at the beginning of the document enables matching as early in document processing as possible.
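The flow described above can be sketched with an event-driven parser. The following is a minimal illustration only, not the implementation from this disclosure: it assumes Python's standard xml.sax module as the event source, and the names Recognized, RecognitionListener, recognize, and the signatures mapping (element name to document type) are hypothetical.

```python
import xml.sax


class Recognized(Exception):
    """Raised by the listener to end parsing once the document type is known."""

    def __init__(self, doc_type):
        super().__init__(doc_type)
        self.doc_type = doc_type


class RecognitionListener(xml.sax.ContentHandler):
    """Receives one event per start tag and checks it against the criteria."""

    def __init__(self, signatures):
        super().__init__()
        # Hypothetical recognition criteria: element name -> document type.
        self.signatures = signatures
        self.events_seen = 0

    def startElement(self, name, attrs):
        self.events_seen += 1
        doc_type = self.signatures.get(name)
        if doc_type is not None:
            # Criteria satisfied: abandon the rest of the document.
            raise Recognized(doc_type)


def recognize(xml_bytes, signatures):
    """Return (doc_type, events_seen); doc_type is None if nothing matched."""
    listener = RecognitionListener(signatures)
    try:
        xml.sax.parseString(xml_bytes, listener)
    except Recognized as hit:
        return hit.doc_type, listener.events_seen
    return None, listener.events_seen
```

With a document such as b"<envelope><invoice>" followed by thousands of line items, and signatures such as {"invoice": "INVOICE"}, the listener is satisfied on the second start-tag event and none of the line items reach it. Raising an exception from the handler is one simple way to end the parse; placing the recognizable element near the top of the document is what keeps the event count low.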
Another model that stands to benefit greatly from this solution is the processing of data streams, which applies the same logic to recognize elements in a standardized, structured data stream.
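For the data stream case, the same idea might look like the sketch below. It assumes the stream arrives as text chunks and uses XMLPullParser from Python's standard xml.etree.ElementTree module; the function name recognize_stream and the signatures mapping are hypothetical, chosen only for illustration.

```python
from xml.etree.ElementTree import XMLPullParser


def recognize_stream(chunks, signatures):
    """Feed chunks until a start tag matches a signature.

    Returns (doc_type, chunks_consumed); doc_type is None if the stream
    ends without a match.
    """
    parser = XMLPullParser(events=("start",))
    consumed = 0
    for chunk in chunks:
        parser.feed(chunk)
        consumed += 1
        # Examine only the start-tag events produced so far.
        for _event, element in parser.read_events():
            doc_type = signatures.get(element.tag)
            if doc_type is not None:
                # Recognized: stop reading the stream here.
                return doc_type, consumed
    return None, consumed
```

Because the function returns as soon as a signature matches, the remaining chunks are never requested from the source. When chunks is a generator reading from a socket or file, the unread part of the stream is never fetched or parsed.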
A key point to remember is that this recognition model can be expanded far beyond document recognition and applied to any model in which an object (a listener) is waiting for a stimulus.
As mentioned earlier, my idea centers on the ability to understand recognition criteria. Recognition of any type of data usually falls withi...