
An intuitive application for text analytics
Disclosure Number: IPCOM000238106D
Publication Date: 2014-Aug-01
Document File: 8 page(s) / 481K

Publishing Venue

The Prior Art Database


This article describes a system aimed at enabling real-time, interactive text analytics. The system operates on principles similar to those of a spreadsheet. The article proposes a number of modifications to current spreadsheet technology to support this kind of analysis.

This is the abbreviated version, containing approximately 34% of the total text.


An intuitive application for text analytics

FACT EXTRACTION is the process of transforming natural language statements into structured facts, as shown below in Figure 1:

Figure 1: Fact Extraction

    Known sophisticated text analytics tools enable the rapid mapping of phrases onto structured facts / queries.

    The mapping of sentences onto facts, and of questions onto queries, is performed by a set of processing rules. These rules identify specific grammatical patterns and then construct an appropriate fact or query accordingly.
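    As a rough illustration of such a rule, the sketch below maps one sentence pattern onto a structured fact. It is not the rule language of any particular tool: the Fact type, the WORKS_FOR pattern and the example sentence are all hypothetical, and a regular expression stands in for real grammatical analysis.

```python
import re
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    """A structured fact: predicate(subject, obj). Hypothetical shape."""
    predicate: str
    subject: str
    obj: str


# Hypothetical rule: "<Name> works for <Organisation>"
# becomes the fact works_for(Name, Organisation).
WORKS_FOR = re.compile(r"\b([A-Z][a-z]+) works for ([A-Z][A-Za-z]+)\b")


def extract_fact(sentence: str) -> Optional[Fact]:
    """Return the fact built by the rule, or None if the pattern is absent."""
    match = WORKS_FOR.search(sentence)
    if match is None:
        return None
    return Fact("works_for", match.group(1), match.group(2))
```

    Calling extract_fact("Alice works for Acme.") yields a works_for fact linking Alice and Acme, while a sentence that does not fit the pattern yields None. A rule this narrow also shows why perfectly accurate rules are hard to write.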

    Text analytics rules are conventionally applied in batch mode. This approach, however, means that errors are propagated through the system as it is difficult to write perfectly accurate rules. In particular, the user (typically an analyst) struggles to understand how rules will perform against previously unseen data.

    In addition, it is not easy to perform mathematical or statistical functions on the extracted information as part of the extraction process, for example to return a simple count of the number of times a type of fact occurs, or to locate the maximum value contained within a document.
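    To make the two examples concrete, the sketch below computes a count per fact type and a maximum value over a small set of extracted facts. The (fact_type, value) pair layout and the sample data are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical extraction output: (fact_type, value) pairs.
extracted = [
    ("date", "2014-08-01"),
    ("amount", 120.0),
    ("amount", 75.5),
    ("date", "2014-07-15"),
]


def count_facts(facts, fact_type):
    """Count how many times a given type of fact occurs."""
    return Counter(kind for kind, _ in facts)[fact_type]


def max_value(facts, fact_type):
    """Return the largest value of the given type, or None if there is none."""
    values = [value for kind, value in facts if kind == fact_type]
    return max(values) if values else None
```

    In a spreadsheet setting these would correspond to familiar functions such as a count or a maximum applied over a range of cells holding extracted values.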

    This disclosure addresses the issues above through the application of a spreadsheet philosophy to text analytics. With such an approach the Analyst will be provided with an interactive environment that allows the detailed manipulation of data in addition to the application of text analytics rules. For example, the Analyst may write a rule to extract dates, apply that rule to the data and then quickly correct any errors before progressing to the next stage of the analysis. In this way, errors are not propagated through the system and the Analyst is able to ensure accuracy of the data despite the inherent limitations (in accuracy) of the rules.
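    The date example can be sketched as follows. This is a minimal model under stated assumptions, not the disclosed implementation: the sheet is a plain dictionary keyed by cell reference, columns are single letters, and DATE_RULE is a hypothetical regular expression that deliberately recognises only one date format.

```python
import re

# Hypothetical rule: day, full month name, four-digit year.
DATE_RULE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4}\b"
)


def apply_rule(sheet, rule, source_col, target_col):
    """Apply a rule to every cell in source_col, writing the first match
    (or an empty string) to the same row of target_col."""
    for ref, text in list(sheet.items()):
        if ref.startswith(source_col):
            row = ref[len(source_col):]
            match = rule.search(text)
            sheet[target_col + row] = match.group(0) if match else ""


sheet = {
    "A1": "The meeting was held on 3 March 2014.",
    "A2": "Delivery is due by the 1st of Aug 2014.",
}

# Step 1: apply the rule to column A, writing results to column B.
apply_rule(sheet, DATE_RULE, "A", "B")
rule_output = dict(sheet)  # snapshot of what the rule alone produced

# Step 2: the Analyst sees that B2 is empty and corrects it by hand
# before moving on to the next stage of the analysis.
sheet["B2"] = "1 August 2014"
```

    The rule finds the date in A1 but misses the one in A2, so the empty cell is visible immediately and can be fixed in place rather than carried forward.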

    This invention provides Natural Language Processing functionality to a spreadsheet-like application. The Analyst starts the application and is presented with a conventional spreadsheet layout. The only difference is that the editor area is a complete side panel (or top or bottom panel, depending on user preference).

This enables the user to examine much larger chunks of text and connects the currently selected cell back to its original document text.


Figure 2: Application opening screen with the enlarged input column

  The Analyst can import documents into any cell:

• by selecting a single cell and importing a single document.

• by selecting a number of cells and importing up to that number of documents.

• by selecting a column or a row and importing from a directory (up to the maximum number of cells in column / row).
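The three import modes above share one behaviour: each selected cell receives at most one document, and documents beyond the selection are not imported. A minimal sketch, assuming a dictionary-based sheet and hypothetical function names (import_documents, import_directory), with files read as UTF-8 text:

```python
from pathlib import Path


def import_documents(sheet, cells, paths):
    """Load one document per selected cell.

    zip() stops at the shorter sequence, so at most len(cells)
    documents are imported and surplus documents are ignored.
    """
    for ref, path in zip(cells, paths):
        sheet[ref] = Path(path).read_text(encoding="utf-8")
    return sheet


def import_directory(sheet, cells, directory):
    """Fill the selected column or row from the files in a directory,
    taken in name order."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return import_documents(sheet, cells, files)
```

A single-cell import is then just import_documents with one cell and one path.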

Selecting a cell (highlighted in red) will display the contents in the text editor pane. In this case the cell contains the entire text of the initial document and so the entire text is also highlighted in red.

Figure 3: Input text entered into cell A1


    The next function to perform is tokenisation. This function could have a lot of depth and...