
Method of adapting graph data for NLP and IR processing

IP.com Disclosure Number: IPCOM000239766D
Publication Date: 2014-Dec-01
Document File: 6 page(s) / 147K

Publishing Venue

The IP.com Prior Art Database


Proposed here is a method for transforming knowledge encoded in graphs into a format suitable for NLP processing and IR systems. The method describes the steps for generating analytical sentences from extracted graph data. It is based on applying machine learning to build domain graph type classifiers, on functions that convert pixel differences into language terms, and on rule-based transformation that generates sentences from domain sentence templates.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 44% of the total text.


Graphs are an important method of communication and a common data representation format. They are widely used in various domains for presenting and analyzing information.

Graph images incorporated in text documents require specialized processing. There are multiple techniques for image recognition and for extracting data from images. However, few of them specialize in graph data extraction. Another important aspect that needs to be addressed is the evaluation of the graph image processing error, so that it can be transformed into an NLP-suitable format.

Prior art search: "Using extracted image text", US 8503782 B2. This patent defines how to apply image regions to extract text data.


The proposed method makes knowledge encoded in graphs available for NLP processing. Graph data included in a corpus becomes a valuable source of information for sentiment analysis, analytical information discovery, and background facts for QA systems.

Summary characteristics of the solution:

• Applying domain-specific lexicons to construct factual, analytical, and error statements from captured graph image data.

• Defining the transformation of the image processing error into a format suitable for NLP.

• The method is based on ML graph type classifiers.

The method applies a domain lexicon to generate three types of NL statements: factual, analytical, and error statements.

1) Factual statements reflect the data presentation purpose of graphs and are based on factual data extracted from the graph.

For example: "Market value of portfolio X is S $".

2) Analytical statements reflect the data analysis purpose of graphs and are based on primary analysis performed on the factual data.

For example: "Market value of portfolio X is larger than Market value of portfolio


"The highest payments are made in XY year; the payments decline steadily during ZW years"

"New York City has significantly more employees per 1,000 of population than other large cities."

"Interest rates peaked in XY year and then declined significantly"

3) Error statements provide an accuracy margin expressed in error values. Error values are computed from "raw" graph image differences mapped to graph unit values.

For example: "Market value of portfolio X is estimated at S $ given D % accuracy".
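The three statement types above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the template wording, the names `Bar`, `pixels_to_value`, `accuracy_percent`, and `generate_statements`, the linear axis calibration, and the error model (a reading error of plus or minus one pixel, expressed relative to the extracted value). The disclosure does not specify its templates or its error formula.

```python
from dataclasses import dataclass

# Hypothetical templates: the disclosure does not list its domain templates.
TEMPLATES = {
    "factual": "{measure} of {entity} is {value:.0f} {unit}.",
    "analytical": "{measure} of {a} is larger than {measure} of {b}.",
    "error": ("{measure} of {entity} is estimated at {value:.0f} {unit} "
              "given {accuracy:.1f}% accuracy."),
}


@dataclass
class Bar:
    entity: str  # label read from the axis or legend
    top_px: int  # pixel row of the bar top (rows grow downward)


def pixels_to_value(top_px, axis_zero_px, units_per_px):
    """Linear axis calibration: pixel distance from the zero line to graph units."""
    return (axis_zero_px - top_px) * units_per_px


def accuracy_percent(value, pixel_error, units_per_px):
    """Assumed error model: a +/- pixel_error reading error relative to the value."""
    if value == 0:
        return 0.0
    return max(0.0, 100.0 * (1.0 - pixel_error * units_per_px / abs(value)))


def generate_statements(bars, measure, unit, axis_zero_px, units_per_px, pixel_error=1):
    values = {b.entity: pixels_to_value(b.top_px, axis_zero_px, units_per_px) for b in bars}
    out = {"factual": [], "analytical": [], "error": []}
    for entity, value in values.items():
        out["factual"].append(TEMPLATES["factual"].format(
            measure=measure, entity=entity, value=value, unit=unit))
        out["error"].append(TEMPLATES["error"].format(
            measure=measure, entity=entity, value=value, unit=unit,
            accuracy=accuracy_percent(value, pixel_error, units_per_px)))
    # Primary analysis: pairwise comparison of the extracted values.
    for a, value_a in values.items():
        for b, value_b in values.items():
            if value_a > value_b:
                out["analytical"].append(TEMPLATES["analytical"].format(
                    measure=measure, a=a, b=b))
    return out


statements = generate_statements(
    [Bar("portfolio A", 100), Bar("portfolio B", 250)],
    measure="Market value", unit="USD", axis_zero_px=400, units_per_px=1000.0)
for kind, sentences in statements.items():
    for sentence in sentences:
        print(f"{kind}: {sentence}")
```

Keeping the templates in a plain dictionary keyed by statement type means a different domain lexicon could be swapped in without touching the extraction or error code.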

General information referenced in the description:

Graph attributes: general description, axis labels, legends, data series.

Most common basic types of graphs: column, bar, line, pie, area, etc.
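One possible in-memory representation of these graph attributes is sketched below in Python. The class and field names (`GraphType`, `DataSeries`, `GraphData`) and the sample values are hypothetical and chosen only for illustration; the disclosure does not define a concrete data structure.

```python
from dataclasses import dataclass, field
from enum import Enum


class GraphType(Enum):
    """The basic graph types named in the text."""
    COLUMN = "column"
    BAR = "bar"
    LINE = "line"
    PIE = "pie"
    AREA = "area"


@dataclass
class DataSeries:
    legend: str   # legend entry that names the series
    points: list  # (category or x label, value) pairs


@dataclass
class GraphData:
    graph_type: GraphType
    description: str   # general description of the graph
    axis_labels: dict  # for example {"x": "Year", "y": "Rate, %"}
    series: list = field(default_factory=list)

    def legends(self):
        """Return the legend entries of all data series."""
        return [s.legend for s in self.series]


# Made-up sample data for a line graph.
graph = GraphData(
    graph_type=GraphType("line"),
    description="Interest rates by year",
    axis_labels={"x": "Year", "y": "Rate, %"},
    series=[DataSeries("Rate", [("Y1", 2.0), ("Y2", 3.5), ("Y3", 1.5)])],
)
peak = max(graph.series[0].points, key=lambda point: point[1])
print(graph.graph_type.value, graph.legends(), peak)
```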

The graph data processing solution is presented as a multi-module pipeline in Flow Chart Diagram 1.

Flow Chart Diagram 1.



Graph processing modules overview (illustrated with a financial graph sample).

1. The ML training module is responsible for training graph type classifiers. The training is based on a specific domain g...