Method and System for Providing a Visual Programming Environment for Information Extraction

IP.com Disclosure Number: IPCOM000242314D
Publication Date: 2015-Jul-06
Document File: 3 page(s) / 134K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method and system are disclosed for providing a visual information extraction development environment that allows a user to execute, analyze, refine and publish one or more extractors. The method and system use a visual language that captures the essential operations required for information extraction and translate the visual representation of the one or more extractors into code in a rule language.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 45% of the total text.

Method and System for Providing a Visual Programming Environment for Information Extraction

Information Extraction (IE) is the task of automatically extracting structured information from unstructured or semi-structured text. Programs performing information extraction, also known as annotators or extractors, are critical building blocks in a wide range of emerging enterprise applications, such as, but not limited to, social data analytics, patient record analytics, and financial risk analysis. Extractor development has been a daunting task due to a high barrier to entry and a steep learning curve. Hence, lowering the barrier to entry for extractor development is a critical requirement.

Disclosed is a method and system for providing a visual information extraction development environment that allows a user to execute, analyze, refine and publish one or more extractors. The method and system use a visual language that captures the essential operations required for information extraction and translate the visual representation of the one or more extractors into code in a rule language.

The figure presents a high-level overview of the system providing the visual programming environment for information extraction.

Figure

In accordance with the figure, the method and system enable the user to visually construct the one or more extractors with a rich set of constructs in a User Interface (UI) and pre-built extractors from an Extractor Catalog. The visual representations of the one or more extractors are automatically translated into performant IE code in a state-of-the-art rule language. The user then executes the extractors with an underlying execution engine against an input document collection, analyzes the execution results, and further refines and publishes the visual extractors for deployment. In the case where the user needs functionality not yet supported by the visual programming environment (e.g. a user-defined function), the user imports the auto-generated IE code into a conventional IE development environment for further enhancement.

The visual programming environment supports two types of extractors, namely pre-built extractors and user-built extractors. Pre-built extractors are pre-defined extractors that work as black boxes with a pre-defined set of customizable dictionaries. User-built extractors are those constructed in the visual programming environment using one or more visual constructs such as, but not limited to, Extract, Filter and Union.
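The disclosure names the Filter and Union constructs without defining their semantics in this abbreviated text. The Python sketch below is one plausible reading, not the disclosed implementation: Filter keeps only the matches that satisfy a predicate, and Union merges the outputs of several extractors while dropping duplicates. The `Span` type, the function names and the sample data are all hypothetical.

```python
from typing import Callable, Iterable, List, NamedTuple


class Span(NamedTuple):
    """A matched region of text: [begin, end) offsets plus the covered text."""
    begin: int
    end: int
    text: str


def filter_spans(spans: Iterable[Span], keep: Callable[[Span], bool]) -> List[Span]:
    """Filter (assumed semantics): keep only the spans that satisfy the predicate."""
    return [s for s in spans if keep(s)]


def union_spans(*inputs: Iterable[Span]) -> List[Span]:
    """Union (assumed semantics): merge several outputs, drop duplicates, sort by offset."""
    merged = {s for spans in inputs for s in spans}
    return sorted(merged)


# Hypothetical outputs of two upstream extractors.
phones = [Span(0, 8, "555-0100"), Span(20, 23, "911")]
emails = [Span(30, 45, "ann@example.com")]

long_phones = filter_spans(phones, lambda s: len(s.text) >= 7)
contacts = union_spans(long_phones, emails, long_phones)
```

Passing `long_phones` twice to `union_spans` shows the duplicate handling: each span appears once in `contacts`.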

Extract constructs perform extraction over the input data, including, but not limited to, pre-built, dictionary, regular expression, linear, sequence pattern, proximity, select, projection, expression and consolidation. The Pre-built construct extracts matches using a pre-built extractor. The Dictionary construct extracts matches for a dictionary that consists of a list of terms or pairs of terms. The Regular Expression construct extracts matches for one or more...
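As an illustration of the Dictionary and Regular Expression extracts described above, the following Python sketch returns matches as (begin, end, text) tuples. It is a minimal sketch under stated assumptions, not the rule-language code the disclosed system generates: the function names, the whole-word and case-insensitive matching policy, and the sample document are hypothetical, and only the "list of terms" form of a dictionary is covered.

```python
import re
from typing import Iterable, List, Tuple

Match = Tuple[int, int, str]  # (begin offset, end offset, matched text)


def dictionary_extract(text: str, terms: Iterable[str]) -> List[Match]:
    """Return whole-word, case-insensitive occurrences of any dictionary term."""
    # Longest terms first so that a longer term wins over one of its prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    if not ordered:
        return []
    alternatives = "|".join(re.escape(t) for t in ordered)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


def regex_extract(text: str, pattern: str) -> List[Match]:
    """Return every non-overlapping match of a regular expression."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


# Hypothetical input document and extractor configurations.
doc = "Acme Corp paid $1200 to Globex on 2015-07-06."
orgs = dictionary_extract(doc, ["Acme Corp", "Globex"])
dates = regex_extract(doc, r"\d{4}-\d{2}-\d{2}")
```

Here `orgs` holds the two organization mentions and `dates` holds the single ISO-style date, each with its character offsets in `doc`.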