Method and System for Providing a Visual Programming Environment for Information Extraction
Publication Date: 2015-Jul-06
The IP.com Prior Art Database
A method and system is disclosed for providing a visual information extraction development environment that allows a user to execute, analyze, refine and publish one or more extractors. The method and system utilizes design of a visual language that captures essential operations required for information extraction and translates the visual representation of the one or more extractors into code in a rule language.
Information Extraction (IE) is the task of automatically extracting structured information from unstructured or semi-structured text. Programs performing information extraction, also known as annotators or extractors, are critical building blocks in a wide range of emerging enterprise applications, such as, but not limited to, social data analytics, patient record analytics, and financial risk analysis. Extractor development has been a daunting task due to a high barrier to entry and a steep learning curve. Hence, lowering the barrier to entry for extractor development is a critical requirement.
Disclosed is a method and system for providing a visual information extraction development environment that allows a user to execute, analyze, refine and publish one or more extractors. The method and system utilizes design of a visual language that captures essential operations required for information extraction and translates the visual representation of the one or more extractors into code in a rule language.
The figure presents a high-level overview of the system providing the visual programming environment for information extraction.
In accordance with the figure, the method and system enables the user to visually construct the one or more extractors with a rich set of constructs in a User Interface (UI) and pre-built extractors from an Xxxxxxxxx Catalog. The visual representations of the one or more extractors are automatically translated into performant IE code in a state-of-the-art rule language. The user then executes the extractors with an underlying execution engine against an input document collection, analyzes the execution results, and further refines and publishes the visual extractors for deployment. In the case where the user needs functionality not yet supported by the visual programming environment (e.g. a user-defined function), the user imports the auto-generated IE code into a conventional XX development environment for further enhancement.
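As an illustration only, the translation step described above can be sketched in Python as a walk over a graph of visual nodes that emits one rule statement per node. The Node class, the translate function and the emitted rule syntax are all hypothetical; the disclosure does not specify the rule language or the internal representation of the visual extractors. The sketch also assumes that the graph of nodes is acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One box on the visual canvas (hypothetical model, not the disclosed design)."""
    name: str
    kind: str                                    # "dictionary", "regex" or "union"
    params: dict = field(default_factory=dict)   # construct-specific settings
    inputs: list = field(default_factory=list)   # names of upstream nodes


def to_rule(node: Node) -> str:
    """Render one node as a statement in a made-up rule syntax."""
    if node.kind == "dictionary":
        terms = ", ".join(repr(t) for t in node.params["terms"])
        return f"{node.name} = dictionary({terms}) on Document.text;"
    if node.kind == "regex":
        return f"{node.name} = regex({node.params['pattern']!r}) on Document.text;"
    if node.kind == "union":
        return f"{node.name} = union({', '.join(node.inputs)});"
    raise ValueError(f"unsupported construct: {node.kind}")


def translate(nodes: list) -> str:
    """Emit statements so that each node follows the nodes it depends on."""
    by_name = {n.name: n for n in nodes}
    emitted = set()
    lines = []

    def visit(node: Node) -> None:
        if node.name in emitted:
            return
        for dep in node.inputs:          # dependencies first (assumes no cycles)
            visit(by_name[dep])
        emitted.add(node.name)
        lines.append(to_rule(node))

    for n in nodes:
        visit(n)
    return "\n".join(lines)


canvas = [
    Node("Contact", "union", inputs=["Phone", "Greeting"]),
    Node("Phone", "regex", {"pattern": r"\d{3}-\d{4}"}),
    Node("Greeting", "dictionary", {"terms": ["hello", "hi"]}),
]
generated_code = translate(canvas)
```

Because the generated code is plain text, a user who needs functionality outside the visual environment could, under these assumptions, continue editing it in a conventional development environment.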
The visual programming environment supports two types of extractors, namely pre-built extractors and user-built extractors. Pre-built extractors are pre-defined extractors that work as black boxes with a pre-defined set of customizable dictionaries. User-built extractors are those constructed in the visual programming environment using one or more visual constructs such as, but not limited to, Xxxxxxx, Filter and Union.
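By way of a non-limiting example, the way a user-built extractor composes extraction, Filter and Union constructs can be sketched in Python as plain functions over text spans. The Span type, the function names and the sample document are assumptions made for illustration; the disclosure does not describe how the constructs are implemented.

```python
import re
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    """A match in the input text (hypothetical representation)."""
    begin: int
    end: int
    text: str


def extract_regex(pattern: str, document: str) -> List[Span]:
    """Extraction step: return every regular-expression match as a span."""
    return [Span(m.start(), m.end(), m.group())
            for m in re.finditer(pattern, document)]


def filter_spans(spans: List[Span], keep: Callable[[Span], bool]) -> List[Span]:
    """Filter step: keep only the spans that satisfy the predicate."""
    return [s for s in spans if keep(s)]


def union(*span_lists: List[Span]) -> List[Span]:
    """Union step: merge several span lists, dropping exact duplicates."""
    merged = {s for spans in span_lists for s in spans}
    return sorted(merged)   # ordered by begin offset, then end offset


document = "Call 555-1234 or 555-9999 before 2015."
phones = extract_regex(r"\d{3}-\d{4}", document)
years = extract_regex(r"\b(?:19|20)\d{2}\b", document)
kept_phones = filter_spans(phones, lambda s: not s.text.endswith("9999"))
result = union(kept_phones, years)
```

Each function takes and returns a list of spans, so the output of one construct can feed the next, which mirrors how boxes would be chained on a visual canvas.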
Extract constructs perform extraction over the input data, including, but not limited to, pre-built, dictionary, regular expression, linear, sequence pattern, proximity, select, projection, expression and consolidation. The Pre-built construct extracts matches using a pre-built extractor. The Dictionary construct extracts matches for a dictionary that consists of a list of terms or pairs of terms. The Regular Expression construct extracts matches for one or more...