Providing Ad Hoc Integrations To Enable Data Updates In User Owned Database

IP.com Disclosure Number: IPCOM000237488D
Publication Date: 2014-Jun-19
Document File: 3 page(s) / 27K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method and system are disclosed for providing ad hoc integrations that enable data in a user-owned database to be updated from a trusted source.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 44% of the total text.

Disclosed is a method and system for providing ad hoc integrations that enable data in user-owned databases to be updated from a trusted source. The trusted information can be made available via a HyperText Markup Language (HTML) based publication.

The disclosed method and system identifies a database or a table that requires updates, along with a trusted website that contains the update information. The information ideally changes on a weekly or monthly basis. The method and system recurses through the website to identify candidate webpage tables that contain update information relevant to the database table requiring update. Thereafter, a system administrator approves or disapproves each candidate. The method and system then compares the data listed in a webpage table to the contents of the database table records that require update, in order to identify correlations between individual table rows and database records. The correlations are inferred by parsing the web page HTML and examining the structure of the database table. The method and system examines each column of data, searching for a column that displays unique items, since such a column enables the identification of an implicit "key" for each row of data.
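The search for a column of unique items can be illustrated with a minimal sketch. It assumes the webpage table is simple, non-nested HTML with a header row; the names `TableParser`, `unique_columns`, and `SAMPLE_HTML` are hypothetical and do not come from the disclosure.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects every <tr> row in the fed HTML as a list of cell text."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def unique_columns(rows):
    """Return indices of columns whose body values are all non-empty and distinct."""
    header, body = rows[0], rows[1:]
    result = []
    for i in range(len(header)):
        values = [r[i] for r in body if i < len(r)]
        if values and all(values) and len(set(values)) == len(values):
            result.append(i)
    return result


# Illustrative data only; not taken from the disclosure.
SAMPLE_HTML = """
<table>
  <tr><th>Product Name</th><th>Status</th><th>Price</th></tr>
  <tr><td>Widget A</td><td>Active</td><td>10.00</td></tr>
  <tr><td>Widget B</td><td>Active</td><td>12.50</td></tr>
  <tr><td>Widget C</td><td>Retired</td><td>10.00</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
candidate_keys = unique_columns(parser.rows)  # column indices that could act as an implicit key
```

In this sample only the first column qualifies, because the status and price columns both repeat values. A real page may yield several unique columns, so further narrowing would be needed.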

The disclosed method and system applies additional heuristics. For example, the method and system discards any columns which contain dates. The positioning of columns in the database table is also analyzed. The column headers are analyzed for words or synonyms relating to an identifier such as, but not limited to, key, identity (ID), or code. A lexical analysis of the column values is also performed. For example, the method and system identifies dates, which are not likely to be keys, and determines whether abbreviations or number phrases are likely to be keys. As an illustration, once the product name column on the web page is identified, the entries in the product name column are correlated to the database entries in the field "ProductName". The individual rows of web page data are thereby correlated to particular database table entries. Once the "key" to the database is obtained, the other fields in the database entry can be updated based on the other columns in the webpage table row. The system-proposed correlations between database entries and webpage table rows can be verified or rejected manually.
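The heuristics and the update step can be sketched roughly as follows. The score weights, the synonym list `KEY_HEADER_WORDS`, the date formats, and the mapping `field_map` from webpage headers to database fields are all assumptions made for illustration; the disclosure does not specify them. The function returns proposed changes rather than applying them, so that they can be verified or rejected manually.

```python
import re
from datetime import datetime

KEY_HEADER_WORDS = {"key", "id", "identity", "code", "identifier"}  # assumed synonym list
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")                 # assumed date formats


def looks_like_date(value):
    """True if the value parses under any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def key_score(header, values):
    """Score a column as a key candidate; None means the column is discarded."""
    if not values or any(looks_like_date(v) for v in values):
        return None                                   # columns containing dates are discarded
    score = 0.0
    if len(set(values)) == len(values):
        score += 2.0                                  # all values are unique
    if set(re.findall(r"[a-z]+", header.lower())) & KEY_HEADER_WORDS:
        score += 1.0                                  # header mentions an identifier word
    if all(re.fullmatch(r"[A-Za-z0-9\-]{2,12}", v) for v in values):
        score += 0.5                                  # values look like short codes
    return score


def choose_key_column(headers, rows):
    """Return the index of the best-scoring column, or None if all are discarded."""
    best, best_score = None, None
    for i, header in enumerate(headers):
        score = key_score(header, [row[i] for row in rows])
        if score is not None and (best_score is None or score > best_score):
            best, best_score = i, score
    return best


def propose_updates(headers, rows, records, field_map):
    """Match webpage rows to records by key and list field changes for manual review."""
    key_idx = choose_key_column(headers, rows)
    if key_idx is None:
        return []
    proposals = []
    for row in rows:
        record = records.get(row[key_idx])
        if record is None:
            continue                                  # no database entry correlates to this row
        changes = {}
        for i, header in enumerate(headers):
            field = field_map.get(header)
            if i != key_idx and field and record.get(field) != row[i]:
                changes[field] = row[i]
        if changes:
            proposals.append((row[key_idx], changes))
    return proposals


# Illustrative data only; records are indexed by their "ProductName" value.
headers = ["Product Name", "Release Date", "Version"]
rows = [
    ["Widget A", "2014-06-01", "2.1"],
    ["Widget B", "2014-05-12", "1.4"],
    ["Widget C", "2014-04-30", "2.1"],
]
records = {
    "Widget A": {"ProductName": "Widget A", "Version": "2.0"},
    "Widget B": {"ProductName": "Widget B", "Version": "1.4"},
}
proposals = propose_updates(headers, rows, records, {"Version": "Version"})
```

Here the date column is discarded, the product name column is chosen as the implicit key because its values are unique, and a single change is proposed for "Widget A". "Widget C" has no matching record and is skipped.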

The disclosed system architecture, which automatically detects and maintains database information from a trusted source, includes a classification engine as a key component. The classification engine utilizes statistical extrapolation to identify data listed on a webpage table. It parses webpage table content into groups based on quantitative information on one or more characteristics inherent in the items, where the characteristics are based on a training set of database records.
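The disclosure does not say which statistical technique the classification engine uses. As one possible reading, the sketch below substitutes a simple nearest-centroid classifier: it derives quantitative characteristics from a training set of database records and assigns each webpage column to the database field whose values it most resembles. The feature set and all names (`features`, `train`, `classify_column`) are assumptions made for illustration.

```python
import math


def features(value):
    """Quantitative characteristics of one text value (an assumed feature set)."""
    n = len(value) or 1
    return (
        min(len(value), 40) / 40.0,              # length, scaled to [0, 1]
        sum(c.isdigit() for c in value) / n,     # fraction of digits
        sum(c.isalpha() for c in value) / n,     # fraction of letters
        sum(c.isspace() for c in value) / n,     # fraction of whitespace
    )


def centroid(values):
    """Average feature vector of a non-empty list of values."""
    vectors = [features(v) for v in values]
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def train(records):
    """Build one centroid per database field from a list of record dicts."""
    return {
        field: centroid([str(record[field]) for record in records])
        for field in records[0]
    }


def classify_column(values, model):
    """Return the database field whose centroid is closest to the column's centroid."""
    column = centroid(values)
    return min(model, key=lambda field: math.dist(column, model[field]))


# Illustrative training records; not taken from the disclosure.
training_records = [
    {"ProductName": "Widget A", "ProductCode": "WA-1001", "Price": "10.00"},
    {"ProductName": "Gadget Pro", "ProductCode": "GP-2040", "Price": "12.50"},
]
model = train(training_records)
name_guess = classify_column(["Sprocket X", "Flange Kit"], model)
price_guess = classify_column(["9.99", "101.25"], model)
```

With this toy data, a column of two-word names falls closest to "ProductName" and a column of decimal numbers falls closest to "Price". A production system would need a richer feature set and a confidence threshold before proposing any correlation.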

The disclosed method and...