
Method and System for Extracting Temporal Information Corresponding to Artifacts in a Website

IP.com Disclosure Number: IPCOM000200485D
Publication Date: 2010-Oct-15
Document File: 5 page(s) / 125K

Publishing Venue

The IP.com Prior Art Database

Related People

Vipul Agarwal: INVENTOR (plus 3 more related people)

Abstract

A method and system for extracting temporal information corresponding to artifacts in a website is disclosed. Temporal information includes, but is not limited to, timing of events, hours of operation of a business organization, and posting dates of the artifacts.

This is the abbreviated version, containing approximately 38% of the total text.

Description

Disclosed is a method and system for extracting temporal information corresponding to artifacts in a website. Temporal information includes, but is not limited to, timing of events, hours of operation of a business organization, and posting dates of the artifacts.

Spatial and temporal information associated with artifacts provides useful context when serving results for search queries. However, existing processes such as metadata extraction, search query interpretation, and matching focus mainly on spatial information: they associate spatial metadata with artifacts, detect the spatial context of search queries, and identify relevant artifacts based on that spatial context and metadata. As a result, temporal information is not considered in these processes.

Fig. 1 illustrates a flowchart of a method for extracting temporal information corresponding to artifacts in a website through three processes: blob identification, blob association, and blob parsing. Here, blobs are sections of the website that contain text. Examples of artifacts include, but are not limited to, an event, a business entity, and an organization.
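This excerpt does not say how blob parsing is performed. Purely to illustrate the kind of output that step produces, the sketch below pulls hours-of-operation strings such as "Mon-Fri 9am-5:30pm" out of a text blob with a regular expression. The regex approach, the `OpeningHours` record, and the `parse_hours` function are assumptions made for illustration, not the disclosed method. The sketch handles only a day range followed by a 12-hour time range.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Illustrative sketch only: regex-based parsing is an assumption, not the disclosed method.
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
DAY = r"(mon|tue|wed|thu|fri|sat|sun)[a-z]*"   # "Mon", "Monday", ...
TIME = r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)"      # "9am", "5:30 pm", ...
RANGE = r"\s*(?:-|to)\s*"                       # "-" or "to"
HOURS_RE = re.compile(rf"\b{DAY}{RANGE}{DAY}\s*:?\s*{TIME}{RANGE}{TIME}", re.IGNORECASE)


@dataclass
class OpeningHours:
    days: list[str]  # e.g. ["mon", "tue", "wed", "thu", "fri"]
    opens: str       # 24-hour "HH:MM"
    closes: str      # 24-hour "HH:MM"


def to_24h(hour: str, minute: str | None, meridiem: str) -> str:
    """Convert a 12-hour clock reading to a 24-hour "HH:MM" string."""
    h = int(hour) % 12
    if meridiem.lower() == "pm":
        h += 12
    return f"{h:02d}:{minute or '00'}"


def parse_hours(blob_text: str) -> list[OpeningHours]:
    """Extract every "<day>-<day> <time>-<time>" pattern found in a text blob."""
    results = []
    for m in HOURS_RE.finditer(blob_text):
        start = DAYS.index(m.group(1).lower())
        end = DAYS.index(m.group(2).lower())
        # Expand the day range, wrapping around the week for ranges like "Fri-Mon".
        days = DAYS[start:end + 1] if start <= end else DAYS[start:] + DAYS[:end + 1]
        opens = to_24h(m.group(3), m.group(4), m.group(5))
        closes = to_24h(m.group(6), m.group(7), m.group(8))
        results.append(OpeningHours(days, opens, closes))
    return results
```

For example, `parse_hours("Store hours: Mon-Fri 9am-5:30pm")` yields a single `OpeningHours` entry for Monday through Friday, opening at 09:00 and closing at 17:30. A realistic parser would also need single days, several ranges per blob, 24-hour clocks, and explicit closures.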

Figure 1

Initially, blobs are identified by crawling the content of a website and organizing it hierarchically as unstructured blobs, based on existing hyperlink relationships and semantically inferred relationships. The unstructured blobs are text sections of the website that are associated with temporal information. Existing hyperlink relationships include those between list pages and their records, whereas semantically inferred relationships include those between entity list pages and the corresponding entity detail pages, as shown in Fig. 2. Based on the hierarchy of the extracted blobs, the web pages of the website are classified into unstructured pages, semi-structured list pages, and list pages. The list pages are further segmented into records using list extraction methods. In some cases, the records in a list page link to new detail pages; these pages are identified and added at the appropriate place in the hierarchy. For example, a list page for a store chain may link to a page containing the details of a particular branch of that chain.

Thereafter, the unstructured blobs are filtered into a set of relevant blobs by using binary word features and editorial supervision. Binary word features are features associated with various attributes of the artifacts, such as name, address, and hours of operation. Editorial supervision includes inputs correspon...
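The hierarchy-building step can be sketched as a recursive walk over a site. The sketch below assumes a toy in-memory site model: a hypothetical `site` dict that maps each URL either to its text and outgoing `links` or, for a list page, to records that have already been segmented. It models only hyperlink relationships, treats list extraction as already done, and does not attempt semantic inference. `Blob` and `build_hierarchy` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical site model: each URL maps either to {"text": ..., "links": [...]}
# or, for a list page whose records were already segmented, to
# {"records": [(record_text, detail_url_or_None), ...]}.


@dataclass
class Blob:
    url: str
    kind: str  # "unstructured", "list", "record" or "detail" (illustrative labels)
    text: str = ""
    children: list[Blob] = field(default_factory=list)


def build_hierarchy(url: str, site: dict, seen: set[str] | None = None) -> Blob:
    """Organize the pages reachable from `url` into a tree of blobs."""
    seen = set() if seen is None else seen
    seen.add(url)
    page = site[url]
    if "records" in page:
        node = Blob(url, "list")
        for text, detail_url in page["records"]:
            record = Blob(url, "record", text)
            # A record that links to its own page gets that detail page as a child.
            if detail_url and detail_url not in seen:
                detail = build_hierarchy(detail_url, site, seen)
                if detail.kind == "unstructured":
                    detail.kind = "detail"
                record.children.append(detail)
            node.children.append(record)
        return node
    node = Blob(url, "unstructured", page.get("text", ""))
    for link in page.get("links", []):  # plain hyperlink relationships
        if link not in seen:
            node.children.append(build_hierarchy(link, site, seen))
    return node


if __name__ == "__main__":
    site = {
        "/": {"text": "Acme Stores", "links": ["/stores"]},
        "/stores": {"records": [("Downtown branch", "/stores/downtown"), ("Airport kiosk", None)]},
        "/stores/downtown": {"text": "Downtown branch. Open Mon-Fri 9am-5pm."},
    }
    tree = build_hierarchy("/", site)
    print(tree.children[0].children[0].children[0].kind)  # detail
```

In the example, the store-chain list page becomes a `list` blob with two `record` children, and only the record that links to a branch page gets that page attached beneath it as a `detail` blob, mirroring the store-chain example in the text.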
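The filtering step can be illustrated with binary word features fed to a simple linear classifier. The excerpt is cut off before it says what editorial supervision comprises, so the sketch below assumes it takes the form of human relevance labels on sample blobs, and it uses a perceptron as a stand-in learner. The cue vocabulary, the choice of classifier, the toy labeled data, and the function names are all assumptions, not details from the disclosure.

```python
from __future__ import annotations

import re

# Hypothetical cue vocabulary: the disclosure does not list its actual features.
VOCAB = ["open", "hours", "am", "pm", "mon", "fri", "sat", "sun",
         "daily", "closed", "event", "posted", "price", "shipping"]


def binary_features(text: str) -> list[int]:
    """Return 1 for each vocabulary word that occurs in the blob, else 0."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if word in tokens else 0 for word in VOCAB]


def score(x: list[int], weights: list[float], bias: float) -> float:
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_perceptron(examples, epochs=20):
    """Learn a linear filter from (blob_text, label) pairs, where label 1 = relevant.

    The labels stand in for editorial supervision (an assumption)."""
    weights, bias = [0.0] * len(VOCAB), 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = binary_features(text)
            error = label - (1 if score(x, weights, bias) > 0 else 0)
            if error:  # classic perceptron update, applied only on mistakes
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
    return weights, bias


def is_relevant(text: str, weights: list[float], bias: float) -> bool:
    return score(binary_features(text), weights, bias) > 0


if __name__ == "__main__":
    labeled = [  # toy "editorial" labels, invented for illustration
        ("Open Mon-Fri 9am-5pm, closed Sun", 1),
        ("Hours: daily 10am-8pm", 1),
        ("Event on Sat at 7pm", 1),
        ("Free shipping on all orders", 0),
        ("Price list and product catalog", 0),
        ("About our company history", 0),
    ]
    weights, bias = train_perceptron(labeled)
    print(is_relevant("Open Sat 10am-4pm", weights, bias))        # True
    print(is_relevant("Free shipping this week", weights, bias))  # False
```

Binary features keep the representation simple: each blob is reduced to which cue words it contains, and the labeled examples decide how much each cue counts toward relevance.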