
A SEMI-AUTOMATIC METHOD TO IDENTIFY PHRASES THAT REFER TO FREQUENT PROBLEMS OR SYMPTOMS IN FREE-FORM TEXT

IP.com Disclosure Number: IPCOM000236225D
Publication Date: 2014-Apr-12
Document File: 5 page(s) / 26K

Publishing Venue

The IP.com Prior Art Database

Abstract

The invention proposes a semi-automatic technique for creating a list of all the named entities. The named entities that users care most about are never left out. The technique examines a past log of information captured in the form of a search query log. Frequently occurring entities are included in a white list, which yields a white list of terms associated with frequent problems. Some of the cases are then randomly selected, and human annotations of those cases are compared with the terms automatically identified as problem terms based on the white list. The two sets of annotations are in promising agreement.
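The steps summarized above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names, the substring matching, the `min_queries` threshold, the sample data, and the use of Jaccard overlap as the agreement measure are all assumptions made for illustration, since the abbreviated text does not specify them.

```python
from collections import Counter


def build_white_list(query_log, candidate_terms, min_queries):
    """Return candidate terms that appear in at least `min_queries` queries.

    Assumption: candidate terms are lowercase and matched as plain substrings.
    """
    counts = Counter()
    for query in query_log:
        lowered = query.lower()
        for term in candidate_terms:
            if term in lowered:
                counts[term] += 1
    return {term for term, count in counts.items() if count >= min_queries}


def annotate(case_text, white_list):
    """Return the white-list terms found in one free-form case text."""
    lowered = case_text.lower()
    return {term for term in white_list if term in lowered}


def agreement(auto_terms, human_terms):
    """Jaccard overlap between automatic and human annotations (assumed metric)."""
    if not auto_terms and not human_terms:
        return 1.0
    return len(auto_terms & human_terms) / len(auto_terms | human_terms)


# Hypothetical search query log and candidate terms.
query_log = [
    "paper jam on tray 2",
    "fix Paper Jam",
    "blue screen after update",
    "blue screen",
    "reset password",
]
candidates = {"paper jam", "blue screen", "reset password", "tray"}
white_list = build_white_list(query_log, candidates, min_queries=2)

# One hypothetical case, with terms a human annotator marked as problems.
case = "Customer reports a paper jam and a noisy fan."
auto = annotate(case, white_list)
human = {"paper jam", "noisy fan"}
score = agreement(auto, human)
```

With this sample data the white list contains "paper jam" and "blue screen", and the automatic and human annotations of the case share one of two distinct terms.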

This text was extracted from a Microsoft Word document.
This is the abbreviated version, containing approximately 50% of the total text.

 


 

KEYWORDS

Semi-automatic, query log, named entities, annotations, white list

DETAILED DESCRIPTION

Text-mining techniques are used to search for hidden patterns and trends. Identifying the named entities in an archive helps to map them to precise meanings. The named entities are used by text-mining and machine learning algorithms for automatic summarization and meaning extraction, among other tasks. One way to identify the named entities is to create a list of all the named entities encountered in a case archive. Creating such an exhaustive list is cumbersome and error-prone. Moreover, many of the named entities encountered are of negligible semantic value to a user. One technique to address this problem is to ignore the named entities that occur rarely in the case archive. However, a rarely occurring named entity may carry information of significant value for satisfying a frequently expressed information requirement of users.
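The drawback of frequency-based pruning can be seen in a small Python sketch. The archive contents, the function name `prune_rare_entities`, and the `min_count` threshold are hypothetical and serve only to illustrate the point made above.

```python
from collections import Counter


def prune_rare_entities(cases, min_count):
    """Keep only entities that occur in at least `min_count` cases."""
    counts = Counter()
    for entities in cases:
        counts.update(set(entities))  # count each entity once per case
    return {entity for entity, count in counts.items() if count >= min_count}


# Hypothetical case archive: each case is the list of entities found in it.
cases = [
    ["printer", "paper jam"],
    ["printer", "driver"],
    ["printer", "paper jam"],
    ["error 0x1F"],  # rare in the archive, yet possibly what users search for
]
kept = prune_rare_entities(cases, min_count=2)
```

Here "error 0x1F" is discarded because it occurs in only one case, even though it might correspond to a frequently expressed user need. Archive frequency alone cannot detect that.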

One conventional technique provides algorithms for the well-motivated problem of annotating unstructured text with entity identifiers (IDs) from an entity catalog. Its formulation captures a tradeoff between local spot-to-label compatibility and global, document-level topical coherence between entity labels. A local hill-climbing algorithm is provided.

Another conventional technique relates to semi-structured entity-relation (ER) data graphs, which have diverse node and edge types representing entities and relations. In addition, the nodes contain text snippets. A unified model is provided for ranking in ER graphs, along with an algorithm to learn the parameters of the model. The algorithm satisfies training preferences and estimates meaningful model parameters that represent the relative importance of ER types.

However, the conventional techniques are not efficient at creating a list of named entities that satisfies the requirements of users.

Hence, there exists a need for a technique to build a white list of the named entities.

The invention proposes a semi-automatic technique for creating a list of all the named entities. The named entities that users care most about are never left out. The technique examines a past log of information captur...