UNIQUE IDENTIFIERS IN AUTOMATED PROCESSING OF UNSTRUCTURED TEXT

IP.com Disclosure Number: IPCOM000236223D
Publication Date: 2014-Apr-12
Document File: 6 page(s) / 59K

Publishing Venue

The IP.com Prior Art Database

Abstract

The proposed technique relates to recognizing the presence and value of unique identifier terms (UIDs), such as part numbers, site IDs, and technical report serial numbers, in unstructured datasets. The technique is particularly useful when UIDs are sufficiently prevalent, for example in a corpus of service, maintenance, and commissioning logs for turbines, generators, and other machines. The invention exploits such identifiers to extract insights from an unstructured corpus and uses them to build an effective knowledge management system.

This text was extracted from a Microsoft Word document.
This is the abbreviated version, containing approximately 34% of the total text.

KEYWORDS

Free-form text, automated processing, unique identifier term, unstructured datasets, knowledge management system

DETAILED DESCRIPTION

Many data sources are created by humans as free-form text. Text-heavy sources such as emails, web pages, service engineer notes, and maintenance logs are referred to as unstructured data sources. Mining such sources for actionable insights is highly valuable, and ongoing research in computer-assisted knowledge management aims to extract insights from vast and diverse data sources.

            Several conventional techniques exist for automated processing of free-form text. In the bag-of-words approach, each document is treated as an unordered collection of words, and inferences are drawn from the frequency with which each word occurs. Natural-language-processing (NLP) techniques instead attempt to understand each sentence by applying the rules of the language, such as grammar and semantics.
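As a minimal illustration of the bag-of-words approach, the Python sketch below counts word frequencies in a made-up service note. The note text, the part number "GT-4471-R", and the simple tokenizer are assumptions for illustration only. The identifier is counted like any other word; nothing in the representation records what it refers to.

```python
import re
from collections import Counter


def bag_of_words(document: str) -> Counter:
    """Return word frequencies for a document, ignoring word order and grammar."""
    # Assumed tokenizer: lower-case runs of letters, digits and hyphens,
    # so that an identifier such as "GT-4471-R" survives as a single token.
    tokens = re.findall(r"[a-z0-9-]+", document.lower())
    return Counter(tokens)


# Made-up service note; "GT-4471-R" is a hypothetical part number.
note = "Replaced seal on GT-4471-R. Seal wear found on GT-4471-R rotor."
counts = bag_of_words(note)
print(counts["gt-4471-r"])  # → 2
print(counts["seal"])       # → 2
```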

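            Neither approach, however, makes specific use of the unique identifiers that are common in technical records: a bag-of-words model counts a part number like any other word, and grammar rules say nothing about what the part number refers to. Therefore, there is a need in the art for improved automated processing of free-form text.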

            The proposed technique relates to recognizing the presence and value of unique identifier terms (UIDs), such as part numbers, site IDs, and technical report serial numbers, in unstructured datasets. The technique is particularly useful when UIDs are sufficiently prevalent, for example in a corpus of service, maintenance, and commissioning logs for turbines, generators, and other machines. The invention exploits such identifiers to extract insights from an unstructured corpus and uses them to build an effective knowledge management system.
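The condition that UIDs be "sufficiently prevalent" can be made measurable. One possible measure, assumed here and not defined in the disclosure, is the fraction of log entries that mention at least one known UID. The sample entries, the UID set, the tokenizer, and the threshold `MIN_PREVALENCE` are all hypothetical.

```python
import re


def uid_prevalence(entries: list[str], known_uids: set[str]) -> float:
    """Fraction of entries containing at least one known UID (hypothetical metric)."""
    if not entries:
        return 0.0
    hits = 0
    for entry in entries:
        # Assumed tokenizer: candidate identifiers are runs of letters, digits, hyphens.
        tokens = set(re.findall(r"[A-Za-z0-9-]+", entry))
        if tokens & known_uids:
            hits += 1
    return hits / len(entries)


# Hypothetical commissioning-log entries and UID list.
logs = [
    "Site S-0193: replaced bearing on GT-4471-R",
    "Routine inspection, no findings",
    "Vibration alarm cleared, see TR-2013-0557",
    "Generator G-88 restarted at S-0193",
]
uids = {"GT-4471-R", "TR-2013-0557", "S-0193", "G-88"}

prevalence = uid_prevalence(logs, uids)
print(prevalence)  # → 0.75

# Hypothetical cut-off for deciding that UID-based processing is worthwhile.
MIN_PREVALENCE = 0.5
print(prevalence >= MIN_PREVALENCE)  # → True
```

Three of the four sample entries mention a known UID, so this toy corpus would clear the assumed threshold; a real cut-off would have to be chosen per corpus.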

            To recognize unique identifiers, a key-value hash is created that contains UIDs as keys and their meanings as values. A value is either a simple string, such as "gas turbine rotor", or a pointer to an elaborate document, such as the specifications list for the machine indicated by the UID key. The UIDs and their meanings are extracted from the various databases that businesses maintain for routine use by commissioning engineers, service personnel, etc. Modern tools such as the Redis database server are employed for buil...
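A minimal in-memory sketch of such a UID hash follows. The disclosure names Redis as the store; this sketch uses a plain Python dict instead so that it runs without a server. The UIDs, their meanings, the document path, the `DocPointer` type, and the `annotate` helper are all hypothetical. Each value is either a plain string or a pointer to a fuller document, and recognition is done by looking up each token of a log entry in the hash.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DocPointer:
    """Hypothetical pointer to an elaborate document, e.g. a specifications list."""
    path: str


Meaning = Union[str, DocPointer]

# Hypothetical UID hash: UIDs as keys, meanings as values. In practice the
# entries would be loaded from the business databases described above.
UID_HASH: dict[str, Meaning] = {
    "GT-4471-R": "gas turbine rotor",
    "GEN-88": DocPointer("specs/GEN-88.pdf"),
}


def annotate(entry: str, uid_hash: dict[str, Meaning]) -> list[tuple[str, Meaning]]:
    """Return (UID, meaning) pairs for every known UID found in a free-form entry."""
    found = []
    # Assumed tokenizer: candidate identifiers are runs of letters, digits, hyphens.
    for token in re.findall(r"[A-Za-z0-9-]+", entry):
        meaning = uid_hash.get(token)  # one hash probe per token
        if meaning is not None:
            found.append((token, meaning))
    return found


entry = "Inspected GT-4471-R and GEN-88; no cracks found."
for uid, meaning in annotate(entry, UID_HASH):
    print(uid, "->", meaning)
```

Exact-token lookup keeps recognition cheap, but variants such as "GT4471R" would need a normalization step that this sketch does not attempt. In a Redis-backed version, the same mapping could be kept in a Redis hash (for example written with redis-py's `hset(name, mapping=...)` and read with `hget`), with pointer values stored as strings such as a path or URL.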