Context-Based Concept Resolution with Structured and Unstructured Sources
Publication Date: 2016-May-17
The IP.com Prior Art Database
Terms that refer to concepts within text are often terser than is needed to identify the referent concept unambiguously. A referring span often contains only enough information for a human reader to identify the relevant concept when it is combined with the surrounding context and the reader’s background knowledge. Disclosed here is a method that combines these same three elements (referring span, surrounding context, and background knowledge) in a novel way to produce more specific and appropriate concepts than can be found by searching for concepts on the referring span alone.
Linking phrases in text to concepts in a Knowledge Base (KB) becomes increasingly difficult when the KB is very large, taxonomically deep and multiplicitous (containing multiple entries for the same concept). A KB that comprises multiple ontologies found on the semantic web may have these properties, as does the Unified Medical Language System (UMLS). Finding concepts in such KBs presents at least three distinct challenges: Discovery (how to know whether a KB concept exists for the phrase); Multiplicity (how to know whether there are additional entries in the KB for the same concept); and Granularity (how to know whether there are more specific concepts that capture more of the semantics of the phrase's context).
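The three challenges can be made concrete with a toy example. The following Python sketch is only an illustration and is not the disclosed method: the KB contents, the identifiers (C1 to C4), the `same_as` and `parent` fields, and the function `resolve` are all hypothetical, and matching is reduced to exact label lookup and token containment.

```python
# Hypothetical toy KB; identifiers, labels and links are invented for illustration.
KB = {
    "C1": {"label": "fracture", "parent": None, "same_as": {"C2"}},
    "C2": {"label": "bone fracture", "parent": None, "same_as": {"C1"}},
    "C3": {"label": "fracture of femur", "parent": "C1", "same_as": set()},
    "C4": {"label": "fracture of wrist", "parent": "C1", "same_as": set()},
}


def tokens(text):
    """Lower-cased whitespace tokens of a string, as a set."""
    return set(text.lower().split())


def resolve(phrase, context):
    """Return (found, specific) concept ids for a phrase in its context."""
    # Discovery: does any KB entry carry exactly this label?
    found = {cid for cid, c in KB.items() if c["label"] == phrase.lower()}

    # Multiplicity: add other entries recorded as the same concept.
    for cid in list(found):
        found |= KB[cid]["same_as"]

    # Granularity: keep child concepts whose whole label is supported
    # by the words of the surrounding context.
    ctx = tokens(context)
    specific = {
        cid
        for cid, c in KB.items()
        if c["parent"] in found and tokens(c["label"]) <= ctx
    }
    return sorted(found), sorted(specific)


found, specific = resolve("fracture", "patient has a fracture of the femur")
print(found, specific)
```

In this toy case the phrase "fracture" alone yields the two duplicate general entries, while the context word "femur" supports the more specific child concept. A real KB such as UMLS would need far more robust matching than exact labels and token containment.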
These challenges also pose new problems for evaluating systems designed to address them. If there are many target concepts in the KB, evaluation must account for the case where different systems identify different numbers of "correct" concepts. And given a deep taxonomy, one system may discover an appropriate concept, while a second system may find a "better" (more specific) concept matching more of the context of the phrase.
1. Problem Description
Leveraging any kind of structured knowledge resource for natural language processing requires the ability to find entries in the resource from the references made to them in text. In some applications, this task is mostly limited to entity linking: taking names of things mentioned in text and finding the corresponding unique identifier for that entity in a structured resource. In word sense disambiguation, the problem is one of taking words mentioned in text and finding the intended word sense in a dictionary resource such as WordNet. These two problems seem similar, but algorithms for them can be quite different. For word sense disambiguation, target dictionary entries match the text word exactly, so there is no need to "discover" candidate entries. Each entry has a gloss that can be exploited to provide a context for choosing among them. For entity linking, the resource entries may not have a label that matches the text exactly, but will typically have types and other relationships in a graph.
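As an illustration of how a gloss can provide context for choosing among dictionary entries, the sketch below scores each candidate sense by the number of content words its gloss shares with the surrounding sentence. This is a simplified gloss-overlap heuristic in the spirit of the Lesk algorithm, offered as an assumption about one common approach rather than as anything specified in this disclosure. The sense labels, glosses, stopword list, and the function `pick_sense` are invented for the example and are not taken from WordNet.

```python
# Small, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "that", "for", "at", "is"}

# Hypothetical sense inventory for one word; glosses are invented, not WordNet's.
SENSES = {
    "bank#1": "a financial institution that accepts deposits and lends money",
    "bank#2": "sloping land beside a river or lake",
}


def content_words(text):
    """Lower-cased tokens with edge punctuation and stopwords removed."""
    return {w.strip(".,;:").lower() for w in text.split()} - STOPWORDS


def pick_sense(context, senses):
    """Return the sense whose gloss shares the most content words with the context."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))


print(pick_sense("She sat on the bank of the river and watched the water", SENSES))
print(pick_sense("He went to the bank to deposit money", SENSES))
```

Because every candidate sense already matches the text word exactly, no discovery step is needed here; the gloss alone drives the choice. Entity linking lacks that guarantee and would instead rely on fuzzy label matching plus type and graph information.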
For medical question answering, one would like to exploit the knowledge aggregated from many sources in the Unified Medical Language System. This resource presents a new level of complexity for finding the concept (more precisely, its Concept Unique Identifier, or CUI) for medical terms mentioned in text. The problem is different enough from entity linking and word sense disambiguation that it is given a different name: Context-Based Concept Resolution (CBCR). CBCR shares many properties with word sense disambiguation: many (but not all) common medical terms in UMLS have gloss entries, and the terms do not refer to entities but more abs...