
Ontology Based Query Correction without query logs

IP.com Disclosure Number: IPCOM000247799D
Publication Date: 2016-Oct-06
Document File: 5 page(s) / 97K

Publishing Venue

The IP.com Prior Art Database

Abstract

The goal is an auto-correction system that makes semantically valid query corrections using an ontology, without the need for query logs. The system uses the ontology to capture the semantics of the query and performs semantic, context-aware corrections.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 33% of the total text.



Problem Statement:

Given a mistyped user query, identify the errors in it, which may include semantic errors, and suggest possible corrections.

Drawback of Existing Systems

Given a mistyped user query, existing techniques for identifying and suggesting possible query corrections fall into the following categories:


(1) Spell check/Grammatical syntax: A language dictionary is consulted to scan each typed word for possible spelling corrections. If a spelling mistake is detected, it is typically corrected with the nearest match found in the dictionary. This is the most elementary of the existing methods and can correct a user query only if it contains a spelling mistake, which also makes it the least robust. Consider a query that has no spelling errors but is still mistyped, e.g., "Capital of Spain" mistyped as "Capital of Span". Because the mistyped word "span" is itself a dictionary word, spell check cannot detect anything wrong here.


(2) Using query logs: Query logs are the most common basis for query correction systems. Logs are mostly used to model the likelihood of a set of words occurring together in past valid queries. When this likelihood is properly modeled from sufficient past query logs, the system can identify that "capital of spain" has a much higher likelihood than "capital of span" in a valid query, and can thus perform the needed correction.


(3) Using a pre-defined document corpus: This is an alternative to query logs. Instead of query logs, the sentences in the document corpus are used to model the likelihood of a set of words appearing together.
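The limitation of method (1) can be illustrated with a minimal sketch. It assumes a toy dictionary and uses Python's standard `difflib.get_close_matches` for the nearest-match lookup; the names `DICTIONARY` and `spell_check` are hypothetical and not part of any system described here.

```python
from difflib import get_close_matches

# Hypothetical toy dictionary; a real spell checker would load a full word list.
DICTIONARY = {"what", "is", "the", "capital", "of", "spain", "span", "india"}


def spell_check(query):
    """Replace each non-dictionary word with its nearest dictionary match."""
    corrected = []
    for word in query.lower().split():
        if word in DICTIONARY:
            # Dictionary words are accepted as-is, even if wrong in context.
            corrected.append(word)
        else:
            matches = get_close_matches(word, sorted(DICTIONARY), n=1, cutoff=0.6)
            corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(spell_check("What is the Copital of India"))  # "copital" is repaired
print(spell_check("Capital of Span"))               # "span" passes unchanged
```

Since "span" is a valid dictionary entry, the second query is returned unchanged, which is the failure case described in (1).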

In general, the use of query logs or a document corpus makes the system more robust in identifying and correcting user queries. However, as with any learning-model-based approach, it is too specific to the data and sensitive to the absence of important events. Moreover, since learning-based methods rely on statistical properties of the data, they are prone to being swayed by dominant data points. Learning-based methods may therefore not be sufficient for capturing semantic context or for checking whether a user query is semantically valid.
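The log-based likelihood of method (2), and its sensitivity to missing data, can be sketched roughly as follows. This is only an illustration under stated assumptions: the query log is invented, the score is an unnormalized product of add-one bigram counts rather than a true probability, and the names `QUERY_LOG`, `score` and `best_candidate` are hypothetical.

```python
from collections import Counter

# Hypothetical past queries standing in for a real query log.
QUERY_LOG = [
    "capital of spain",
    "capital of india",
    "weather in spain",
    "capital of spain",
]


def bigram_counts(queries):
    """Count adjacent word pairs across all logged queries."""
    counts = Counter()
    for query in queries:
        words = query.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


COUNTS = bigram_counts(QUERY_LOG)


def score(query):
    """Unnormalized likelihood: product of (bigram count + 1)."""
    words = query.lower().split()
    result = 1
    for pair in zip(words, words[1:]):
        result *= COUNTS[pair] + 1
    return result


def best_candidate(candidates):
    """Pick the candidate query with the highest score."""
    return max(candidates, key=score)


print(best_candidate(["capital of span", "capital of spain"]))
print(score("capital of chile"), score("capital of span"))
```

With this toy log, "capital of spain" outscores "capital of span", so the correction succeeds. The valid but unseen query "capital of chile" receives the same score as the mistyped "capital of span", which shows how such a model depends on what happens to be present in the log.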

Below we compare two of the most popular existing question answering engines, (1) Wolfram Alpha and (2) MIT Start, against some manually created mistyped queries to highlight the drawbacks of existing systems.

Mistyped Question                         | Correction Needed | Wolfram Alpha | MIT Start
What is the Copital of India              | Copital->Capital  | Success       | Success
What is the Capital of Span               | Span->Spain       | Failure       |
What is Today's weather in Chilly         |                   |               |
What is the birthdate of President Osama  |                   |               |

To overcome the challenges in query correction, the system introduces a new paradigm of query correction that does not require any form of query logs but relies on ontology based domain knowle...