
A design of an algorithm to match users' queries with frequently asked questions

IP.com Disclosure Number: IPCOM000006429D
Publication Date: 2002-Jan-02
Document File: 4 page(s) / 81K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a design of an algorithm to match users' queries with frequently asked questions (FAQs). Benefits include improved relevancy in searches for information about a topic.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 50% of the total text.

The disclosed design is an algorithm that provides a natural-language interface to FAQs. The algorithm calculates the similarity between the user's question and the FAQ questions. After it finds the best match, the corresponding answer can be output to the user. Although originally designed for English text, the algorithm is not limited to any specific language.
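
The overall flow (score every FAQ question against the user's question, pick the best match, return its answer) can be sketched as follows. The extracted text does not preserve the disclosure's own similarity measure, so this sketch substitutes plain Jaccard word overlap as a stand-in; the function names and sample FAQ pairs are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two token lists (a stand-in similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_answer(query, faq_pairs):
    """Return the answer whose FAQ question is most similar to the query."""
    q_tokens = query.lower().split()
    best = max(faq_pairs, key=lambda pair: jaccard(q_tokens, pair[0].lower().split()))
    return best[1]


# Hypothetical FAQ data, for illustration only.
faq_pairs = [
    ("how do i reset my password", "Use the reset link."),
    ("where is the store", "Downtown."),
]
answer = best_answer("reset my password please", faq_pairs)
```

The disclosed algorithm goes further than this sketch: it analyzes question type, structure, and focus before matching, as the components listed in the document describe.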

The algorithm consists of the following components:

•     Pre-process

      -     Knowledge Base and Dictionaries

            ▪     Build a Knowledge Base that includes the structure of English question sentences and the definition of a standard question set. All input questions can be transformed into questions in this set.

            ▪     Construct a domain-related keyword dictionary and a concept dictionary.

      -     Pre-process of FAQ

            ▪     Tokenize and stem the question-answer pairs of the FAQ. Use a domain concept dictionary to map words to concepts.

            ▪     Build two inverted index tables, one for all verbs and the other for the remaining words. The two tables are built automatically and then revised by a human.
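
The FAQ pre-processing step can be illustrated with a minimal sketch that tokenizes and stems each FAQ question, maps words to concepts, and fills two inverted index tables (one for verbs, one for the remaining words). Everything concrete here is an assumption made for illustration: the toy suffix-stripping stemmer, the `CONCEPTS` and `VERBS` dictionaries, and the sample questions are not taken from the disclosure, and only the question half of each pair is indexed.

```python
from collections import defaultdict

# Hypothetical dictionaries; the disclosure does not list its entries.
CONCEPTS = {"buy": "PURCHASE", "purchase": "PURCHASE",
            "laptop": "COMPUTER", "notebook": "COMPUTER"}
VERBS = {"buy", "purchase", "return", "install"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, replace punctuation with spaces, split, and stem."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return [stem(tok) for tok in cleaned.split()]


def build_indexes(faq_questions):
    """Return (verb_index, other_index), each mapping a key to FAQ ids."""
    verb_index = defaultdict(set)
    other_index = defaultdict(set)
    for faq_id, question in enumerate(faq_questions):
        for tok in tokenize(question):
            key = CONCEPTS.get(tok, tok)  # use the concept when one is defined
            target = verb_index if tok in VERBS else other_index
            target[key].add(faq_id)
    return verb_index, other_index


faq = ["How do I buy a laptop?", "Can I return my notebook?"]
verbs, others = build_indexes(faq)
```

Because "laptop" and "notebook" map to the same concept, both sample questions end up under one key in the non-verb table, which is the effect the concept dictionary is meant to achieve. The human revision of the tables is outside the scope of this sketch.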

•     Question Analysis

      1.     Tokenize and stem the user's question and use a domain concept dictionary to map the word set. Extract the main features, such as the WH-word, interrogative key words (for example, often, long, big, soon), pronoun, helping verb, and domain key words.

      2.     Identify the abstract question type using the interrogative words and interrogative key words. Abstract question types are represented as a Boolean vector with one component per question type; each component records the occurrence of the corresponding type mark word, such as Q-WHERE, Q-HOWLONG, or Q-HOWMANY. The definition of the question types depends on the completeness of the parser.

      3.     Determine the question's structure pattern from the features acquired in step 1. Use T-rules to transform the question into the standard question pattern and, at the same time, extract the predicates. Use the concept dictionary to map all verbs to concepts.
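
The Boolean question-type vector of step 2 can be sketched as below. The type names Q-WHERE, Q-HOWLONG, and Q-HOWMANY come from the text; the word sequences that trigger each type (`TYPE_MARKS`) and the function name are assumptions for illustration, since the disclosure's actual type mark list is not reproduced in the extracted text.

```python
QUESTION_TYPES = ["Q-WHERE", "Q-HOWLONG", "Q-HOWMANY"]

# Hypothetical trigger phrases: a WH-word, optionally followed by an
# interrogative key word.
TYPE_MARKS = {
    "Q-WHERE": [("where",)],
    "Q-HOWLONG": [("how", "long")],
    "Q-HOWMANY": [("how", "many")],
}


def question_type_vector(question):
    """Return one Boolean per question type: does a type mark occur?"""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in question)
    tokens = cleaned.split()
    vector = []
    for qtype in QUESTION_TYPES:
        found = any(
            tuple(tokens[i:i + len(mark)]) == mark
            for mark in TYPE_MARKS[qtype]
            for i in range(len(tokens) - len(mark) + 1)
        )
        vector.append(found)
    return vector


vec = question_type_vector("How long does shipping take?")
```

For the sample question, only the Q-HOWLONG component is set. A question can set more than one component when it contains several type marks.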

•     Acquisition of Question Focus

      Scan the question with the question type mark list to get [symbol not preserved in the extracted text], and then obtain the question focus (see Figure 1).

•     Question Matching

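      1.     Use the question-answer pairs in the FAQ to construct the discrete FAQ question space. Each keyword in the predicate list and the non-predicate list is defined as [formula not preserved in the extracted text], where [symbol not preserved] is the number of question-answer pairs in the FAQ documents.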

      2.   Map a new question to the predicate and non-predicate keyw...