Apparatus and method for joint sentence compression and multi-document query-focused summarization

Publication Date: 2014-Feb-07
Disclosed are an apparatus and method to perform multi-document query-focused summarization of sentences by performing, for example, query-focused sentence compression.

Several state-of-the-art systems exist for summarizing documents, but each has drawbacks. Search engines provide top-ten summaries that present only results matching keywords from the query, with the individual matches highlighted. The results are not summaries but rather excerpts from the original documents that contain one or more matches. These systems perform sentence selection and, if length constraints are exceeded, extract the portions of the sentences that contain the matches. Search engines also do not perform multi-document summarization. Most summarization systems are not query-focused and do not take a query or a topic as input. Query-focused summarization systems use the query to select relevant sentences and then perform syntax-driven compression independently of the query, typically by means of a fixed set of rules. Thus, summarization consists of sentence selection followed by sentence compression, and the query is used only for sentence selection. Some systems also perform sentence compression (independently of the query) prior to sentence selection; in this case, sentence compression is used to generate additional candidates for the sentence-selection stage.


A system is needed that can select sentences that are relevant to a query or a topic, compress these sentences to retain only the content that is relevant to the query or topic, and produce single- or multi-document summaries.

The novel contribution is an apparatus and method to perform multi-document query-focused summarization of sentences by performing, for example, query-focused sentence compression. Given a topic of interest and a collection of documents, the method summarizes the documents or passages relevant to the topic by explicitly accounting for the topic. The main reason for this approach is to ensure that summarization disregards text that is not relevant to the topic while retaining information that pertains to the topic itself. An example of this task is to produce natural language summaries of search results by taking the query into account.
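
The difference between query-independent and query-focused compression can be illustrated with a minimal sketch. It is not the disclosed method: it assumes a toy compression rule (keep only comma-delimited clauses that share a term with the query) and a plain term-overlap score for sentence selection. The names `terms`, `compress`, `summarize`, and `STOPWORDS` are hypothetical and introduced only for illustration.

```python
import re

# Assumed, illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "by", "to", "in", "and", "or", "for", "on"}


def terms(text):
    """Return the set of lowercase word tokens in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def compress(sentence, query):
    """Toy query-focused compression: keep only the comma-delimited
    clauses that share at least one term with the query."""
    query_terms = terms(query)
    clauses = [c.strip() for c in sentence.rstrip(".").split(",")]
    kept = [c for c in clauses if terms(c) & query_terms]
    return ", ".join(kept)


def summarize(documents, query, max_sentences=3):
    """Select the sentences with the highest query-term overlap across
    all documents, then compress each one with respect to the query."""
    query_terms = terms(query)
    scored = []
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            overlap = len(terms(sentence) & query_terms)
            if overlap:
                scored.append((overlap, sentence))
    # Stable sort: ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    summary = []
    for _, sentence in scored[:max_sentences]:
        compressed = compress(sentence, query)
        if compressed and compressed not in summary:
            summary.append(compressed)
    return summary


if __name__ == "__main__":
    docs = [
        "The weather was mild, and the CEO of Company X said revenue grew. "
        "Stocks fell elsewhere.",
        "Analysts met on Tuesday, while a Company X executive announced a new plant.",
    ]
    for line in summarize(docs, "statements by Company X executives"):
        print(line)
```

In this sketch the query drives both stages: it ranks sentences and it decides which clauses survive compression, so off-topic clauses such as the one about the weather are dropped even when the sentence as a whole is selected. A fixed, query-independent rule set would have no basis for preferring one clause over the other.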

Given a query such as "Find statements by Company X executives" and a collection of documents, such as those available on the World Wide Web, the method:

1. Finds candidate passages that are relevant to the query, using a combination of search technologi...