Answer Scoring based on Optimal Window Size
Disclosure Number: IPCOM000013185D
Original Publication Date: 2000-Mar-01
Included in the Prior Art Database: 2003-Jun-17

Publishing Venue



Disclosed is a system and method of determining optimally sized scoring windows for a question answering system. In question answering systems that score windows of sentences in a document collection based on their likelihood of containing the answer to the question, using a fixed window size determined a priori produces sub-optimal results. Instead, the window size should be determined dynamically during query processing, on a window-by-window basis, in order to identify the best window of sentences for answering the question. The present invention is a technique for answer scoring based on optimal window size, which overcomes the limitations of predefined fixed window sizes and produces better answer results.

The present invention solves the optimal window size problem by using a window scoring procedure that factors in window size and searches the document collection in such a way that optimally sized windows are quickly identified and scored. In any given document with n sentences, there are n(n+1)/2 different windows, where a window is a contiguous sequence of 1 to n sentences. Thus, to find the optimal window in the document, all of the windows could be scored in O(n^2) time and the best scoring window selected in constant time. The window scoring procedure proposed in the current invention offers a better average case solution. This solution is made possible by the particular window scoring function used in the current invention.

The window scoring function is of the form S(w) = T(w) - L(w), where T(w) is a function of the weighted query terms that appear in window w, and L(w) is a function of the length of window w. T(w) increases the window score as more distinct query terms appear in the window. It is a weighted binary (or combination match) function: the number of times a term occurs within the window is irrelevant; what matters is whether or not the term occurs at all.
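The window count and the binary term score T(w) described above can be illustrated with a small sketch. The function names, the representation of a sentence as a list of tokens, and the sample weights are all hypothetical choices made for illustration; the disclosure does not specify them.

```python
from typing import Dict, List, Tuple

Sentence = List[str]  # assumption: each sentence is already tokenized


def count_windows(n: int) -> int:
    """Number of contiguous windows of 1..n sentences in an n-sentence document."""
    return n * (n + 1) // 2


def all_windows(n: int) -> List[Tuple[int, int]]:
    """Every window as a half-open sentence range (start, end)."""
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)]


def term_score(window: List[Sentence], weights: Dict[str, float]) -> float:
    """T(w): add the weight of each distinct query term present in the window.

    Repeated occurrences add nothing; only presence or absence counts.
    """
    present = {token for sentence in window for token in sentence}
    return sum(wt for term, wt in weights.items() if term in present)


# Hypothetical three-sentence document and query term weights.
sentences = [
    ["the", "capital", "of", "france"],
    ["paris", "is", "the", "capital"],
    ["capital", "gains", "on", "capital"],
]
weights = {"capital": 1.0, "france": 2.0, "paris": 2.0}

print(count_windows(len(sentences)))       # 6 windows for 3 sentences
print(term_score(sentences[0:2], weights)) # capital + france + paris
print(term_score(sentences[2:3], weights)) # "capital" counted once
```

Enumerating every window this way is the exhaustive baseline; it is shown only to make the n(n+1)/2 count concrete.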
The binary score for each term may be modified by a weight associated with that term in the query. L(w) is a length penalty that reduces the window score as the window size grows. It is typically a quadratic or exponential function of window length, so that as the window size grows much beyond 3 sentences, the length penalty rapidly increases. The effect of this scoring function is that as a window grows to include more query terms, eventually the additional length penalty will exceed the potential additional gain from finding more terms, and the optimal window size will have been found.

More specifically, let dL(w) be the change in length penalty that would be incurred by growing window w by 1 sentence, and let dT(w) be the maximum increase in term score that could be realized by growing window w by 1 sentence (i.e., the increase if all of the remaining query terms not currently in w were found in the one new sentence added to w). Window w has "crested" if dL(w) > dT(w). The following scoring procedure results:
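The crest condition can be made concrete with a minimal sketch. This is not the disclosed procedure itself: it assumes a score of the form S(w) = T(w) - L(w), a quadratic penalty with a hypothetical coefficient alpha, a strict inequality in the crest test, and windows that grow one sentence at a time from a fixed start sentence. All names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Sentence = List[str]  # assumption: each sentence is already tokenized


def length_penalty(size: int, alpha: float = 0.25) -> float:
    """L(w): assumed quadratic penalty alpha * size^2 (alpha is hypothetical)."""
    return alpha * size * size


def has_crested(size: int, found: Set[str], weights: Dict[str, float],
                alpha: float = 0.25) -> bool:
    """True if dL(w) > dT(w) for a window of `size` sentences.

    dL is the extra penalty for one more sentence; dT is the total weight
    of the query terms not yet found, i.e. the most T(w) could still gain.
    """
    d_len = length_penalty(size + 1, alpha) - length_penalty(size, alpha)
    d_term = sum(wt for term, wt in weights.items() if term not in found)
    return d_len > d_term


def best_window(sentences: List[Sentence], weights: Dict[str, float],
                alpha: float = 0.25) -> Tuple[float, int, int]:
    """Grow a window from each start sentence and stop once it has crested.

    Returns (score, start, end) with end exclusive.
    """
    best = (float("-inf"), 0, 0)
    for start in range(len(sentences)):
        found: Set[str] = set()
        for end in range(start + 1, len(sentences) + 1):
            found |= weights.keys() & set(sentences[end - 1])
            size = end - start
            score = sum(weights[t] for t in found) - length_penalty(size, alpha)
            if score > best[0]:
                best = (score, start, end)
            if has_crested(size, found, weights, alpha):
                break  # further growth from this start cannot raise the score
    return best


# Hypothetical document and query term weights.
sentences = [
    ["the", "capital", "of", "france"],
    ["paris", "is", "the", "capital"],
    ["unrelated", "filler"],
    ["more", "filler"],
]
weights = {"capital": 1.0, "france": 2.0, "paris": 2.0}
print(best_window(sentences, weights))
```

Under these assumptions, stopping at the crest is safe for a given start sentence: because T(w) is binary, all further growth can add at most dT(w) in total, while the penalty rises by at least dL(w) on the very next sentence.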