Method and System for Generating Time-Series Knowledge Base
Original Publication Date: 2001-Jun-27
Included in the Prior Art Database: 2003-Jun-20
A search engine for a large collection of documents (such as the World Wide Web) typically receives a query from a user and returns a sorted list of uniform resource locators (URLs) of documents that it judges to respond well to the query. The search engine works from copies of documents that it has downloaded from web sites and indexed at various times. The downloading is performed by a "crawler" that visits each web site at a frequency that depends on the rate of change of the site's content and on the estimated importance of the site.

Very often, the search engine returns a URL pointing to a document that currently differs from the copy stored by the search engine. The link may even be dead, so that no document exists at that URL any longer. Conversely, even when the copy kept in the search engine cache is up to date, and therefore valuable because it contains current data, the data previously displayed at the same URL is lost. If the user is interested only in the current data, this poses no problem. However, the user may be interested in past data in order to understand a trend and perhaps forecast future changes, or may simply want to understand how the information has changed.

The latter problem can be illustrated by the following example. Certain consumer products are auctioned daily at various web sites. Most items on these sites are auctioned once a day, and there are only a few closing times each day. While the auction for an item is open, a user interested in buying it can see the current bids. These bids do not reveal what the final price will be. However, because the same item is auctioned every day, it would be useful to know the item's closing prices on previous days. These closing prices are posted only for a very short period of time after the end of each day's auction.
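To make the problem concrete, the following Python sketch contrasts the overwrite behavior of a search engine cache with a store that keeps every changed copy of a page together with its crawl time. The class name SnapshotStore, its methods, the URL and the dates are all hypothetical. The sketch assumes that copies are recorded in increasing crawl-time order and that page contents are compared as plain strings.

```python
from collections import defaultdict
from datetime import datetime


class SnapshotStore:
    """Hypothetical store that keeps every changed copy of a page with its crawl time."""

    def __init__(self):
        # url -> list of (crawl_time, content) pairs, assumed appended in crawl order
        self._history = defaultdict(list)

    def record(self, url, crawl_time, content):
        """Append a crawled copy unless it is identical to the most recent one."""
        history = self._history[url]
        if history and history[-1][1] == content:
            return  # page unchanged since the last crawl: nothing new to keep
        history.append((crawl_time, content))

    def latest(self, url):
        """Return the newest (crawl_time, content) pair, i.e. what a plain cache holds."""
        history = self._history.get(url)
        return history[-1] if history else None

    def series(self, url):
        """Return the full (crawl_time, content) history for the URL."""
        return list(self._history.get(url, []))


# Hypothetical auction page crawled three times.
url = "http://auction.example/item42"
store = SnapshotStore()
store.record(url, datetime(2001, 6, 25, 18, 0), "closing price: $850")
store.record(url, datetime(2001, 6, 26, 18, 0), "closing price: $920")
store.record(url, datetime(2001, 6, 26, 20, 0), "closing price: $920")  # unchanged, skipped
print(store.latest(url))        # the only copy a plain cache would still hold
print(len(store.series(url)))   # 2 distinct versions retained
```

Unchanged copies are skipped, so repeated crawls of a static page add nothing. Each day's closing price is retained even after it disappears from the live site, provided the crawler visits while the price is posted.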
Closing prices of this kind can, first of all, give the user an idea of the likely range of the final price. For example, suppose the previous closing prices were $850, $920, $910, $975 and $890. The currently winning bid may be only $750. Because none of the previous auctions closed below $850, the user can infer that the item is unlikely to be bought for less than $800. Furthermore, the buyer may detect useful patterns in the variability of the prices. For example, closing prices on weekends may typically be higher than on weekdays, or prices may be trending downward. Business-to-business auction sites similar to the business-to-consumer sites described above also exist.

A "time-series" is a sequence of pairs P(i) = (T(i), D(i)), i = 1, 2, ..., where T(i) is the time of the i-th pair and D(i) is the data at time T(i). Often, T(i) follows a regular pattern, i.e., the data are collected at fixed intervals, such as every day or every hour. In this case, the sequence of D(i) values alone suffices. Time-series analysis is an established discipline within statistics and electrical engineering, and software packages are available that carry out various time-series analyses.
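The time-series definition and the price-range reasoning above can be sketched in Python as follows. The functions is_regular and values_only, and the dates assigned to the five closing prices, are illustrative assumptions and are not part of the original disclosure. The sketch checks whether the time stamps T(i) are equally spaced. If they are, it keeps only the D(i) values and summarizes the example prices.

```python
from datetime import date, timedelta
from statistics import mean


def is_regular(times):
    """Return True if consecutive time stamps T(i) are equally spaced."""
    gaps = {later - earlier for earlier, later in zip(times, times[1:])}
    return len(gaps) <= 1


def values_only(series):
    """Reduce [(T(i), D(i)), ...] to [D(i), ...] when the sampling is regular."""
    times = [t for t, _ in series]
    if not is_regular(times):
        raise ValueError("irregular sampling: the T(i) values must be kept")
    return [d for _, d in series]


# The five closing prices from the example, assigned hypothetical consecutive dates.
start = date(2001, 6, 20)
closing = [850, 920, 910, 975, 890]
series = [(start + timedelta(days=i), price) for i, price in enumerate(closing)]

prices = values_only(series)
print("lowest close:", min(prices))   # 850
print("highest close:", max(prices))  # 975
print("mean close:", mean(prices))
```

The lowest of the five closes, $850, is what supports the inference that a final price below $800 is unlikely. When the sampling is irregular, values_only raises an error rather than silently discarding the time stamps.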