An Efficient Method for Supporting Cached Result Sets within a Relational Database
Original Publication Date: 2002-Mar-25
Included in the Prior Art Database: 2003-Jun-20
An approach is disclosed by which result sets for queries in a relational database system (RDBMS) can be efficiently cached while maintaining the integrity and correctness of the individual cached result sets. This approach requires no extra overhead on general processing within the system, works with all semantic environments used by queries, and works within all execution environments including a "shared nothing" implementation of a partitioned RDBMS. This approach achieves these objectives by defining a new relational table attribute which indicates whether result sets referencing a table, or tables, may be cached and by moving the cost of maintaining the correctness of any cached result sets into those IUD statements that affect tables referenced by cached result sets. The caching of the individual result sets created to answer simple and repetitive queries allows for a significant performance boost for an RDBMS through the elimination of the execution phase for these same queries. If a query can be answered by a cached result set, significant processing and I/O cost can be avoided. Once a new query request is correctly matched to a given cached result set, which can be done by an extension of previously described inventions which deal with the sharing of cached SQL statements between heterogeneous applications, the main obstacle remaining is preserving the integrity of the result set over time such that the result set continues to correctly answer the query even as changes are possibly made to the data. This latter requirement means that every INSERT, UPDATE, or DELETE (IUD) SQL statement which affects the tables involved in producing a cached result set must cause the invalidation of that result set. Without knowing which tables have been referenced by cached results sets, this invalidation must be done as part of every IUD statement regardless of whether there are cached result sets affected or not. This imposes a performance penalty on general IUD statement processing as well as overall system processing. Also, in a "shared nothing" partitioned RDBMS implementation, the results sets may not even be cached on the database partition where the IUD statement interacts with a table; this raises the issue of communicating the effects of the IUD statement to all database partitions in order to ensure that all affected cached result sets are correctly invalidated on each and every partition. This too increases the cost of processing each IUD statement and decreases overall system performance. The approach being disclosed solves the problems described above by introducing a new modifiable attribute for a relational table which conveys the amount of data change allowed on that table and whether or not result sets referencing the table may be cached. The values of this new attribute for each new table are determined by the user. For the purpose of this discussion, three values for this new attribute are defined and used: READ-ONLY, MOSTLY-READ-ONLY and MODIFIABLE.