Methodology and apparatus for real-time data validation for an online database.
Publication Date: 2010-Oct-20
The IP.com Prior Art Database
Disclosed is a method to validate the content in database tables that are subjected to simultaneous updates and queries in a multi-partition environment. Data validity is evaluated using two tests: one to ensure that a committed update to a row is simultaneously visible to all partitions; and a second to determine if the content in the table or dependent indexes is corrupted.
Disclosed is a process for enabling data validation to ensure that the data management functionality of a database product does not introduce data loss or corruption. The disclosed process is sufficiently generic to be implemented by test departments of database vendors and partners as well as by test application vendors. It can typically be implemented relatively easily because it focuses narrowly on validating core data management function by querying and updating rows using Structured Query Language (SQL) constructs. The disclosed process simulates a production environment by using a variable stress load of updates and queries as a test of stability, scalability, responsiveness, and throughput before the product goes into production.
Data integrity is a fundamental concern to all database vendors because lost or corrupted data could put a vendor or a customer out of business. Testing for and discovering data corruption early reduces the cost of debugging and fixing the source of the error. Therefore, there is a need for real-time data validation.
Using the disclosed process, a test environment consists of a single database distributed across multiple data partitions, in which a partition is defined as a portion of the database on a server or a server operating on a portion of the data. Although the application can function in a single partition configuration, when multiple data partitions are involved the application attempts to distribute operations, namely querying and updating, to distinct partitions. Although distributed, the various operations target a subset of data, and the data that is updated is also queried for validation.
The test application includes a coordinator that drives simultaneous activity against the database. The coordinator is configured with parameters including the volume of stress, the test database to connect to, the volume of data in the target tables, and a seed value used to initialize a random number generator. The seed makes it possible to reproduce an execution sequence in the application. On initialization, the application populates the tables and then starts simultaneous driver applications, each of which drives a concurrent workload consisting of queries and updates. A driver iterates over a workload until it is halted, which happens at the first failure to validate or on termination. The workload is distributed among simultaneous worker applications. For each iteration, the driver rebuilds a mix of workers with a possibly new workload. The coordinator and the drivers with their associated workers can be addressed as a group to halt the application for investigation.
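The role of the seed can be sketched as follows. This is an illustration under stated assumptions, not the disclosed application: the class name, field names, `plan_workloads`, and the default of three workers per driver are all hypothetical, and the sketch only plans the worker mix rather than running it against a database.

```python
import random
from dataclasses import dataclass


@dataclass
class CoordinatorConfig:
    # Hypothetical field names; the disclosure lists these parameters in prose only.
    stress_volume: int  # number of simultaneous drivers
    database: str       # test database to connect to
    row_count: int      # volume of data in the target tables
    seed: int           # seed for the random number generator


def plan_workloads(config, iterations, workers_per_driver=3):
    """Return the query/update mix each driver would run on each iteration."""
    rng = random.Random(config.seed)  # same seed gives the same sequence
    plan = []
    for iteration in range(iterations):
        for driver in range(config.stress_volume):
            # Each iteration rebuilds the worker mix with a possibly new workload.
            mix = [rng.choice(["query", "update"])
                   for _ in range(workers_per_driver)]
            plan.append((iteration, driver, mix))
    return plan
```

Because the generator is seeded once from the configuration, two runs with the same configuration produce the same plan, which is what allows a failing execution sequence to be replayed.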
Each driver is assigned to validate a subset of rows in a table such that no row is common to other drivers. Uniqueness among subsets is...