
Parallel alternative database queries on multi-node big data stores - first to complete wins

IP.com Disclosure Number: IPCOM000254562D
Publication Date: 2018-Jul-11
Document File: 3 page(s) / 128K

Publishing Venue

The IP.com Prior Art Database

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 54% of the total text.

Disclosed is a system for optimizing query performance in clustered NoSQL databases by introducing design-level query parallelism. It can be applied to any distributed NoSQL database (e.g. Apache Cassandra) with a replication factor greater than 1 and more than one node.

A typical use case is as follows: a query A performs well (with acceptable performance) for some data, i.e. when executed on some database content structure, but poorly (with unacceptable performance) for other data. At the same time, there exists an alternative query B that corresponds to query A in the sense that both queries return the same results for every possible database content structure, but the different design of query B leads to the following behavioral differences: it performs well for all or some of the data for which query A performs poorly, and it performs poorly for all or some of the data for which query A performs well. Every time the query is executed, we want to achieve the best possible performance; ideally, we would take the version of the query (A or B) that performs best for the current database content. However, the query is executed by an application on a database whose content structure can vary, so it cannot easily be predicted in advance which version will perform best at a given time.

The solution is to always execute both versions of the query, A and B, in parallel. The result is taken from whichever query completes first, and the other query is then canceled (see figure).
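The race can be sketched in Python with `asyncio`. This is a minimal illustration under stated assumptions, not the disclosure's implementation: `race_queries` and `make_variant` are hypothetical names, and `make_variant` only simulates a query with `asyncio.sleep` in place of a real asynchronous database driver call. The sketch also makes one choice the text does not specify: if a version fails instead of returning a result, the race keeps waiting for the remaining versions rather than treating the failure as the winner.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional, Sequence

# Assumption: a query version is modelled as a zero-argument coroutine function.
QueryVersion = Callable[[], Awaitable[Any]]


async def race_queries(versions: Sequence[QueryVersion]) -> Any:
    """Run equivalent query versions concurrently; the first to succeed wins."""
    if not versions:
        raise ValueError("at least one query version is required")
    pending = {asyncio.ensure_future(run()) for run in versions}
    last_error: Optional[BaseException] = None
    try:
        while pending:
            # Wake up as soon as at least one version has finished.
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                error = task.exception()
                if error is None:
                    return task.result()  # first successful version wins
                last_error = error  # this version failed; keep waiting for the others
        # Every version failed: re-raise the last failure seen.
        assert last_error is not None
        raise last_error
    finally:
        # Cancel the versions that are still running, then wait for them to
        # acknowledge the cancellation so no task is left dangling.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)


def make_variant(name: str, delay: float, cancelled: List[str]) -> QueryVersion:
    """Hypothetical stand-in for a real query: sleeps `delay` seconds, then returns rows."""
    async def run() -> Any:
        try:
            await asyncio.sleep(delay)  # simulated execution time on one replica
            return name, ["row-1", "row-2"]  # equivalent versions return the same rows
        except asyncio.CancelledError:
            cancelled.append(name)  # record that this version lost the race
            raise
    return run


if __name__ == "__main__":
    cancelled: List[str] = []
    winner = asyncio.run(race_queries([
        make_variant("A", 0.2, cancelled),   # slow on the current data
        make_variant("B", 0.01, cancelled),  # fast on the current data
    ]))
    print(winner)     # ('B', ['row-1', 'row-2'])
    print(cancelled)  # ['A']
```

Cancelling the losing task in this way only stops the client from waiting for it; whether the query is also aborted on the database node depends on the database and driver in use.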

The described solution has the following benefits: the best query time performance is always achieved, and, due to data replication and clustering, both queries can be executed on different data replicas and different cluster nodes without hampering each other's performance. More generally, there can be more than two versions of the query, e.g. A1, A2, ..., An. Every version of the query must perform well for some data for which none of the other versions does. Replication factor of the...
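With n versions A1, ..., An, each version can be sent to a different node holding a replica of the data, so that the versions do not compete for the same node. The helper below is a hypothetical sketch of that placement step only: `assign_versions_to_replicas`, the node names, and the rule of one distinct node per version are assumptions made for illustration, not details taken from the disclosure. It checks the stated precondition (data replicated on more than one node) and pairs versions with distinct replica nodes; in a real deployment, replica selection may instead be governed by the database driver's routing or load-balancing settings.

```python
from typing import Dict, List, Sequence


def assign_versions_to_replicas(
    versions: Sequence[str], replica_nodes: Sequence[str]
) -> Dict[str, str]:
    """Pair each query version with its own replica node (hypothetical helper)."""
    # De-duplicate while preserving order, in case a node is listed twice.
    nodes: List[str] = list(dict.fromkeys(replica_nodes))
    # Precondition from the disclosure: replication factor > 1 and more than one node.
    if len(nodes) < 2:
        raise ValueError("racing needs the data replicated on at least two nodes")
    # This sketch's own rule: one distinct node per version, so versions never share a node.
    if len(versions) > len(nodes):
        raise ValueError(
            f"{len(versions)} versions but only {len(nodes)} distinct replica nodes"
        )
    return dict(zip(versions, nodes))


if __name__ == "__main__":
    placement = assign_versions_to_replicas(
        ["A1", "A2", "A3"], ["node-1", "node-2", "node-3"]
    )
    print(placement)  # {'A1': 'node-1', 'A2': 'node-2', 'A3': 'node-3'}
```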