Automatic discovery of type mappings between Presto and federated data sources, and use of these mappings for automatic conversion of test data sets and test workloads
Publication Date: 2019-Mar-14
The IP.com Prior Art Database
Disclosed is a tool that maps data types between Presto and its data sources and converts existing data sets and workloads to the Presto standard.

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources. To unify data from heterogeneous data sources, some data types must be mapped. For example, when a SELECT is performed on a table in Cassandra and one of the columns has the type list, Presto has to map this column to a more common data type, in this case varchar. Finding all the relations among data types is a tedious and time-consuming process, but it is necessary for testing, for example for designing the expected output from Presto. Preparing data sets and workloads to test Presto with a plurality of data sources is also time-consuming and, in consequence, expensive.

A dedicated tool could speed up and automate this process. The tool would detect the data type mappings between Presto and another data source and apply them to data sets and workloads. It could be used with all the data sources supported by Presto, such as Hive, Cassandra, MySQL, PostgreSQL or MongoDB.

To begin with, some terms are introduced:

Presto - SQL query engine used to improve performance
Data sources - databases, platforms, etc. from which data should be extracted
Output database - database where data processed by Presto should be saved

It is assumed that workloads and data sets for data sources are already provided, as they are also needed for testing data sources without Presto feder...
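The two tasks attributed to the tool above, detecting the type mapping and applying it to existing test data, can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The function names, the column dictionaries and the converter table are hypothetical, and the sketch assumes the column types have already been read from the data source and from Presto (for example from each system's column metadata). Rendering a list as JSON text is likewise only an assumption for illustration; the actual varchar representation Presto produces should be checked against the connector.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Converter = Callable[[Any], Any]


def discover_type_mapping(source_columns: Dict[str, str],
                          presto_columns: Dict[str, str]) -> Dict[str, str]:
    """Pair each source type with the type Presto reports for the same column."""
    mapping: Dict[str, str] = {}
    for column, source_type in source_columns.items():
        presto_type = presto_columns[column]
        known = mapping.setdefault(source_type, presto_type)
        if known != presto_type:
            # One source type should map to a single Presto type.
            raise ValueError(
                f"{source_type} maps to both {known} and {presto_type}")
    return mapping


def convert_rows(rows: List[Row],
                 source_columns: Dict[str, str],
                 mapping: Dict[str, str],
                 converters: Dict[Tuple[str, str], Converter]) -> List[Row]:
    """Rewrite source test rows into the values expected from Presto."""
    converted: List[Row] = []
    for row in rows:
        new_row: Row = {}
        for column, value in row.items():
            source_type = source_columns[column]
            key = (source_type, mapping[source_type])
            # Type pairs without a registered converter keep their value.
            convert = converters.get(key, lambda v: v)
            new_row[column] = convert(value)
        converted.append(new_row)
    return converted


# Hypothetical column metadata for one Cassandra table and its Presto view.
SOURCE_COLUMNS = {"id": "int", "tags": "list<text>"}
PRESTO_COLUMNS = {"id": "integer", "tags": "varchar"}

# Assumption: a list is rendered as compact JSON text when mapped to varchar.
CONVERTERS: Dict[Tuple[str, str], Converter] = {
    ("list<text>", "varchar"): lambda v: json.dumps(v, separators=(",", ":")),
}

mapping = discover_type_mapping(SOURCE_COLUMNS, PRESTO_COLUMNS)
expected = convert_rows([{"id": 1, "tags": ["a", "b"]}],
                        SOURCE_COLUMNS, mapping, CONVERTERS)
print(mapping)
print(expected)
```

Keeping the discovered mapping separate from the converter table means the same conversion routine can be reused for each data source; only the per-source converters would need to change.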