
Method of using dynamic time scaling to accelerate data population of big data

IP.com Disclosure Number: IPCOM000241510D
Publication Date: 2015-May-08
Document File: 3 page(s) / 41K

Publishing Venue

The IP.com Prior Art Database

Abstract

In performance testing, we usually need to prepare massive amounts of test data, for example a full year of data. The old ways, such as writing SQL against the DB directly or writing code that calls the API, make it hard to get data that is both produced quickly and evenly distributed over time. With this invention, by changing the system time with a time factor, the test can be accelerated while the test data stays evenly distributed over time.


To do a performance test for an application with big data, we need to do data population beforehand, which prepares massive amounts of meaningful test data. The major challenges are how to produce massive meaningful data in a short time and how to keep the data meaningful as time goes on.

People can write SQL to generate data in the DB directly. This can produce data in a short time, but it becomes very difficult or impossible if the application is complex. In fact, most enterprise applications are too complex for this approach, since there may be complicated relationships between the tables, and the data may be stored in multiple places (an RDBMS, a NoSQL data store, a Lucene index, etc.).

A more applicable way is to use the application API to do the population, but writing the population tool may also take much time. The people developing the population tool must be familiar with the application logic as well as the APIs. The challenge is that in most projects the people who need to write the population tool (usually testers) are a different group from the people who wrote the application API (usually developers).

There is another common challenge: maintaining the data. As time goes on, the data becomes obsolete and we have to update it with up-to-date timestamps. This costs more effort to keep the data meaningful.

In this disclosure, we introduce a methodology that helps people produce massive meaningful data in a short time. Later, we can still use this methodology to generate more data and keep the data valid. With this methodology, even people who are not familiar with the application can still do it.

A method using dynamic time scaling to accelerate data population of big data is provided.

We need a time factor to be applied to the machine under test; the machine applying the time factor will run faster than usual. Say the factor is 10: then the machine clock runs 10 hours in just 1 hour of real time. If the factor is 365, it takes 1 day of real time for the machine clock to run 365 days.
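The disclosure scales the machine's own clock; as a minimal sketch of the underlying arithmetic, assuming an application-level hook instead (names such as ScaledClock are illustrative, not from the disclosure), in Python:

    import time

    class ScaledClock:
        # Minimal sketch: scaled time advances `factor` times faster
        # than real time, starting from the moment the clock is created.
        def __init__(self, factor):
            self.factor = factor
            self.base = time.time()  # real moment the scaling starts

        def now(self):
            # Map elapsed real time onto accelerated time.
            elapsed_real = time.time() - self.base
            return self.base + self.factor * elapsed_real

With a factor of 365, one real day of elapsed time makes now() report a moment 365 days after the start, matching the example above.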

Then we run a measurement that simulates user actions against this machine. Since the application's timestamps are generated according to the machine's system time, we are able to generate data spanning 365 days with only 1 day's measurement.
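A sketch of such a measurement, reusing the ScaledClock above; create_record is a hypothetical stand-in for a call through the application's API:

    import random
    import time

    def run_workload(clock, create_record, real_seconds):
        # Simulate user actions for `real_seconds` of real time. Each
        # record is stamped with the scaled clock, so with factor=365
        # one real day of actions yields timestamps spread over a year.
        deadline = time.time() + real_seconds
        while time.time() < deadline:
            create_record(timestamp=clock.now())
            time.sleep(random.uniform(0.1, 1.0))  # simulated think time

Because every record goes through the normal application path, the generated data stays evenly distributed over the scaled time span.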

This method does not introduce dirty data. We produce all data by going through the application directly, not by changing the DB data, and we do not need to change the timestamps in the various stores, since they are generated according to the scaled system time.

This method simplifies both data population and data maintenance, since we use the same approach for data population and workload measurement: the data population process can be treated as one workload measurement. When the data becomes old and we need up-to-date data, we run the work...