Method to adaptive benchmark of analytics workloads

IP.com Disclosure Number: IPCOM000240195D
Publication Date: 2015-Jan-12
Document File: 4 page(s) / 44K

Publishing Venue

The IP.com Prior Art Database

Abstract

In many scenarios, it is necessary to build a performance model for capacity planning, performance tuning, and similar tasks. Building an accurate performance model is challenging, however, because a benchmark plan must be obtained and executing it requires a great deal of resources. This article introduces a mechanism and apparatus for constructing benchmark plans that maximize the accuracy of the generated performance model while minimizing resource requirements. In addition, it selects the optimal benchmark points given the number of actual executions.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 55% of the total text.

1. Background

Building an accurate performance model is desirable in many settings, such as cloud platforms. For example, the model can be used to predict workload performance (time, bytes read/written, etc.) from input parameters such as cluster size and CPU power. To build a performance model, the following steps are generally followed:

1) Construct a benchmark plan. This step generates a list of benchmark points corresponding to executions of real workloads. For each benchmark point, the input parameters are determined.

2) For each benchmark point, execute the workload according to the given input parameters to get the output parameters (the time, I/O, etc.).

3) Based on the input/output parameters obtained in the two steps above, build a model that characterizes the relationship between input and output parameters.
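
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation described in this disclosure: the parameter names (`cluster_size`, `cpu_ghz`), the `run_workload` stand-in with its made-up formula, and the choice of a one-feature least-squares model are all assumptions made for the example.

```python
from itertools import product


def construct_plan(cluster_sizes, cpu_speeds):
    """Step 1: one benchmark point per combination of input parameters."""
    return [
        {"cluster_size": n, "cpu_ghz": f}
        for n, f in product(cluster_sizes, cpu_speeds)
    ]


def run_workload(point):
    """Step 2 (hypothetical stand-in): produce output parameters for a point.

    A real system would execute the workload; a made-up formula is used here.
    """
    work = 1200.0  # assumed total work, arbitrary units
    seconds = work / (point["cluster_size"] * point["cpu_ghz"])
    return {"seconds": seconds}


def fit_linear(xs, ys):
    """Step 3: ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


plan = construct_plan([2, 4, 8], [2.0, 3.0])
results = [run_workload(p) for p in plan]

# Assumed feature: the inverse of aggregate compute capacity.
xs = [1.0 / (p["cluster_size"] * p["cpu_ghz"]) for p in plan]
ys = [r["seconds"] for r in results]
a, b = fit_linear(xs, ys)


def predict_seconds(cluster_size, cpu_ghz):
    """Use the fitted model to predict run time for unseen input parameters."""
    return a / (cluster_size * cpu_ghz) + b
```

Once fitted, `predict_seconds` can be called for configurations that were never benchmarked, which is the purpose of the performance model.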

As far as step 2) is concerned, there are generally two ways to obtain the output parameters:


1) Set up a real environment and run the workload in it. We call this method real-run.

2) Use a shortcut, such as a simulator or a historical log miner. We call this method short-cut run.


2. The problem

Of the two methods above, method 1) usually yields accurate results but is resource-intensive, because it is based on real experiments. In contrast, method 2) requires fewer resources but is less accurate. Combining methods 1) and 2) to achieve the best tradeoff is therefore desirable. Based on these observations, we introduce a mechanism and apparatus that build a performance model that is as accurate as possible, given the users' resource requirements. In particular, we solve the following problems:

1) How to construct a benchmark plan to achieve the optimal tradeoff between model accuracy and resource requirements.


2) How to select the benchmark points for real-runs, so as to maximize the accuracy.

3) How to combine results obtained from real and short-cut runs, so as to make use of the advantages of both approaches.
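
To make the three problems concrete, the following Python sketch shows one simple way they could fit together. It is an illustration under stated assumptions, not the method of this disclosure: the evenly spaced selection rule, the multiplicative correction factor, and the `shortcut_run` and `real_run` functions with their formulas are all hypothetical.

```python
def shortcut_run(cluster_size):
    """Hypothetical cheap estimate (e.g. from a simulator); systematically off."""
    return 1000.0 / cluster_size


def real_run(cluster_size):
    """Hypothetical expensive measurement; treated as ground truth here."""
    return 1200.0 / cluster_size


def select_real_points(points, budget):
    """Pick `budget` points spread evenly over the sorted candidates."""
    pts = sorted(points)
    if budget <= 0:
        return []
    if budget >= len(pts):
        return pts
    if budget == 1:
        return [pts[len(pts) // 2]]
    step = (len(pts) - 1) / (budget - 1)
    return [pts[round(i * step)] for i in range(budget)]


def combine(points, budget):
    """Use real results where available, corrected short-cut estimates elsewhere."""
    if budget < 1:
        raise ValueError("at least one real run is needed for calibration")
    real_points = select_real_points(points, budget)
    real = {p: real_run(p) for p in real_points}
    # Average ratio of real to short-cut output at the measured points.
    factor = sum(real[p] / shortcut_run(p) for p in real_points) / len(real_points)
    return {p: real.get(p, factor * shortcut_run(p)) for p in points}


candidates = [2, 4, 8, 16, 32]
combined = combine(candidates, budget=2)
```

Here the budget limits the number of expensive real-runs, and the remaining benchmark points are filled in with short-cut results scaled by what the real-runs revealed. Other selection and calibration rules are possible.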


3. Our solution

The overall architecture of our solution is shown in the figure below. The system takes as input the user's requirements for model accuracy and resource usage. The performance model is constructed through the following steps:

1) User requirements are directed to the Benchmark Evaluator, which will...