
A Method of Recommending Workload Configurations for Resource Sensitive Big Data Testing
Disclosure Number: IPCOM000245494D
Publication Date: 2016-Mar-12
Document File: 4 page(s) / 184K

Publishing Venue

The Prior Art Database


Disclosed is a method and system for recommending workload configurations for resource sensitive Big Data testing.

This is the abbreviated version, containing approximately 51% of the total text.


A Method of Recommending Workload Configurations for Resource Sensitive Big Data Testing

Cluster operators and Big Data application developers need to analyze and test systems under different resource utilization scenarios. Tuning workloads within benchmarks to stress specific resources is tedious and time-consuming. Traditional benchmarks provide profiles with certain resource utilization characteristics, but offer little guidance for tuning the parameters of the chosen workloads.

The novel contribution is a method to build a workload classifier and search strategy that allows cluster operators and Big Data application developers to identify and recommend resource-sensitive workload configurations based on the user's benchmarking requirement.

The novel method and system automatically generate workload configurations that enable end users to comprehensively and systematically exercise different system resources and meet other specific needs. The method and system comprise three components. A workload profiler profiles the workloads and collects related statistics such as input/output/shuffle data set size, the number of stages, the number of Resilient Distributed Datasets (RDDs), and RDD size. A workload classifier classifies a workload configuration into different types of resource utilization patterns. Given a user's usage pattern requirement, a workload customizer automatically generates the pattern by searching through the configuration space with the help of the classifier. The workload customizer uses a search algorithm to efficiently find a matching workload configuration.
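The interplay of classifier and customizer described above can be sketched in Python. This is a minimal illustration, not the disclosed implementation: the disclosure names neither the classification rule nor the search algorithm, so the "dominant resource" rule, the exhaustive grid search, and every name here (`classify`, `config_space`, `find_config`, `toy_profile`, `GRID`) are assumptions. In particular, `toy_profile` is a hypothetical stand-in for actually running and profiling a workload.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional

Config = Dict[str, int]
Utilization = Dict[str, float]  # resource name -> fraction of capacity in [0, 1]


def classify(utilization: Utilization) -> str:
    """Label a configuration by the resource it uses most (assumed toy rule)."""
    return max(utilization, key=utilization.get)


def config_space(grid: Dict[str, List[int]]) -> Iterator[Config]:
    """Enumerate every combination of parameter values in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def find_config(
    grid: Dict[str, List[int]],
    profile: Callable[[Config], Utilization],
    target: str,
) -> Optional[Config]:
    """Return the first configuration classified as stressing `target`.

    Exhaustive search is an assumption; the disclosure only says that a
    search algorithm is used.
    """
    for config in config_space(grid):
        if classify(profile(config)) == target:
            return config
    return None


def toy_profile(config: Config) -> Utilization:
    """Hypothetical stand-in for running and profiling a real workload."""
    return {
        "cpu": min(1.0, config["iterations"] / 100),
        "memory": min(1.0, config["input_gb"] / 64),
    }


# Hypothetical parameter grid for a single workload.
GRID = {"iterations": [10, 50], "input_gb": [8, 32]}
```

Calling `find_config(GRID, toy_profile, "cpu")` walks the grid until it reaches a configuration whose profiled utilization is dominated by CPU. A real customizer would replace the exhaustive loop with a more efficient search and the toy profile with measured statistics.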

The architecture of the system includes a workload driver, searcher, profiler, classifier, a set of workloads, and a resource sensitive suite (Figure 1). A workload driver analyzes the workload execution to automatically generate different versions (specifications) of the workloads that both represent various usage scenarios and stress different resources in the cluster.

A customizer component searches through a set of classified workloads to identify resource-sensitive workload specifications that stress each resource in the system (e.g., memory, central processing unit (CPU), disk, and network).
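One way to picture this selection step is the following Python sketch, which picks, for each resource, the classified specification with the highest utilization of that resource. The selection criterion, the `ClassifiedSpec` type, the `build_suite` function, and all specification names and utilization values are invented for illustration; the disclosure does not state how the customizer ranks candidates.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassifiedSpec:
    """A workload specification with its profiled resource utilization."""
    name: str
    utilization: Dict[str, float]  # resource name -> fraction of capacity


def build_suite(specs: List[ClassifiedSpec], resources: List[str]) -> Dict[str, str]:
    """Map each resource to the name of the spec that stresses it most."""
    suite = {}
    for resource in resources:
        best = max(specs, key=lambda s: s.utilization.get(resource, 0.0))
        suite[resource] = best.name
    return suite


# Hypothetical specifications with made-up utilization figures.
SPECS = [
    ClassifiedSpec("sort-large", {"cpu": 0.4, "memory": 0.6, "disk": 0.9, "network": 0.7}),
    ClassifiedSpec("kmeans-100iter", {"cpu": 0.95, "memory": 0.5, "disk": 0.1, "network": 0.2}),
    ClassifiedSpec("pagerank-cached", {"cpu": 0.6, "memory": 0.9, "disk": 0.2, "network": 0.8}),
]

SUITE = build_suite(SPECS, ["memory", "cpu", "disk", "network"])
```

Under these assumed numbers, `SUITE` maps each of the four resources to one specification name, which is roughly the shape a resource sensitive suite could take.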



Figure 1: System Architecture

The profiler collects workload information such as:

• Type and number of jobs
• Stages
• Data computation (i.e., map/reduce/shuffle functions)
• Data size for input/shuffle/output
• Execution time (i.e., per-workload, per-computation type)
• Workload-specific parameter values (e.g., number of iterations)
• Response time
• Accuracy (if applicable: precision, recall)
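The profiler output listed above could be held in a record like the following Python sketch. The field names, units, and the two derived helpers are assumptions chosen for illustration; the disclosure lists the collected information but not its representation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WorkloadProfile:
    """One profiled run of a workload (hypothetical layout)."""
    job_counts: Dict[str, int]            # job type -> number of jobs
    num_stages: int
    input_bytes: int
    shuffle_bytes: int
    output_bytes: int
    execution_seconds: Dict[str, float]   # computation type -> time spent
    parameters: Dict[str, int] = field(default_factory=dict)
    response_seconds: Optional[float] = None
    precision: Optional[float] = None     # only where accuracy applies
    recall: Optional[float] = None        # only where accuracy applies

    def total_execution_seconds(self) -> float:
        """Per-workload execution time, summed over computation types."""
        return sum(self.execution_seconds.values())

    def shuffle_ratio(self) -> float:
        """Shuffle data size relative to input size (0.0 for empty input)."""
        return self.shuffle_bytes / self.input_bytes if self.input_bytes else 0.0


# Example record with invented values.
profile = WorkloadProfile(
    job_counts={"batch": 3},
    num_stages=5,
    input_bytes=1000,
    shuffle_bytes=500,
    output_bytes=200,
    execution_seconds={"map": 10.0, "shuffle": 20.0, "reduce": 12.0},
    parameters={"iterations": 10},
)
```

Keeping accuracy fields optional mirrors the "if applicable" qualifier in the list, since many workloads have no notion of precision or recall.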

The parameter selector and workload classifier use:

• Intermediate data size
• Workload parameter configuration (e.g., RDD, cache size, shuffle memory size)
• Resource utilization per job, per stage (i.e., memory,...