Technique to optimize lookup performance using intelligent generation of the reference (lookup) dataset

IP.com Disclosure Number: IPCOM000236202D
Publication Date: 2014-Apr-11
Document File: 2 page(s) / 28K

Publishing Venue

The IP.com Prior Art Database

Abstract

This article describes a technique to optimize a lookup operation by intelligently generating the lookup datasets. The lookup dataset is generated (or regenerated) based on its current usage across jobs. The usage analysis determines which columns of the dataset are read most frequently and which partitioning styles are used most frequently on the input links of the stages consuming the lookup dataset. The resulting frequency distributions, together with the I/O characteristics of the dataset storage, help decide the columns and partitioning style of the optimized dataset.

This is the abbreviated version, containing approximately 60% of the total text.



The following steps describe the technique for optimizing the lookup based on column usage analysis:
1) Scan all jobs to find the consumption of the specified lookup dataset
2) From all matching jobs, find the columns of the lookup dataset that were actually read
3) Generate a frequency distribution of the column sets of the lookup dataset
4) For each column set, estimate the benefit of writing it out as a separate lookup dataset. If the frequency of the column set is greater than the disk read/write ratio, write it out; otherwise, do not.

Example:

Consider the lookup dataset "lup3.ds"

Originally generated with schema:
A:string[50]
B:string[100]
C:string[75]
D:string[25]

Frequency distribution of column sets from matching jobs:
(A,B) - 5
(B,C) - 3
Disk read/write ratio = 4
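Because the frequency of (A,B), which is 5, is greater than the disk read/write ratio of 4, rewrite the dataset as lup3(1).ds using columns A,B and write out the rest of the columns, C,D, as lup3(2).ds. The column set (B,C), with a frequency of 3, does not exceed the ratio and is not written out as a separate dataset.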

The above technique reduces the actual data to be read during a lookup operation.
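
The column usage steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: each job's use of a dataset is represented as a (dataset name, columns read) pair, and the function names column_set_frequencies and select_column_sets are hypothetical, not part of any product API. The sample data mirrors the lup3.ds example; the "other.ds" entry is made up to show that non-matching datasets are ignored.

```python
from collections import Counter

def column_set_frequencies(lookup_refs, dataset):
    """Count how often each column set of `dataset` is read across jobs.

    `lookup_refs` is an assumed representation: an iterable of
    (dataset_name, columns_read) pairs collected by scanning the jobs.
    """
    counts = Counter()
    for name, columns in lookup_refs:
        if name == dataset:
            # Sort so that (B, A) and (A, B) count as the same column set.
            counts[tuple(sorted(columns))] += 1
    return counts

def select_column_sets(frequencies, read_write_ratio):
    """Keep column sets whose frequency is greater than the read/write ratio."""
    return sorted(cols for cols, freq in frequencies.items()
                  if freq > read_write_ratio)

# Sample data mirroring the example: (A,B) read 5 times, (B,C) read 3 times.
refs = ([("lup3.ds", ("A", "B"))] * 5
        + [("lup3.ds", ("B", "C"))] * 3
        + [("other.ds", ("X",))])

freqs = column_set_frequencies(refs, "lup3.ds")
selected = select_column_sets(freqs, read_write_ratio=4)
print(selected)  # [('A', 'B')]
```

With a disk read/write ratio of 4, only (A,B) qualifies, which matches the decision to write columns A,B as a separate dataset.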

The following steps describe the technique for optimizing the lookup based on partitioning style analysis:
1) Scan all jobs to find the consumption of the specified lookup dataset
2) From all matching jobs, find the partitioning style requested on the input link of the stage consuming the lookup dataset
3) Generate a frequency distribution of the partitioning styles
4) Recreate the lookup dataset using the partitioning style which has the highest frequen...
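
The partitioning style steps can be sketched in a similar way. Step 4 is cut off in this abbreviated text, so the sketch below assumes it simply picks the style with the highest frequency; any further criteria in the full text are not modeled. The (dataset name, partitioning style) representation, the function name most_frequent_partitioning, and the style names "hash" and "entire" are illustrative assumptions.

```python
from collections import Counter

def most_frequent_partitioning(link_refs, dataset):
    """Return (style, counts) for the most requested partitioning style.

    `link_refs` is an assumed representation: an iterable of
    (dataset_name, partitioning_style) pairs, one per consuming stage.
    Returns (None, counts) when no job consumes `dataset`.
    """
    counts = Counter(style for name, style in link_refs if name == dataset)
    if not counts:
        return None, counts
    # most_common(1) yields the single (style, frequency) pair with the top count.
    style, _ = counts.most_common(1)[0]
    return style, counts

# Made-up usage data: three consumers request "hash", two request "entire".
link_refs = ([("lup3.ds", "hash")] * 3
             + [("lup3.ds", "entire")] * 2
             + [("other.ds", "entire")] * 4)

best_style, style_counts = most_frequent_partitioning(link_refs, "lup3.ds")
print(best_style)  # hash
```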