Browse Prior Art Database

Function, Method and System to Mine Frequent Instruction Sequences from Hardware Sample Data

IP.com Disclosure Number: IPCOM000194715D
Publication Date: 2010-Apr-09
Document File: 4 page(s) / 29K

Publishing Venue

The IP.com Prior Art Database

Abstract

Mining frequent instruction sequences from hardware sample data is to find out which instruction sequences appear at a lot of addresses and have significant ticks (or other events, i.e. icdirms, dcdirms and etc.) to locate optimization opportunities and identify workload characteristics. Above function is very important due to its following applications: Identify performance bottlenecks: Which sequences are hot? Identify hidden reasons for performance events: Which sequences tend to cause ICDirMs (DCDirMs, TLBMs, …) ? Identify workload characteristics: Is there a lot of Java (DB2, network, Disk I/O ) sequences in the workload? So that we can use such information to do capacity planning.Such information could also be used to detect virus. As to our knowledge, no existing tools can provide above function. In addition, although existing data mining algorithms like Apriori[1], FP-tree[2], CloSpan[3], Spade[4], and etc. can be applied because they are generic, those algorithms are very inefficient for our specific problems, due to following reasons: We need to take multiple counters into consideration as well as counts We need to take branches into consideration Hardware sample data is not a natural sequence library Large instruction set size and sample data size, how to speed the processing for this specific problem? References include: [1] R. Agrawal, R. Srikant. Mining sequential patterns. In Proc. 1995 Int. Conf. Data Engineering, pages 3-14, Taipei, Taiwan, Mar 1995. [2] R. Agarwal, C. Aggarwal, and V.V.V. Prasad. Depth-first generation of large itemsets for association rules. In IBM Technical Report RC21538, July 1999. [3] M. Zaki. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 40:31-60, 2001. [4] X. Yan, J. Han, and R. Afshar. CloSpan: Mining closed sequential patterns in large datasets. In Proc. 2003 SIAM Int.Conf. on Data Mining, May 2003.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 54% of the total text.

Page 1 of 4

Function, Method and System to Mine Frequent Instruction Sequences from Hardware Sample Data

Design and Implementation (1/4)

Input


SET0 : a set of records (sid, seq, opcode, address, targetAddress, ticks)
minSupport
Output
ResultSET: a set of result records ( opcode sequence, totalTicks, count )
Algorithm
Step 1. Get all 1-len frequent sequences, using following steps:
• Scan SET0 once, partition SET0 into a group of subsets, with each subset corresponding with an opcode;

SET1 = { }, for each subset of SET0 , if its size > minSupport, add (opcode, totalTicks, subset size) into ResultSET, and add all records in the subset into SET1

SET1

IA_table

DISASSM_table

SID

Address

SID

Address

SEQ OPCODE TargetAddress …

SID

Address

SEQ OPCODE

TargetAddress …

SID

Address

SEQ OPCODE

TargetAddress …

L

LEN1_table (SET0)

ST

SID

Address

SEQ OPCODE

TargetAddress ……

SID

Address

SEQ OPCODE

TargetAddress …

TM

1

Page 2 of 4

Design and Implementation (2/4)

Input


SET0 : a set of records (sid, seq, opcode, address, targetAddress, ticks)
minSupport
Output
ResultSET: a set of result records ( opcode sequence, totalTicks, count )
Algorithm
Step 2. Get all 2-len frequent sequences, using following steps:
tempSET = SELECT A.*, B.* FROM SET1 A JOIN SET1 B ON (A. sid = B. sid AND A.

seq = B. seq-1) OR ( A. targetAddress = B. address)

• scan tempSET once, partition tempSET into a group of subsets, with each subset corresponding with a 2-len opcode sequence: "opcode, opcode1";

SET2 = { }, for each subset of tempSET , if its size > minSupport, add ("opcode, opcode1", totalTicks, subset size) into ResultSET, and add all records in the subset into SET2

TargetAddress ……

(SET1) (SET1)

Address

SID TargetAddress ……

SEQ...