
System and method to find the largest and most occurring ordered common set of statements in large datasets optimized on storage and time

IP.com Disclosure Number: IPCOM000243956D
Publication Date: 2015-Nov-02
Document File: 13 page(s) / 67K

Publishing Venue

The IP.com Prior Art Database

Abstract

This algorithm helps to find the largest and most occurring ordered common set of statements in large datasets. The algorithm is optimized in terms of storage and time.




Say, for example, a person wants to find the largest common set of statements per page in a book, or the least common statements per page in the book.

Or take a set of manual test scripts, where a user wants to find the most commonly (and least commonly) occurring sets of statements so that reusable keywords can be created from them.

Taking the example of manual test scripts forward:


Constraints & assumptions would be:

a) The statements in a manual script are ordered (one statement comes after the other).

b) Each manual script is uniquely ordered; a common set of statements need not start from step 1 but could occur in varied places (e.g. script A steps 3-7 are common with script B steps 19-23 and script C steps 71-75).

c) Need to find the largest down to the least common set of statements (the algorithm should progressively provide larger and larger sets of common steps).

d) Each script is an individual entity, i.e. once the script ends, the follow-up cannot be to the next script.

e) Scripts can contain rich-text information, which is removed before the algorithm is applied.

f) The algorithm can be tweaked to require a 100% match or to work on hashed values of statements (see the sketch after this list).

g) The algorithm can be tweaked to break cycles, if any.

h) Easy maintainability once steps change in the manual script (minimal updates needed).

i) Optimize on database (DB) space and time.
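Constraints (e), (f), and (i) suggest stripping rich text and comparing statements by hash rather than by full text. A minimal sketch of that idea in Python follows; the helper names normalize_statement and statement_hash are illustrative, not from the disclosure itself:

import hashlib
import re

def normalize_statement(statement: str) -> str:
    """Strip rich-text markup and collapse whitespace so only the plain
    statement text takes part in the comparison (constraint e)."""
    plain = re.sub(r"<[^>]+>", "", statement)   # drop HTML-style tags
    return re.sub(r"\s+", " ", plain).strip().lower()

def statement_hash(statement: str) -> str:
    """Hash the normalized statement (constraint f); equal hashes stand in
    for a 100% text match while keeping stored values small (constraint i)."""
    return hashlib.sha1(normalize_statement(statement).encode("utf-8")).hexdigest()

# Two differently formatted copies of the same step hash identically.
assert statement_hash("<b>Click  Login</b>") == statement_hash("click login")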
Algorithms available today are mostly targeted towards the longest common subsequence. They help to a certain extent; however, due to the constraints of statement ordering, multiple sets of statements, and longest sequencing between each manual script, none of the available algorithms can be applied directly to this problem.
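For contrast, a textbook longest-common-subsequence routine is inherently pairwise, which is one reason it cannot directly yield every common ordered block across many scripts. A minimal sketch of the standard dynamic-programming version:

def lcs_length(a: list, b: list) -> int:
    """Classic dynamic-programming LCS between exactly two sequences.
    It yields one longest subsequence per pair, not every common
    ordered block across an arbitrary number of scripts."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[len(a)][len(b)]

# Pairwise only: comparing script A with B says nothing about script C.
print(lcs_length(["s1", "s2", "s3"], ["s0", "s2", "s3"]))  # -> 2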

This algorithm provides further optimization over the earlier proposed algorithm in terms of memory utilization and processing speed. The original motive of the algorithm was to find the longest common sequence occurring frequently in the dataset. This algorithm is most suitable where scripts are copied and contain a large number of common steps.

During processing, only those sets of data that are required to find the longest sequence are generated. Intermediate processing steps are omitted, which in turn improves the performance of the algorithm.

Key points considered here in relation to the previous algorithm are:

• Memory: various intermediate data structures are not formed.

• Speed: in this algorithm processing proceeds in jumps, whereas the earlier one moves linearly.

• Operations: this algorithm reduces the number of comparisons done on the data, hence contributing to the optimization.




In this algorithm we consider each script's statements as a directed list of nodes, as shown in detailed section [1]. When viewed individually, they appear as ordered linked lists; however, once we start to super-impose the linked lists,...
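A minimal sketch of that superimposition idea, assuming statements have already been hashed as above (StatementNode and superimpose are hypothetical names, not the disclosure's own): each distinct statement hash gets one shared node, every (script, step) occurrence is recorded on it, and common ordered blocks then appear as runs of nodes each holding several occurrences.

class StatementNode:
    """One node per distinct statement hash; occurrences records every
    (script_id, step_index) at which the statement appears."""
    def __init__(self, stmt_hash: str):
        self.stmt_hash = stmt_hash
        self.occurrences = []          # list of (script_id, step_index)

def superimpose(scripts: dict) -> dict:
    """scripts maps script_id -> ordered list of statement hashes.
    Equal hashes share one node, so a common ordered block shows up
    as a run of consecutive nodes each holding several occurrences."""
    nodes = {}
    for script_id, hashes in scripts.items():
        for index, h in enumerate(hashes):
            node = nodes.setdefault(h, StatementNode(h))
            node.occurrences.append((script_id, index))
    return nodes

# Scripts A and B share the ordered block [h2, h3] at different positions.
nodes = superimpose({"A": ["h1", "h2", "h3"], "B": ["h9", "h8", "h2", "h3"]})
print(nodes["h2"].occurrences)   # -> [('A', 1), ('B', 2)]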