
Identifying Test Steps in Large Collections of Tables
Disclosure Number: IPCOM000236322D
Publication Date: 2014-Apr-20
Document File: 5 page(s) / 46K

Publishing Venue

The Prior Art Database


A method to find the columns that hold test steps and expected results in tabular collections of test cases, without depending on column titles. The method is based on detecting patterns of text columns, indents, and the extent to which each column is filled.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 39% of the total text.



Navigating large collections of test files is necessary when one organization takes over responsibility for another organization's testing, and also when test experts leave an organization without passing their knowledge on to others.

    In many such cases the files in question are spreadsheets, and it is frequently necessary to decide which spreadsheets describe tests and which describe other information, such as reviews, summaries, and requirements traceability matrices. It may also be necessary to identify individual test cases and test steps.

    One of the main challenges is that table formats differ, and column headers are not uniform; in some cases they are ambiguous or even outright misleading:

One significant difference in formats is that some tables have one row per test case, with all of its test steps in that row, whereas others have one row per test step.

The column header for test steps is not standardized. In some tables it is "Description", in others "Steps", in yet others "Description/Input", and there are many more variants.

An example of ambiguity is the title "Description": sometimes it refers to test cases, and sometimes to test steps.

An example of a misleading header: in one table, "Description" referred to the description of the steps, "Steps Description" referred to the expected results, and a column named "Expected Results" existed but was empty.

In some cases the title row does not exist at all.
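The first difference above (one row per test case versus one row per test step) can often be detected from the cell contents alone. The following Python sketch is not taken from the disclosure; it assumes that a cell packing several steps numbers them on separate lines (for example "1." or "2)"), and the function names and the 0.5 threshold are hypothetical.

```python
import re

# Assumption: a line starting with a number followed by "." or ")" is a step.
STEP_LINE = re.compile(r"^\s*\d+[.)]\s+")


def packed_steps(cell: str) -> int:
    """Count the lines in a cell that start with a step number."""
    return sum(1 for line in cell.splitlines() if STEP_LINE.match(line))


def guess_layout(step_cells: list[str], threshold: float = 0.5) -> str:
    """Guess whether a table has one row per test case or one row per step.

    step_cells: cells of the column believed to hold test steps.
    threshold: hypothetical fraction of non-empty cells that must contain
    two or more numbered steps for the "row-per-test-case" verdict.
    """
    cells = [c for c in step_cells if c.strip()]
    if not cells:
        return "unknown"
    packed = sum(1 for c in cells if packed_steps(c) >= 2)
    if packed / len(cells) >= threshold:
        return "row-per-test-case"
    return "row-per-step"
```

For example, cells such as "1. Open app\n2. Log in" push the verdict toward one row per test case, while short single-action cells ("Open app", "Log in") point to one row per step.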

    This invention describes a method to analyze a large collection of spreadsheets and identify those that are likely to be test descriptions, without relying on prior knowledge of the words used (in other words, with minimal dependence on language). Furthermore, the main columns in each table are identified without any language-specific setup effort.

    The identified items can be explored using off-the-shelf indexing and search products such as Apache Solr and Apache Lucene.
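As a rough illustration of handing the results to an off-the-shelf search product, the Python sketch below turns identified rows into documents for Solr's JSON update handler (a POST to `/update` with `Content-Type: application/json`). The core name `testcases`, the field names, and the helper functions are hypothetical; the fields must exist in the core's schema, or the core must run in schemaless mode.

```python
import json
import urllib.request


def cell(row: list[str], col: int) -> str:
    """Return a stripped cell, or "" if the row is too short."""
    return row[col].strip() if col < len(row) else ""


def to_solr_docs(table_name: str, rows: list[list[str]],
                 id_col: int, steps_col: int, expected_col: int) -> list[dict]:
    """Turn data rows (header excluded) into one search document per test step."""
    docs = []
    for n, row in enumerate(rows):
        step = cell(row, steps_col)
        if not step:
            continue  # skip rows without step text
        docs.append({
            "id": f"{table_name}:{n}",  # hypothetical unique key: file + row index
            "test_id": cell(row, id_col),
            "step": step,
            "expected": cell(row, expected_col),
            "source_table": table_name,
        })
    return docs


def build_update_request(docs: list[dict],
                         core_url: str = "http://localhost:8983/solr/testcases"):
    """Build (but do not send) a POST request to Solr's JSON update handler."""
    return urllib.request.Request(
        core_url + "/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(...)` requires a running Solr instance; the sketch only builds it, so it runs offline.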

The invention has been implemented and tested on a collection of 210 tables, on which it worked flawlessly.

Prior Art
We are not aware of any prior art in this area; searches for "large collections of test cases" and "analyzing large collections of test cases" found nothing relevant.

The core of the invention is to first analyze the structure of the tables, then extract a crude language model, and finally identify test steps, expected results, test IDs, and test descriptions.
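The structural analysis can be illustrated without reading any header text. This Python sketch rests on stated assumptions and is not the disclosed implementation: a table is a list of rows of cell strings, an "indent" is approximated as a row whose first non-empty cell is not in the first column, and the thresholds (`min_fill`, `min_words`) and the rule that steps sit to the left of expected results are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ColumnProfile:
    index: int
    fill: float       # fraction of rows with a non-empty cell in this column
    avg_words: float  # mean word count of the non-empty cells


def profile_columns(rows: list[list[str]]) -> list[ColumnProfile]:
    """Compute simple structural features for every column of a table."""
    width = max((len(r) for r in rows), default=0)
    profiles = []
    for c in range(width):
        cells = [r[c].strip() if c < len(r) else "" for r in rows]
        filled = [x for x in cells if x]
        profiles.append(ColumnProfile(
            index=c,
            fill=len(filled) / len(rows),
            avg_words=mean(len(x.split()) for x in filled) if filled else 0.0,
        ))
    return profiles


def indent_ratio(rows: list[list[str]]) -> float:
    """Fraction of non-blank rows whose first non-empty cell is not in column 0."""
    starts = [next(i for i, x in enumerate(r) if x.strip())
              for r in rows if any(x.strip() for x in r)]
    return sum(1 for s in starts if s > 0) / len(starts) if starts else 0.0


def guess_step_columns(rows: list[list[str]], min_fill: float = 0.6,
                       min_words: float = 3.0):
    """Guess (steps_col, expected_col) from structure alone.

    Assumed heuristic: the two well-filled columns with the most words per
    cell hold steps and expected results, steps being the left one.
    Returns None when the table does not show this pattern, which hints
    that it is not a test description at all.
    """
    candidates = [p for p in profile_columns(rows)
                  if p.fill >= min_fill and p.avg_words >= min_words]
    if len(candidates) < 2:
        return None
    top_two = sorted(candidates, key=lambda p: p.avg_words, reverse=True)[:2]
    left, right = sorted(p.index for p in top_two)
    return left, right
```

Applied to every sheet in a collection, a `None` result filters out reviews, summaries, and traceability matrices, while a high `indent_ratio` hints at the one-row-per-step layout with separate test-case header rows.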

The main steps are:

1. Process all the files and identify tables whose structure is used almost exclusively in test descriptions.

2. Extract textual information from the identified tables, e.g.:

   - Words typical of test steps and of test step column headers
   - Words typical of expected results and of expected result column headers
   - Typical headers of test steps


3. Process all the files again, this time using the information from step #2 to refine the results:
   - Increase confidence in columns...
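Steps 2 and 3 can be illustrated with a crude word-frequency model. The Python sketch below is one possible reading, not the author's method (the description of step 3 is truncated here): it counts words in the step and expected-result cells of tables already identified structurally, then checks whether a new table's structural guess agrees with those counts using add-one smoothed log-odds. The class and function names are hypothetical. Because the vocabulary is learned from the collection itself, no word list has to be prepared in advance, in line with the goal of minimal language dependence.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")  # \w matches Unicode letters by default in Python 3


def words(text: str) -> list[str]:
    return [w.lower() for w in WORD.findall(text)]


class CrudeLanguageModel:
    """Hypothetical word-frequency model for two column roles."""

    def __init__(self) -> None:
        self.counts = {"steps": Counter(), "expected": Counter()}

    def train(self, role: str, cells: list[str]) -> None:
        """Add the words of cells from a column whose role is already known."""
        for cell in cells:
            self.counts[role].update(words(cell))

    def log_odds(self, cells: list[str]) -> float:
        """Positive: cells read like test steps; negative: like expected results."""
        s, e = self.counts["steps"], self.counts["expected"]
        s_total, e_total = sum(s.values()), sum(e.values())
        vocab = len(set(s) | set(e)) or 1
        score = 0.0
        for w in (w for cell in cells for w in words(cell)):
            # Add-one (Laplace) smoothing keeps unseen words from giving log(0).
            p_s = (s[w] + 1) / (s_total + vocab)
            p_e = (e[w] + 1) / (e_total + vocab)
            score += math.log(p_s / p_e)
        return score


def confirms_roles(lm: CrudeLanguageModel, steps_cells: list[str],
                   expected_cells: list[str]) -> bool:
    """True if the word model agrees with the structural guess of column roles."""
    return lm.log_odds(steps_cells) > lm.log_odds(expected_cells)
```

When `confirms_roles` returns False, the structural guess for that table could be flagged for review or its two columns swapped.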