
Method and System for Automatically Determining Hierarchies/Relationships from a Data Set for Generating Data Model

IP.com Disclosure Number: IPCOM000202401D
Publication Date: 2010-Dec-15
Document File: 4 page(s) / 108K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method and system for automatically determining hierarchies/relationships from a data set for generating a data model. The method analyzes warehouse objects and generates a relationship model that identifies relationships between the columns within a single table.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 56% of the total text.


A method and system are disclosed for automatically determining hierarchies/relationships from a data set for generating a data model. The method analyzes warehouse objects and generates a relationship model that identifies relationships between the columns within a single table.

In accordance with the method disclosed herein, an algorithm is executed to automatically detect hierarchies in a table. The algorithm scans each cell in the table and builds a data structure that holds the frequency of occurrence of each cell value in its column. Using these frequencies, the cell value data is grouped by column, and the maximum count for each column group is extracted. Arranging the columns in descending order of their maximum count gives the hierarchies present in the table.

The algorithm takes a warehouse object as input and performs the following steps to identify the hierarchies present in the table.

Step 1:

Scan a column in the given table and count the number of occurrences of every cell value in that column. This information is stored in the following form:
{ cell value, column name, number of occurrences }
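As a minimal sketch of Step 1, the Python snippet below counts how often each cell value occurs in one column and emits records in the form above. The function name `count_column` and the sample values are illustrative assumptions and do not come from the disclosure.

```python
from collections import Counter

def count_column(column_name, cells):
    """Return (cell value, column name, number of occurrences) records for one column."""
    counts = Counter(cells)  # frequency of each distinct cell value
    return [(value, column_name, n) for value, n in counts.items()]

# Hypothetical column contents, for illustration only.
records = count_column("Col1", ["a", "b", "a", "c", "a"])
print(records)  # [('a', 'Col1', 3), ('b', 'Col1', 1), ('c', 'Col1', 1)]
```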

Step 2:

Step 1 is repeated for every column in the table.

Step 3:

The cell value data is grouped by column names.

Step 4:

Using the above data, select for every column group the number of occurrences that is the maximum for that group.

For example, for the below sample data of group 'Col1':

Cell Value    Column Name    Number of occurrences
Data1         Col1           12
Data2         Col1           02
Data3         Col1           09
Data4         Col1           18
Data5         Col1           12

Here, the max value would be '18' for the column group 'Col1'.
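Steps 3 and 4 can be sketched in Python as follows, using the 'Col1' sample data. The helper name `max_per_column` and the use of a plain dictionary are assumptions made for illustration; the disclosure does not prescribe a particular data structure.

```python
# Records from the 'Col1' sample: (cell value, column name, number of occurrences).
records = [
    ("Data1", "Col1", 12),
    ("Data2", "Col1", 2),
    ("Data3", "Col1", 9),
    ("Data4", "Col1", 18),
    ("Data5", "Col1", 12),
]

def max_per_column(records):
    """Group records by column name and keep the largest occurrence count per group."""
    best = {}
    for _value, column, n in records:
        best[column] = max(best.get(column, 0), n)
    return best

print(max_per_column(records))  # {'Col1': 18}
```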

Step 5:

The columns are arranged in descending order of their max value. This order gives the hierarchy/relationships between the data.
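The ordering step might look like the short Python sketch below. The column names and max counts are hypothetical values chosen only to show the sort, and `order_columns` is an illustrative name.

```python
def order_columns(max_counts):
    """Sort column names by their max occurrence count, largest first."""
    return sorted(max_counts, key=max_counts.get, reverse=True)

# Hypothetical max counts for three columns, for illustration only.
max_counts = {"City": 40, "Country": 300, "Street": 2}
print(order_columns(max_counts))  # ['Country', 'City', 'Street']
```

Python's `sorted` is stable, so columns with equal max counts keep their input order in this sketch; the text shown here does not say how ties should be handled.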

Consider an exemplary scenario in which the warehouse data is as shown in Table 1, which contains employee data.

Emp_Name      Manager_Name
Vaibhav       Nilesh
Sachin        Nilesh
Jeremiah      Nilesh
Nilesh        Abhay
Avinash       Abhay
Jim           Abhay
Abhijit       Abhay

Table 1
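Applying the five steps to Table 1 can be sketched in Python as below. The variable names are illustrative assumptions, and the printed results are computed by this sketch from the table rows rather than quoted from the disclosure.

```python
from collections import Counter

# Rows of Table 1 as (Emp_Name, Manager_Name).
rows = [
    ("Vaibhav", "Nilesh"),
    ("Sachin", "Nilesh"),
    ("Jeremiah", "Nilesh"),
    ("Nilesh", "Abhay"),
    ("Avinash", "Abhay"),
    ("Jim", "Abhay"),
    ("Abhijit", "Abhay"),
]
columns = ["Emp_Name", "Manager_Name"]

# Steps 1-3: count cell values, one Counter per column.
counts = {name: Counter(row[i] for row in rows) for i, name in enumerate(columns)}

# Step 4: max occurrence count per column.
max_counts = {name: max(c.values()) for name, c in counts.items()}

# Step 5: columns in descending order of max count.
hierarchy = sorted(max_counts, key=max_counts.get, reverse=True)

print(max_counts)  # {'Emp_Name': 1, 'Manager_Name': 4}
print(hierarchy)   # ['Manager_Name', 'Emp_Name']
```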

As per Step 1 and Step 2 of the algorithm, information for every cell is stored in the following form:
{ cell value, column name, number of occurrences }

So, for column Emp_Name, the following data is gathered:

Vaibhav, Emp_Name, 1
Sachin, Emp_Name, 1
Jeremiah, Emp_Name, 1
Nilesh, Emp_Name, 1
Avinash, Emp_Name, 1
Jim, Emp_Name, 1
Abhijit, Emp_Name, 1

Similarly, for column Manager_Name, following dat...