Browse Prior Art Database

Dynamic Surrogate Key Generation Method in DWH using Hexavigesimal numeral system

IP.com Disclosure Number: IPCOM000243776D
Publication Date: 2015-Oct-16
Document File: 6 page(s) / 277K

Publishing Venue

The IP.com Prior Art Database

Abstract

This paper aimed to provide solution of independant data loading in fact and dimension in a typical warehouse system .Dynamic Surrogate Key generation method based on Hexavigesimal numeral system and it liverges data loading process to load fact and dimension independently from source and hence reduce E2E data latency period and dependency.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Page 01 of 6

Dynamic Surrogate Key Generation Method in DWH using Hexavigesimal numeral system

Surrogate keys are maintained within the data warehouse instead of natural keys of source data systems. In the ETL (Extraction-Transformation-Loading) process there should be some data transformation logic to check and replace every incoming natural key with a surrogate key each time before any record brought into the data warehouse environment. In most of the Data warehousing loading method, first dimensions are loaded first and then facts are loaded referring pre-populated surrogate keys from the dimension..

This paper addresses a technical approach on how to generate surrogate key in a dynamic and independent manner so that fact load does not need to wait till loading of all the dimensions is completed and hence the data warehouse load time cane be reduced.

The main purpose is to implement any such method which will take the natural key value as an input and generate unique surrogate key as integer value for dimension as well fact independently. So parallel loading of facts and dimensions can be achieved in the data warehouse system.

Conventional Surrogate Key Generation Method

There several way to generate Surrogate key in ETL process. Those are described below:

Using Sequence number:

Mostly used in data warehouse ETL process. It is easy to implement and easy to maintain as well.
e.g. Oracle Database - Oracle sequence number. Oracle Sequence number is created for each dimension. For any insertion, sequence

.next_val is inserted in the dimension table and is used as surrogate key.

Teradata Database - In Teradata database we can use csum() or rank to generate unique integer for each row.

Using Unique Identifier:

Mostly used in Transactional (OLTP) system. System generated unique value like row ID (in Oracle Database) or current timestamp is used to generate surrogate key(integer) to uniquely indentify each row.

1


Page 02 of 6

Figure 1

As shown in above figure, all facts need to wait till the dimension load is done. This might be a drawback in the datawarehouse system as it will increase the data warehouse E2E load time.

Using Oracle ORA_HASH:

Here key is generated for a given expression ie given natural key and for same expression same key is being generated. So dynamically using this ora_hash in dimension as well as in fact surrogate key can be generated

Disadvantages: Max size of ora_hash has is 32 bit , which means there would be probability of 50% hash collision for 2^16 different random values .

2


Page 03 of 6

Dynamic Surrogate Key Generation Method

Generating Hash Function on run-time:

This method is totally based on number conversion methodology. But the base of the number must be either 26 or 36.

Base-26 number system , which is also known as Hexavigesimal numeral system, may be represented using English alphabet i.e. [A-Z] or [a-z]. Here 0 is represented by A, 1 = B,...