System and method of forgiving loaded data based on pre-assigned key
Publication Date: 2016-Apr-13
The IP.com Prior Art Database
Loading volumes of data is an initial step in data warehousing environments which is later used for analytics to extract meaningful information. Any worng data inserted can lead to unexpected results and even big efforts to rectify the same. This article gives an easy solution to this problem by assigning a key to each load operation in a one to one mapping for each file. This way it becomes easy to track at the source level and rectify the same using the key assigned.
Page 01 of 2
System and method of forgiving loaded data based on pre -assigned key
User can run load command to load bulk data into database table. A user can also run load commands multiple times to load data from different file to a single table. After sometime, user identified that he/she has loaded some wrong data to a table say TableA and want to undo the unwanted load. But, TableA has data loaded from 10 different files say File1, File2, ... File10 and user want to undo the loaded data from File2 only. There is no command or method to just undo the loaded data from File2. This article is to solve this problem and thereby getting the system to sane state.
We propose a system and method to name the loads - we do have name for load jobs but here by naming we mean naming of loads at target (DBs or dW tables/system, big data, cloud systems). Along with the load job we will identify the data loaded by a particular source - currently we can have a column based on the load jobs from different ETL tools but if data was inserted without a job id, we propose an ID to be created for each and every data coming in.
While performing delete we simply mark the rows deleted and the system should take care of the rest. This article is applicable to all bulk data loading utilities/operations like LOAD/IMPORT/INGEST.
This article uses concept of Internal columns. To use this innovation, database server should get enhanced to use below mechanism:
When a database table gets created, add an internal column say LoadID to the table. It will be an internal column, so that when user executes "select * from tablename", data from internal table will not get selected and returned to the user. Currently, databases already create internal columns for created tables for various reasons. Data in this internal column can not be inserted using INSERT command. Only LOAD, IMPORT or INGEST operation can update this internal column.
When user run a LOAD/IMPORT/INGEST command, server should detect the type of command. If it is LOAD/IMPORT/INGEST command, server should internally generate a unique loadkey using the current time stamp and insert this loadkey in the internal LoadID column for each row being lo...