
Loading Data In Parallel During LOAD SHRLEVEL NONE Of A Database System

IP.com Disclosure Number: IPCOM000234050D
Publication Date: 2014-Jan-09
Document File: 4 page(s) / 100K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for loading data in parallel during LOAD SHRLEVEL NONE of a database system. The method assigns multiple subtasks to drive parallelism of the data conversion.



LOAD SHRLEVEL NONE can only drive parallelism to improve elapsed time when the table space is partitioned and there is a separate input data set for each partition. This form of parallelism is limited to classic partitioned table spaces and partition-by-range table spaces, and it is difficult to maintain a separate input data set per partition after ALTERing the limit key or running REORG REBALANCE.

Much of the elapsed time for loading data is spent processing the input data and converting the records into the relational database server's internal stored format needed for loading into the table space. According to embodiments of the present invention, LOAD SHRLEVEL NONE assigns multiple subtasks to drive parallelism of the data conversion, thereby reducing data conversion elapsed time; in some cases this also includes loading the preprocessed data in parallel. Embodiments of the present invention provide the ability to drive parallelism for non-partitioned table spaces as well.

The following embodiment applies to the mass loading (e.g., LOAD SHRLEVEL NONE) of data into both partitioned and non-partitioned table spaces.

At the beginning of the reload phase of LOAD SHRLEVEL NONE, the optimal number of tasks to complete the load is determined based on memory and Central Processing Unit (CPU) constraints. Under the control of a main process, multiple convert tasks and a reload task are started to load the data in parallel from a single input data set/file. A buffer used to store the input data is assigned to each task.
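The task-count decision described above can be sketched as a simple capacity calculation. This is an illustrative assumption, not the product's actual heuristic: the function name, the bounds, and the one-buffer-per-task memory model are all hypothetical.

```python
import os

def optimal_convert_tasks(available_memory_bytes, buffer_size_bytes, max_tasks=16):
    """Pick a convert-task count bounded by the CPU count and by how many
    input buffers fit in the available memory (one buffer per task).

    Hypothetical sketch; the real utility's heuristic is not disclosed.
    """
    cpu_bound = os.cpu_count() or 1
    memory_bound = max(1, available_memory_bytes // buffer_size_bytes)
    return max(1, min(cpu_bound, memory_bound, max_tasks))
```

The point of the clamp is that adding convert tasks beyond the number of CPUs, or beyond what memory can buffer, only adds scheduling and contention overhead without reducing elapsed time.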

The input data set is processed by a reader task that reads records from the input data and places a batch of records into each task's buffer. Each convert task then processes its input records, performs all conversion and formatting of each record into the internal stored format, and passes the records to the RELOAD task for load processing, as shown in Figure 1.

The RELOAD task then loads the already formatted records into the table space. Since the records all belong to the same table space or partition, only one RELOAD task can load into a given page set at a time. The RELOAD task loads the buffer of converted records from each convert task. Driving parallelism for the data conversion and formatting helps reduce the total elapsed time of loading the data.
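The reader/convert/RELOAD pipeline can be sketched as a producer-consumer arrangement. This is a minimal stand-in, not the utility's implementation: the queue sizes, the round-robin batching, and the uppercase "conversion" are placeholders for the real record formatting into the internal stored format.

```python
import queue
import threading

def run_load(records, num_convert_tasks=4, batch_size=2):
    """Reader fills per-task input queues; convert tasks format the records;
    a single RELOAD consumer drains one output queue (only one loader may
    write to a page set at a time)."""
    in_queues = [queue.Queue() for _ in range(num_convert_tasks)]
    out_queue = queue.Queue()

    def convert(q):
        while True:
            batch = q.get()
            if batch is None:  # sentinel: no more input for this task
                break
            # Stand-in for conversion/formatting into the internal
            # stored format: uppercase the record text.
            out_queue.put([rec.upper() for rec in batch])

    workers = [threading.Thread(target=convert, args=(q,)) for q in in_queues]
    for w in workers:
        w.start()

    # Reader task: distribute batches of records round-robin to convert tasks.
    for i in range(0, len(records), batch_size):
        in_queues[(i // batch_size) % num_convert_tasks].put(records[i:i + batch_size])
    for q in in_queues:
        q.put(None)
    for w in workers:
        w.join()

    # Single RELOAD task: drain the converted buffers into the "page set".
    loaded = []
    while not out_queue.empty():
        loaded.extend(out_queue.get())
    return loaded
```

Because the convert tasks run concurrently, the order in which their buffers reach the RELOAD task is not fixed; the single RELOAD consumer serializes the actual writes, matching the one-loader-per-page-set constraint described above.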




Figure 1: Design flow with one RELOAD task

To further improve concurrency, each convert task can be assigned two buffers instead of one. With this dual-buffering technique, as shown in Figure 2, data conversion and formatting can occur on one buffer while the RELOAD task loads already converted buffers of records. In Figure 2, data is being converted in the *A set of buffers (where * indicates the task number), while the *B set of buffers, which were already processed by the convert tasks, is being loaded by the RELOAD task.
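The dual-buffering idea can be sketched as a pair of buffers that swap roles: the convert task fills one side while the RELOAD task drains the other. The class and method names here are hypothetical; only the flip-and-drain pattern reflects the disclosure.

```python
import threading

class DualBuffer:
    """Two buffers per convert task (the *A and *B sets in Figure 2):
    while one side is being filled with converted records, the other,
    already complete, can be handed to the RELOAD task."""

    def __init__(self):
        self.buffers = {"A": [], "B": []}
        self.fill_side = "A"
        self.lock = threading.Lock()

    def fill(self, converted_records):
        # Convert task appends freshly converted records to the fill side.
        with self.lock:
            self.buffers[self.fill_side].extend(converted_records)

    def swap_and_drain(self):
        """Flip the fill side so conversion continues on the other buffer,
        then return the completed buffer for the RELOAD task to load."""
        with self.lock:
            drain_side = self.fill_side
            self.fill_side = "B" if drain_side == "A" else "A"
            done = self.buffers[drain_side]
            self.buffers[drain_side] = []
        return done
```

With a single buffer, the convert task must wait while the RELOAD task drains it; with two, conversion and loading overlap, which is the concurrency gain Figure 2 illustrates.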

Figure 2: Design flow w...