Optimal way of using score generation and usage by processes in ETL – Data stage SMP
Publication Date: 2010-Sep-21
The IP.com Prior Art Database
At present, every process generates and uses the score, and does so sequentially. The drawbacks of generating scores sequentially are:
1. Performance overhead in the generation and usage of the score by all processes (sometimes it takes 1 hour).
2. Operations happen sequentially.
3. The score module is not modularized, so it cannot be used by modules other than Datastage.
The score in the conductor and in the section leaders is serialized to disk. Thereafter, all the layers de-serialize it and act on it. This incurs considerable overhead in serializing and de-serializing the score, and the work is not performed in parallel. This disclosure proposes that the score be generated and used by all processes on homogeneous SMP-based systems, to provide a high degree of concurrency in the ETL world. With memory shared by all processes, this provides parallelism that multi-applications can use to scale linearly and thus deliver high throughput. In the proposed architecture, score generation and usage are optimized to deliver high throughput by minimizing wait latencies in processing.
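The shared-memory idea above can be illustrated with a minimal Python sketch. It is not the Data stage implementation: the score is stood in for by a plain dictionary, and the helper names `publish_score`, `read_score`, and `release_score` are hypothetical. The sketch assumes only that the score can be serialized once into a named shared-memory block, which any process on the same SMP host can then attach to by name instead of reading a copy from disk.

```python
import pickle
from multiprocessing import shared_memory


def publish_score(score):
    """Serialize the score once and copy the bytes into a new shared-memory block.

    Returns the block and the payload length. The length is needed because
    the operating system may round the block size up to a page boundary.
    """
    payload = pickle.dumps(score)
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm, len(payload)


def read_score(name, length):
    """Attach to an existing block by name and rebuild the score from it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return pickle.loads(bytes(shm.buf[:length]))
    finally:
        shm.close()  # detach only; the publisher owns the block


def release_score(shm):
    """Detach from the block and remove it (publisher side)."""
    shm.close()
    shm.unlink()
```

In this sketch the publisher would pass only the block name and payload length to the other processes, which can then call `read_score` concurrently, with no disk file involved.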
ETL is used to process large volumes of data. The architecture of one ETL tool, Data stage, is shown below.
Data stage/Process Based Architecture
Figure 1. Data stage architecture (figure labels: DSX XML Requests)
Score - it is a combination of job and configuration file with data connection with many
OSH - orchestrate shell - used to run a job
Orchestrate framework - the framework used by all components of the Information Server product.
A more realistic model of this figure (Data stage architecture), including the GUI (graphical user interface) and jobmon (the job monitor, used to monitor jobs), is given in the next figure.
Figure labels: Control Channel/TCP, Stdout Channel/Pipe, Stderr Channel/Pipe, APT_Communicator, Jobmon Connection
$ osh "generator -schema record(a:int32) [par] | same | copy"
Figure 2. Architecture with peripherals - GUI and Jobmon
Control channel/TCP - used to transmit information from the conductor to the section leaders and vice versa.
Stdout channel/pipe - used to send logging information generated by the lower layers, namely the players, through the pipe to the conductor.
Stderr channel/pipe - used to send error-logging information generated by the lower layers, namely the players, through the pipe to the conductor.
APT_Communicator - used for data transfer between players.
Jobmon connection - used to send monitoring information to the GUI.
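The separation of the stdout and stderr pipes can be sketched at the operating-system level as follows. This is only an illustration, not parallel engine code: a parent process stands in for the conductor and a small child Python process stands in for a player. The names `PLAYER_CODE` and `run_player` and the log messages are hypothetical, and the TCP control channel, APT_Communicator, and jobmon connection are not modelled.

```python
import subprocess
import sys

# Stand-in "player": writes one normal log line and one error log line.
PLAYER_CODE = (
    "import sys\n"
    "print('player: processed 3 records')\n"
    "print('player: bad record skipped', file=sys.stderr)\n"
)


def run_player():
    """Spawn one stand-in player and collect its two pipes separately.

    Returns (stdout_lines, stderr_lines) as lists of strings.
    """
    proc = subprocess.run(
        [sys.executable, "-c", PLAYER_CODE],
        capture_output=True,  # one pipe for stdout, another for stderr
        text=True,
        check=True,
    )
    return proc.stdout.splitlines(), proc.stderr.splitlines()


if __name__ == "__main__":
    log_lines, error_lines = run_player()
    print("stdout pipe:", log_lines)
    print("stderr pipe:", error_lines)
```

Keeping the two pipes separate lets the parent route ordinary log output and error output independently, which is the role the definitions above give to the two channels.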
The next section recasts the above diagram in parallel engine (PX) architecture terms.
PX Process Based Architecture
The pure PX model has been isolated from the above architecture and is shown in the figure below.
Figure 3. Architecture of PX - Process relationship
In the above diagram (Figure 3), the conductor takes the input data from the user (the user is the Data stage GUI for PX) and spawns the respective processes (s...