Distribution Of Correlated Streams Of Data In A Massively Parallel Dataflow Graph

IP.com Disclosure Number: IPCOM000239625D
Publication Date: 2014-Nov-20
Document File: 3 page(s) / 29K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method and system is disclosed for distribution of correlated streams of data in a massively parallel dataflow graph.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 39% of the total text.


In a column-oriented dataflow graph, several dataflow edges may be used to carry a single record stream, one edge per column or column component. If the edges that together carry a single record stream are called 'congruent', then collectively these edges form a 'congruent group'. In a parallel implementation there will be several congruent groups, one group per compute node or unit of parallelism.
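The congruent-group structure can be sketched as follows. The names (`Edge`, `CongruentGroup`) are illustrative, not taken from the disclosure; the sketch only shows how one record stream is split column-wise across a group of edges:

```python
# Illustrative sketch of a congruent group: one edge per column,
# with all edges of a group carrying the same record stream.
from dataclasses import dataclass, field

@dataclass
class Edge:
    """One dataflow edge carrying a single column of a record stream."""
    column: str
    buffer: list = field(default_factory=list)

@dataclass
class CongruentGroup:
    """All edges that together carry one record stream on one node."""
    node_id: int
    edges: list  # list[Edge], one per column

    def push_record(self, record: dict) -> None:
        # A record is split column-wise across the congruent edges,
        # so row i of every edge belongs to the same record.
        for edge in self.edges:
            edge.buffer.append(record[edge.column])

group = CongruentGroup(node_id=0, edges=[Edge("id"), Edge("name")])
group.push_record({"id": 1, "name": "a"})
group.push_record({"id": 2, "name": "b"})
```

In a parallel implementation there would be one such group per compute node.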

Several common operations require routing of record streams between compute nodes. For example, the data from one node may be scattered to the others, or the data from all nodes may be gathered at one. The most taxing operation does both in a single step, re-distributing data from all nodes to all nodes, requiring both a scatter and a gather at each node.

To scatter a congruent group, a demux selector (however generated) is applied to each edge in the group. Data may be skewed toward one destination, but this has no effect on the scattering operation itself.
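A minimal sketch of the scatter step, assuming an in-memory column representation (the `scatter` helper and its signature are illustrative): the same demux selector is applied to every edge of the group, so row i of every column lands on the same destination and the output groups remain congruent.

```python
# Hedged sketch: applying one demux selector uniformly to every
# edge of a congruent group, as in the scatter step described above.
def scatter(columns, selector, n_dest):
    """columns: {name: [values]}; selector(i) -> destination index.
    Returns one column dict per destination."""
    out = [{name: [] for name in columns} for _ in range(n_dest)]
    n_rows = len(next(iter(columns.values())))
    for i in range(n_rows):
        dest = selector(i)          # same selection for every edge
        for name, values in columns.items():
            out[dest][name].append(values[i])
    return out

cols = {"id": [1, 2, 3, 4], "val": ["a", "b", "c", "d"]}
parts = scatter(cols, selector=lambda i: i % 2, n_dest=2)
# A skewed selector (favouring one destination) changes only where
# rows land, not the mechanics of the scatter itself.
```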

To gather multiple congruent groups (the opposite of scatter), a mux selector must be applied uniformly to all edges in a congruent group. Here, data skew may mean that all data is arriving from a single source, so a simple round-robin selection is not sufficient. In the initial solution disclosed herein, which applies to a simple gather (as opposed to a scatter+gather redistribute), one edge in each congruent group is identified as a readiness indicator. When that edge is ready, data is accepted simultaneously from all edges in that group (or the gather blocks until data is available).
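The readiness-indicator gather might be sketched as below. The queue-based representation and the `total_rows` termination condition are assumptions for the sake of a runnable example; the point is that a group is drained whenever its indicator edge has data, so heavy skew toward one source does not stall the gather the way round-robin would:

```python
# Illustrative sketch of the readiness-indicator gather. Each source
# group is a dict of column queues; the first column doubles as the
# readiness indicator for the whole group.
from collections import deque

def gather(groups, total_rows):
    merged = []
    while len(merged) < total_rows:
        for g in groups:
            indicator = next(iter(g.values()))
            while indicator:          # indicator edge is "ready"
                # Accept one row from *all* edges of the group at once,
                # keeping the congruent edges in lockstep.
                merged.append({name: q.popleft() for name, q in g.items()})
    return merged

# Skewed input: two rows from source 0, one from source 1.
g0 = {"id": deque([1, 2]), "val": deque(["a", "b"])}
g1 = {"id": deque([3]),    "val": deque(["c"])}
rows = gather([g0, g1], total_rows=3)
```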

The initial solution has issues when applied to re-distribution. The dataflow engine requires that each edge connect one source to one destination. To use an edge both as a readiness indicator and as data to be muxed, the data on the edge must be (notionally) duplicated using a clone node in the graph, and adding this node has a cost.
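The clone node described above amounts to stream duplication: because an edge has exactly one source and one destination, the same stream cannot feed both the readiness check and the mux. A minimal analogue (illustrative only, not the engine's API) is `itertools.tee`, which splits one stream so two consumers can read it:

```python
# Sketch of the clone-node workaround: duplicate the stream so one
# copy serves the readiness check and the other serves the mux.
import itertools

source = iter([10, 20, 30])
indicator_copy, data_copy = itertools.tee(source, 2)

ready = next(indicator_copy) is not None  # readiness path consumes its copy
first = next(data_copy)                   # data path still sees the value 10
```

The duplication is exactly the extra node (and buffering) whose cost the disclosure seeks to avoid.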

Routing edges over the network also has a cost in the management of network endpoints, so a second issue concerns reducing the number of edges that travel from one compute node to another. The system already muxes the edges of a congruent group together when traversing the network, but disclosed herein is a way to reduce the count further.

To reduce the number of edges routed over the network, the gather step of re-distribution is split into a two-phase gather, in which the first phase occurs at the source compute node. This change affects the remainder of the solution.
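The two-phase gather can be sketched as follows, assuming S source nodes each holding D local partitions (the helper names and row encoding are illustrative). Phase one merges the D partitions locally on each source, so only one stream per source crosses the network instead of D; phase two merges the S incoming streams at the destination:

```python
# Hedged sketch of the two-phase gather described above.
def local_gather(partitions):
    """Phase 1, on the source node: merge its D partition streams."""
    return [row for part in partitions for row in part]

def final_gather(per_source):
    """Phase 2, at the destination: merge one stream per source."""
    return [row for src in per_source for row in src]

S, D = 3, 4
# Each source s holds D partitions; each partition here is one row (s, d).
sources = [[[(s, d)] for d in range(D)] for s in range(S)]

over_network = [local_gather(parts) for parts in sources]
result = final_gather(over_network)
# Only len(over_network) == S streams cross the network, not S * D.
```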

To remove the requirement of adding clone nodes in order to use a data edge as a readiness indicator, the method introduces one sacrificial data stream for each level of gather, which is consumed as a readiness indicator. This dramatically reduces the number of nodes in the graph.
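The sacrificial-stream idea might be sketched as below (the `_ready` key and queue representation are assumptions for illustration): instead of cloning a data edge, each gather level carries one extra stream whose entries serve only as readiness tokens and are discarded, so no clone nodes are required.

```python
# Illustrative sketch: a dedicated sacrificial stream per group is
# consumed as the readiness indicator, leaving the data edges intact.
from collections import deque

def gather_with_sacrificial(groups):
    """Each group dict holds a '_ready' sacrificial queue plus its
    data column queues. Sacrificial entries are consumed purely as
    readiness tokens and never appear in the output."""
    merged = []
    for g in groups:
        while g["_ready"]:
            g["_ready"].popleft()  # sacrifice the readiness token
            merged.append({name: q.popleft()
                           for name, q in g.items() if name != "_ready"})
    return merged

g0 = {"_ready": deque([None, None]),
      "id": deque([1, 2]), "val": deque(["a", "b"])}
rows = gather_with_sacrificial([g0])
```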

Suppose there are S compute nodes, each partitioned into D...