Browse Prior Art Database

# Distributed scalable method for solving order statistic problem on a shared nothing architecture

IP.com Disclosure Number: IPCOM000249552D
Publication Date: 2017-Mar-03
Document File: 3 page(s) / 37K

## Publishing Venue

The IP.com Prior Art Database

## Abstract

A distributed scalable method for solving the order of a statistic problem in the parallel shared nothing architecture is presented. Parallel shared nothing architecture consists of a set of nodes and a host. Elements of a sequence are uniformly distributed between all the nodes. The host coordinates work of all the nodes e.g. it can aggregate their work.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 43% of the total text.

1

Distributed scalable method for solving order statistic problem on a shared nothing architecture

We present distributed scalable method for solving order statistic problem in parallel shared nothing architecture. Parallel shared nothing architecture consists of a set of nodes and a host. Elements of a sequence are uniformly distributed between all the nodes. The host coordinates work of all the nodes e .g. it can aggregate their work.

In order statistic problem the goal is to find k -th largest element in the sequence consisting of n elements. This problem is well known and there are number of sequential solutions to this problem. One of the simplest solutions is to sort the sequence and return k-th element of the sorted sequence. The index of an element in a sorted sequence is also called rank of an element in a sequence . The complexity (running time) of this solution is O( n log n ). The best known solution requires O(n) comparisons. This is optimal since each element has to be processed . There are distributed deterministic solutions where d processors computes k -th largest element in time O(n/d) provided the sequence is stored in a shared memory that can be accessed by all processors. The distributed method with shared memory could be applied to the shared nothing architecture and we could simulate shared memory however the cost of communication and redistributing the sequence would dominate the cost of algorithm. We present method of computing k-th largest element in the shared nothing architecture that scales almost perfectly with number of working nodes. More precisely, if there are n elements uniformly distributed between d nodes our method requires O ( log d * log n/d ) phases and the total running time is O(n/d * log n/d). Each phase requires O(d) messages.

Prior art: Progress in Selection, Mike Paterson, Lecture Notes in Computer Science Volume 1097, 1996, pp 368-379

http://www.cc.gatech.edu/~mihail/www-6550/progress-in-selection.pdf

Parallel Selection. http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node17.html

We present deterministic scalable method of solving the order statistics problem on the parallel shared nothing architecture. Our method works in almost optimal time O(n/d * log (n/d) ) up to log (n/d) factor and requires O( log (n/d) * log d) phases. In each phase each node sends exactly one message of constant size to the host .

Let us denote by rank(e) position of an element e in a sorted sequence . We are searching for an element e whose rank is rank(e)=k. We assume that n elements are distributed uniformly between all d nodes. Each node holds n/d elements. Initially, each node sorts its elements that takes O ( n/d * log (n/d) ) steps. Our method works in phases coordinated by a host. In each phase we eliminate significant number of elements that do not have chance to be k -th largest element. This way we reduce the size of the problem.

Phase:

2

1) Host requests median from each node.

2) Each node computes its...