Browse Prior Art Database

Query Processing in a Symmetric Parallel Environment

IP.com Disclosure Number: IPCOM000128201D
Original Publication Date: 1986-Dec-31
Included in the Prior Art Database: 2005-Sep-15
Document File: 18 page(s) / 53K

Publishing Venue

Software Patent Institute

Related People

Dennis Shasha: AUTHOR [+3]

Abstract

We consider a database machine consisting of n nodes connected by an O(n*processing speed) bandwidth network. Each node consists of a processor, a random access memory, and a slower but much larger memory such as a disk. In order to approach optimal (O(n)) speedup on this hardware architecture,'we partition relations roughly evenly among the processors. We study the problem of optimizing multi-join queries assuming such a data distribution strategy. For us, optimization consists of minimizing the number ,of times we need to redistribute relations in the course of a query. The optimization problem is NP-complete for general queries but linear time for a subclass of tree queries.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 7% of the total text.

Page 1 of 18

THIS DOCUMENT IS AN APPROXIMATE REPRESENTATION OF THE ORIGINAL.

Query Processing in a Symmetric Parallel Environment

by Dennis Shasha

Technical Report #197 Ultracomputer Note #95 January, 1986 New York University Dept. of Computer Science Courant Institute of Mathematical Sciences 251 Mercer Street New York, New York 10012

This work has been supported in part by National Science Foundation Grant DCR-85-OI6I1 and by Office of Naval Research Grant NOOOI4-85-K-0046.

ABSTRACT

We consider a database machine consisting of n nodes connected by an O(n*processing speed) bandwidth network. Each node consists of a processor, a random access memory, and a slower but much larger memory such as a disk. In order to approach optimal (O(n)) speedup on this hardware architecture,'we partition relations roughly evenly among the processors. We study the problem of optimizing multi-join queries assuming such a data distribution strategy. For us, optimization consists of minimizing the number ,of times we need to redistribute relations in the course of a query. The optimization problem is NP-complete for general queries but linear time for a subclass of tree queries.

1. Setting and Problem

The goal of the New York University Ultrabase project is to discover architectures and algorithms that speed up relational query processing roughly linearly with the number of processors.' Fortunately for our choice of project title, we choose a symmetric parallel architecture (figure 1), generalizing the architecture of the New York University Ultracomputer [GGKARS83]. In this architecture, each processing node consists of a processor, a random access memory, and a slower but much larger memory. They are connected by a network whose bandwidth is proportional to the number of processors (That is, the network can sustain a communication load in which each processing node sends a message to an arbitrary processing node every c cycles, on the average, for some c unrelated to n.) We justify our choice by an intuitive symmetry argument. To achieve an optimal speedup of, say, a join of R and S with the join clause R.A = S.B, each of the n processing nodes of our database machine should have about the same amount of work to do. This suggests that each processing node should contain approximately 1ln of the R and S tuples and should perform the join on those. In order for the result of the join to be correct, no tuple of R in processing node i should have the same A value as the B value of some S tuple in another processing node j.2 This implies that it may sometimes be necessary to send tuples from one processing node to another. For example, suppose the R tuples are partitioned on A (i.e. all R tuples with the same A value are in the same processing node), but the S tuples are not partitioned

1 A secondary goal is to achieve this speedup without sacrificing fast execution of simple transactions or concurrency control. ,We will deal with these aspects in subsequent papers....