Browse Prior Art Database

Distributing a Database for Parallel Processing is NP-hard

IP.com Disclosure Number: IPCOM000128576D
Original Publication Date: 1983-Dec-31
Included in the Prior Art Database: 2005-Sep-16
Document File: 6 page(s) / 22K

Publishing Venue

Software Patent Institute

Related People

H.C. Du: AUTHOR [+3]

Abstract

In a database the response time to a query can be reduced if certain con-current operations are provided. In order to maximize the degree of roncurrency, data has to be carefully allocated. The complexities of the following two problems we studied in this paper : one is to allocate a database onto a multiple-disk system which maxim-izes the disk access concurrency; the other one is to allocate a database overa network which minimize the number of nodes being used and a complete parallel searching is possible far a set of queries. Both are shown to be NP-hard (computational intractable) problems.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 26% of the total text.

Page 1 of 6

THIS DOCUMENT IS AN APPROXIMATE REPRESENTATION OF THE ORIGINAL.

Distributing a Database for Parallel Processing is NP-hard

H.C. Du

Computer Science Department Institute of Technology

136 Lind Hall University of Minnesota Minneapolis, Minnesota 55455

0 42 S)

Technical Report 83-9 May Distributing a Database for Parallel Processing is NP-hard

H.C. Du

Department of Computer Science University of Minnesota Minneapolis. Minnesota 55455

ABSTRACT:

In a database the response time to a query can be reduced if certain con-current operations are provided. In order to maximize the degree of roncurrency, data has to be carefully allocated. The complexities of the following two problems we studied in this paper : one is to allocate a database onto a multiple-disk system which maxim-izes the disk access concurrency; the other one is to allocate a database overa network which minimize the number of nodes being used and a complete parallel searching is possible far a set of queries. Both are shown to be NP-hard (computational intractable) problems.

Keywords: database, parallel Processing, NP-hard

A database (file) F is a collection of records and a record r is as ordered k-tuple

(Equation Omitted)

Far each

(Equation Omitted)

where Di is the i-th attribute domain. There-fore, a database F is a subset of .

(Equation Omitted)

. A retrieval request performed an the database is called a query. In order to respond to a query, a set of records (relevant records) from possibly nillions of records needs to be accessed. A directory, which is a set of indexes, is usualy constructed to help identify those relevant records. The time required to access allrelevaat records depends on the data allocation. By data allocation we mean how all records are stored an storage devices. For example, if a file is large (this is a usual case) it must be stored on a secondary storage device such as a magnetic disk unit. In such a case, due to the physical limita-tion of ,a disk unit, the time required. to access all relevant records can be measured in terms of the number of disk accesses issued. The number

University of Minnesota Page 1 Dec 31, 1983

Page 2 of 6

Distributing a Database for Parallel Processing is NP-hard

of disk accesses must be issued is equivalent to the number of buckets (pages) which contain at least one relevant record, if the directory access time is ignored.

>2

In the past two decades, most effort related to designing an efficient file structure has been devoted to constructing an efficient directory and little attention has been paid to data allocation. However, there are same exceptional cases. In the previous example in order to. minimize the time required to access all relevant records, similar records should be partitioned into the same buckets to minimize the number of buck-ets which contain at least one relevant record. By similar records we mean records that have high probability of being accessed together in response to a query. Another approac...