Browse Prior Art Database

Sound and Consistent Handling of Nulls in Data Bases

IP.com Disclosure Number: IPCOM000121550D
Original Publication Date: 1991-Sep-01
Included in the Prior Art Database: 2005-Apr-03

Publishing Venue

IBM

Related People

Larner, A: AUTHOR

Abstract

The concept of a "null" field in a data base is ill-defined, and nulls give rise to a number of problems in relational DBMs. A technique is described that provides a definition and leads to solutions.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 12% of the total text.

Sound and Consistent Handling of Nulls in Data Bases

      The concept of a "null" field in a data base is
ill-defined, and nulls give rise to a number of problems in
relational DBMs.  A technique is described that provides a definition
and leads to solutions.

      It is sometimes desired to record a field in a data base as
having no value, or as having an "improper" value (such as "unknown",
or "not yet available"), i.e., as being NULL.   However, underlying
theory does not support such missing or improper values.  Ad hoc
adjustments to the theory, or to DBMSs, to allow missing or improper
values, threaten the soundness and consistency of the user interface.
For example, the current SQL NULL both is ill-defined and requires
the use of a three-valued logic. The user of SQL is consequently
unable to grasp the meaning of a NULL field, or, when such fields are
present, safely to apply the conventional correct truth-valued logic.
The technique disclosed here comprises:
*    A definition of NULL, which is a clear concept and allows a test
to be formulated whereby it may be determined of a putative value
whether it is NULL or not.
*    Extensions of the definitions of all relational operations,
which permit the uniform application of truth-valued logic.
*    A means of expressing and handling different kinds of NULLs.
SUMMARY OF THE TECHNIQUE

      With respect to relational theory:
*    A distinction is drawn between a cell (i.e., a field instance,
or column/row intersection in a table), and the value, if any,
contained in such a cell.
*    The constraint of first normal form (1NF), viz that each cell
contains exactly one value, is relaxed to: each cell contains, at
most, one value.
*    All and only cells with no value are said to be NULL.
*    All operations are defined primarily upon cells, and only
secondarily upon the values in those cells.

      With respect to the interpretation placed by users on their
data:
*    With each column in a table is associated a criterion of
identity, which explicates the meaning of that column, in the sense
that, for a column, say C, any rows with the same value in that
column can be said to have or be the same C.  (For example, two rows
representing Persons and having the same value in the column Salary
would be said to have the same Salary.)
*    A putative value in a row and column is said to be "proper" (and
is accordingly represented by a value in a cell) if and only if it
meets the criterion of identity of the column;  otherwise, it is said
to be "improper" and represented by an empty cell (a cell with no
value, a NULL cell).
*    Where it is necessary to distinguish between different improper
values, or to distinguish different ways (termed modalities) in which
a cell (or its value) in some column, say C, pertains to the meaning
of the column, it is permitted to introduce an associated column, say
C', such that the cell of C' in any row has as valu...