
DNA and Amino Acid Tetragrams

IP.com Disclosure Number: IPCOM000120054D
Original Publication Date: 1991-Mar-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 2 page(s) / 78K

Publishing Venue

IBM

Related People

Pickover, CA: AUTHOR

Abstract

Disclosed is a graphical representation and method for representing information-containing biological sequences. The procedure takes a DNA or protein sequence containing n subunits and computes n three-dimensional real vectors. When these vectors are traced out on connected tetrahedra, the resulting characteristic patterns are called DNA or amino-acid tetragrams. Experiments indicate that tetragrams are sensitive to certain important patterns in the subunit sequence and allow a human observer to detect some important sequence properties visually.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.



      As further background, DNA is usually represented as a string of
characters (e.g., G, C, A, T), which makes it difficult for a human
observer to distinguish between different sequences, assess base
composition, and find various patterns.  A technique that has proved
useful in overcoming this drawback transforms the letter strings into
characteristic three-dimensional (3-D) patterns traced out on
connected tetrahedra.  A computer inspects the DNA sequence one
character at a time and assigns each character a direction of
movement: each letter causes a vector to be drawn from a point in the
center of a tetrahedron to one of the four immediately adjacent
points.  Repeating this procedure for every character draws a pattern
in 3-D space that is characteristic of the DNA sequence.  A small
colored sphere may be used to represent each subunit.
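The tracing step can be sketched in C, the language the disclosure mentions. This is a minimal sketch under stated assumptions, not the original program: the assignment of A, C, G, and T to particular directions is hypothetical, every step is assumed to use the same four unit vectors pointing from the center of a regular tetrahedron to its vertices, and the names `tetragram`, `base_index`, and `TETRA_DIR` are illustrative only. Characters other than the four bases are skipped.

```c
#include <ctype.h>
#include <stddef.h>

/* A point or displacement in 3-D space. */
typedef struct { double x, y, z; } vec3;

/* 1/sqrt(3): scales (+-1, +-1, +-1) to unit length. */
#define INV_SQRT3 0.57735026918962576451

/* Hypothetical assignment of bases to the four tetrahedral directions
   (the disclosure does not say which base goes where). */
static int base_index(char c) {
    switch (toupper((unsigned char)c)) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1; /* N, gaps, whitespace, ... are skipped */
    }
}

/* Unit vectors from the center of a regular tetrahedron to its vertices
   (assumed to be the same for every step). */
static const vec3 TETRA_DIR[4] = {
    { INV_SQRT3,  INV_SQRT3,  INV_SQRT3},
    { INV_SQRT3, -INV_SQRT3, -INV_SQRT3},
    {-INV_SQRT3,  INV_SQRT3, -INV_SQRT3},
    {-INV_SQRT3, -INV_SQRT3,  INV_SQRT3},
};

/* Trace a tetragram: starting at the origin, each recognized base adds its
   direction vector, and out[k] receives the position after the (k+1)-th
   base, where a small sphere could be drawn. Writes at most cap points and
   returns the number of points written. */
static size_t tetragram(const char *seq, vec3 *out, size_t cap) {
    vec3 p = {0.0, 0.0, 0.0};
    size_t n = 0;
    for (const char *s = seq; *s != '\0' && n < cap; ++s) {
        int i = base_index(*s);
        if (i < 0)
            continue;
        p.x += TETRA_DIR[i].x;
        p.y += TETRA_DIR[i].y;
        p.z += TETRA_DIR[i].z;
        out[n++] = p;
    }
    return n;
}
```

For example, `tetragram("GATTACA", pts, 16)` writes 7 points and leaves the final position at (1, 1, 3)/sqrt(3); a renderer would place a sphere at each `pts[k]` and join consecutive points with line segments.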

      Tetragrams computed with a program written in C may be enhanced
by several graphical considerations:

      1.   A transparent or opaque sphere may be placed at the origin
of the tetragram to indicate how far the traced path would be expected
to travel by chance alone.  Comparing the tetragram's extent with this
sphere gives a rough idea of how non-random the sequence is.

      2.   A colored tetrahedral axis may be placed at the origin to
help orient the viewer.  The sche...
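The chance sphere in item 1 can be sized with a short calculation. Assuming each step is drawn independently and uniformly from the four tetrahedral unit vectors, which sum to zero, the cross terms vanish in expectation and the expected squared end-to-end distance after n steps is E|R_n|^2 = n, so a radius of sqrt(n) (the root-mean-square distance) is one natural choice. The disclosure does not say how the radius is chosen, so this choice, the base-to-direction assignment, and the names `nonrandomness_sq` and `base_step` are assumptions. The sketch reports |R_n|^2 / n, which has expected value 1 for a uniformly random sequence and grows with compositional bias; keeping the unnormalized (+-1, +-1, +-1) steps keeps the sums in integers until the final division.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical base assignment: unnormalized tetrahedral directions
   (+-1, +-1, +-1); the common factor 1/sqrt(3) is applied at the end.
   Returns 0 for a recognized base, -1 otherwise. */
static int base_step(char c, int v[3]) {
    switch (toupper((unsigned char)c)) {
    case 'A': v[0] =  1; v[1] =  1; v[2] =  1; return 0;
    case 'C': v[0] =  1; v[1] = -1; v[2] = -1; return 0;
    case 'G': v[0] = -1; v[1] =  1; v[2] = -1; return 0;
    case 'T': v[0] = -1; v[1] = -1; v[2] =  1; return 0;
    default:  return -1;
    }
}

/* Squared end-to-end distance of the tetragram divided by its chance
   expectation n. Each unit step is (+-1, +-1, +-1)/sqrt(3), so
   |R_n|^2 = (X^2 + Y^2 + Z^2) / 3 for the integer sums X, Y, Z.
   Stores the number of recognized bases in *n_out if it is non-NULL. */
static double nonrandomness_sq(const char *seq, size_t *n_out) {
    long long X = 0, Y = 0, Z = 0;
    size_t n = 0;
    for (const char *s = seq; *s != '\0'; ++s) {
        int v[3];
        if (base_step(*s, v) != 0)
            continue; /* skip characters that are not A, C, G, T */
        X += v[0];
        Y += v[1];
        Z += v[2];
        ++n;
    }
    if (n_out != NULL)
        *n_out = n;
    if (n == 0)
        return 0.0;
    return (double)(X * X + Y * Y + Z * Z) / (3.0 * (double)n);
}
```

For instance, `"ACGT"` gives 0 because the four directions cancel, while `"AAAA"` gives 4: a run of n identical bases travels a distance n in a straight line, so the ratio equals n.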