PROTEUS- Protein databank (PDB) file representation for Structural analysis

IP.com Disclosure Number: IPCOM000010789D
Original Publication Date: 2003-Jan-22
Included in the Prior Art Database: 2003-Jan-22
Document File: 5 page(s) / 42K

Publishing Venue

IBM

Abstract

The Human Genome Project, in its "Quest for the Holy Grail", has thrown up several challenges and interesting revelations. One of the most fundamental has been a revision of the central dogma of genetics. While earlier geneticists described nature as working along the lines of DNA-->RNA-->Protein, present-day genetics has given rise and lent credence to a new dogma: Sequence-->Structure-->Function. This has opened up relatively new avenues in genetics such as "Structural Genomics". Structural genomics involves the determination, analysis, and dissemination of the three-dimensional structures of all protein and RNA molecules in nature, providing new opportunities at the interface of structural biology, functional genomics, and bioinformatics. All experimentally determined structural information about proteins is stored in the PDB file format. The PDB was established in 1971 at Brookhaven National Laboratory with a handful of structures. Revisions and modifications are now handled by the RCSB consortium (www.rcsb.org). PDB files form the foundation for any structural processing and analysis, be it three-dimensional structure rendering, structure comparison, or similar tasks, and a file must be parsed every time it is used. The PDB file is fairly complicated, with much of the connectivity information between the atoms of each amino acid encoded in specific fields. With genome-scale structural analysis likely to assume critical importance in the future, PDB files need to be represented in a format that is more amenable to the tools used for structure rendering and analysis. PROTEUS is a radically simple data structure that holds all the amino acid information and the atom connectivity information. It also has an associated set of algorithms that convert the information in PDB files into PROTEUS.

This is the abbreviated version, containing approximately 50% of the total text.

1.1 Introduction

    The Structural Genomics initiative has begun to take on more ambitious goals. In addition to archiving results from experiments that generate three-dimensional data for newly discovered proteins, this branch of genetics now seeks answers hidden in the structural information of proteins: answers that will unlock some of the deeper mysteries of life and, hopefully, explain life in terms of the functioning of the various proteins and the role of structure in determining their final function.

    Because the primary focus of Structural Genomics is protein structure, PDB files are one of the vital elements in the entire process. PDB files contain all the information required to define the three-dimensional structure of a protein. For more detailed information on the PDB file format, see Reference 1.

1.2 Motivation for PROTEUS

    PDB files are of paramount importance in Structural Genomics, but they raise a few practical issues:

  The PDB file itself is difficult to use for computer-aided manipulation, rendering and analysis.

  Parsing the file every time it is used, or every time it changes, is time-consuming and difficult.

  There is no standard intermediate data structure that holds the parsed information for easy manipulation.

  There is no simple technique for tracing the connectivity between different atoms.

    A look at the structure of PDB files gives a fair idea of how complex the encoded atom connection information is. A simple structure should hold this information, so that the encoding does not have to be deciphered every time it is used.
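    To make the parsing burden concrete, the following is a minimal Python sketch of decoding two PDB record types by column position. It is not part of PROTEUS. The column ranges follow the published fixed-column PDB layout for ATOM and CONECT records; the names AtomRecord, parse_atom_line and parse_conect_line are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    """A subset of the fields carried by one ATOM record (illustrative, not exhaustive)."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> AtomRecord:
    """Decode one fixed-column ATOM record.

    The PDB format numbers columns from 1, so columns 7-11 become the
    Python slice [6:11], and so on.
    """
    return AtomRecord(
        serial=int(line[6:11]),        # columns 7-11: atom serial number
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue name
        chain_id=line[21],             # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue sequence number
        x=float(line[30:38]),          # columns 31-38: x coordinate
        y=float(line[38:46]),          # columns 39-46: y coordinate
        z=float(line[46:54]),          # columns 47-54: z coordinate
    )


def parse_conect_line(line: str) -> tuple[int, list[int]]:
    """Decode one CONECT record into (source serial, bonded serials).

    Columns 7-11 hold the source atom; up to four bonded atoms follow
    in 5-character fields (columns 12-31). Blank fields are skipped.
    """
    source = int(line[6:11])
    bonded = []
    for start in range(11, 31, 5):
        field = line[start:start + 5].strip()
        if field:
            bonded.append(int(field))
    return source, bonded
```

    Each line read from a PDB file that starts with ATOM would be passed to parse_atom_line, and each line that starts with CONECT to parse_conect_line. Even this reduced sketch has to hard-code a dozen column offsets, which is the kind of repeated decoding that a standing intermediate structure is meant to avoid.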

1.3 What is PROTEUS?

A high level description of PROTEUS appears in Figure 1.

Figure 1 - PROTEUS: A bird's-eye view

The PROTEUS system consists of two parts:

~ The PROTEUS data structure
~ The PROTEUS algorithms that convert PDB data to PROTEUS data

1.3.1 PROTEUS data structure

    The PROTEUS data structure is aimed primarily at representing residue-level information as a Residue Connectivity Graph (RCG). The RCGs can be trivially connected to obtain the protein-level structure, as shown in Figure 1, where the residues R1, R2, R3 and R4 are connected. Figure 1 also shows the conceptual RCG of residue R4. The implementation of the RCG is shown in Figure 2.

    Each RCG is modeled as a graph of AtomConnect units, each of which is made up of two sub-structures:
~ Atom structure
~ Connect structure
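
    One way to picture this arrangement is the Python sketch below. It is an assumption-laden illustration, not the implementation shown in Figure 2: the fields of Atom and Connect are placeholders (the actual field layout is defined by the disclosure), and the class ResidueConnectivityGraph and the methods add_atom, add_bond and neighbours are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    # Placeholder fields only; the real Atom structure is defined in Figure 2.
    name: str
    x: float
    y: float
    z: float


@dataclass
class Connect:
    # Assumed representation: names of the atoms bonded to the owning atom.
    bonded: list[str] = field(default_factory=list)


@dataclass
class AtomConnect:
    """One graph node: an atom together with its connectivity."""
    atom: Atom
    connect: Connect = field(default_factory=Connect)


@dataclass
class ResidueConnectivityGraph:
    """A residue modeled as a graph of AtomConnect units keyed by atom name."""
    residue_name: str
    units: dict[str, AtomConnect] = field(default_factory=dict)

    def add_atom(self, atom: Atom) -> None:
        self.units[atom.name] = AtomConnect(atom)

    def add_bond(self, a: str, b: str) -> None:
        """Record an undirected bond between two atoms already in the graph."""
        for src, dst in ((a, b), (b, a)):
            bonded = self.units[src].connect.bonded
            if dst not in bonded:
                bonded.append(dst)

    def neighbours(self, name: str) -> list[str]:
        """Return the atoms bonded to the named atom, in insertion order."""
        return list(self.units[name].connect.bonded)
```

    With a structure of this shape, tracing connectivity becomes a lookup: after adding the backbone atoms N, CA, C and O of a residue and their bonds, neighbours("CA") returns the atoms attached to the alpha carbon without consulting the PDB file again.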

    The Atom structure, as shown in Figure 2, is a 16 fiel...