Converting the Unigene datasource to a relational format, enabling it to be loaded into and accessed from a federated database system.

IP.com Disclosure Number: IPCOM000016290D
Original Publication Date: 2002-Oct-17
Included in the Prior Art Database: 2003-Jun-21
Document File: 4 page(s) / 131K

Publishing Venue

IBM

Abstract

At present, the Unigene datasource is not accessible by any relational database engine. Converting the datasource into a relational model and creating a relational schema for the Unigene data allows it to be queried using standard SQL interfaces. Providing a SQL interface to this datasource opens it up to many relational search engines and enables users to run more complex queries against the Unigene datasource. This was implemented as a prototype for the IBM Discovery_Link/DB2 product on 05-15-2002. The schema developed for this relational model is given here. Thirteen organisms are covered so far in the Unigene database. The schema differs between species in some tables; the differences are identified in parentheses.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 56% of the total text.


Converting the Unigene datasource to a relational format, enabling it to be loaded into and accessed from a federated database system.

At present, the Unigene datasource is not accessible by any relational database engine. Converting the datasource into a relational model and creating a relational schema for the Unigene data allows it to be queried using standard SQL interfaces.

Providing a SQL interface to this datasource opens it up to many relational search engines and enables users to run more complex queries against the Unigene datasource. This was implemented as a prototype for the IBM Discovery_Link/DB2 product on 05-15-2002. The schema developed for this relational model is given here. Thirteen organisms are covered so far in the Unigene database. The schema differs between species in some tables; the differences are identified in parentheses.

Unigene Animals:

1. Anopheles gambiae (Aga)
2. Bos taurus (Bt)
3. Danio rerio (Dr)
4. Drosophila melanogaster (Dm)
5. Homo sapiens (Hs)
6. Mus musculus (Mm)
7. Rattus norvegicus (Rn)
8. Xenopus laevis (Xl)

Unigene Plants:

1. Arabidopsis thaliana (At)
2. Hordeum vulgare (Hv)
3. Oryza sativa (Os)
4. Triticum aestivum (Ta)
5. Zea mays (Zm)

1. Title table for all 13 organisms. Table name: <organism name>_TITLE (for example, AGA_TITLE)

Column        Type            Maps to
CLUSTER_ID    Varchar(10)     ID
TITLE         Varchar(700)    TITLE
SCOUNT        Bigint          SCOUNT
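The Title table and its naming rule can be sketched as follows. This is a minimal illustration only: SQLite (via Python's standard sqlite3 module) stands in for DB2, the helper name title_table_ddl is hypothetical, and the inserted row is made-up sample data, not real Unigene content.

```python
import sqlite3


def title_table_ddl(organism_code: str) -> str:
    """Return CREATE TABLE text for <organism name>_TITLE (for example AGA_TITLE)."""
    table = f"{organism_code.upper()}_TITLE"
    return (
        f"CREATE TABLE {table} ("
        "CLUSTER_ID VARCHAR(10), "  # maps to Unigene ID
        "TITLE VARCHAR(700), "      # maps to Unigene TITLE
        "SCOUNT BIGINT)"            # maps to Unigene SCOUNT
    )


# Assumption: an in-memory SQLite database replaces the DB2 target.
conn = sqlite3.connect(":memory:")
conn.execute(title_table_ddl("Aga"))

# Illustrative row only; the values are invented for this sketch.
conn.execute(
    "INSERT INTO AGA_TITLE VALUES (?, ?, ?)",
    ("Aga.1", "example cluster title", 3),
)

# Once loaded, the data can be filtered with ordinary SQL.
rows = conn.execute(
    "SELECT CLUSTER_ID, SCOUNT FROM AGA_TITLE WHERE SCOUNT > 1"
).fetchall()
print(rows)
```

The same helper would yield BT_TITLE, HS_TITLE, and so on when called with the other organism codes listed above.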

2. Expression table for all 13 organisms. Table name: <organism name>_EXPRESSION (for example, AGA_EXPRESSION)

Column        Type            Maps to
CLUSTER_ID    Varchar(10)     ID
EXPRESS       Varchar(150)    EXPRESS
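With both the Title and Expression tables loaded, a query that combines them might look like the sketch below. It assumes SQLite in place of DB2, assumes that each EXPRESSION row holds a single EXPRESS value per cluster (the disclosure does not state how multi-valued entries are stored), and uses invented rows purely for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database replaces the DB2 target.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE HS_TITLE (CLUSTER_ID VARCHAR(10), TITLE VARCHAR(700), SCOUNT BIGINT);
    CREATE TABLE HS_EXPRESSION (CLUSTER_ID VARCHAR(10), EXPRESS VARCHAR(150));
    """
)

# Illustrative rows only; none of these values come from Unigene.
conn.executemany(
    "INSERT INTO HS_TITLE VALUES (?, ?, ?)",
    [("Hs.1", "example title A", 12), ("Hs.2", "example title B", 4)],
)
conn.executemany(
    "INSERT INTO HS_EXPRESSION VALUES (?, ?)",
    [("Hs.1", "liver"), ("Hs.1", "brain"), ("Hs.2", "liver")],
)

# Join the two tables on CLUSTER_ID: clusters expressed in a given
# tissue that also have at least a given number of sequences.
query = """
    SELECT t.CLUSTER_ID, t.TITLE, t.SCOUNT
    FROM HS_TITLE t
    JOIN HS_EXPRESSION e ON e.CLUSTER_ID = t.CLUSTER_ID
    WHERE e.EXPRESS = ? AND t.SCOUNT >= ?
    ORDER BY t.CLUSTER_ID
"""
result = conn.execute(query, ("liver", 10)).fetchall()
print(result)
```

A join of this kind is one example of the more complex queries that the relational form makes possible.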

3. Sequence Information table for all 13 organisms. Table name: <organism name>_SEQ_INFO (for example, AGA_SEQ_INFO)


Schema for Anopheles gambiae (Aga) and Drosophila melanogaster (Dm) SEQ_INFO table

Column        Type                          Maps to
Cluster_ID    Varchar(10) Primary Unique    ID
ACC           Varchar(30)                   ACC
NID           Varchar(30)                   NID
END           Varchar(10)                   END
LID           Varchar(10)                   LID
Clone         Varchar(100)                  Clone
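The field mapping for this table can be illustrated with a small parser. The sketch assumes that the source file carries one sequence entry per line in the form "SEQUENCE  KEY=value; KEY=value; ..."; that layout, the function name parse_sequence_line, and the sample line are assumptions made for illustration and are not taken from this disclosure.

```python
def parse_sequence_line(cluster_id: str, line: str) -> dict:
    """Map 'KEY=value; KEY=value' pairs onto the Aga/Dm SEQ_INFO columns."""
    columns = ("ACC", "NID", "END", "LID", "CLONE")
    # Start with every column empty so missing fields stay None.
    row = {"CLUSTER_ID": cluster_id, **{name: None for name in columns}}

    # Everything after the assumed "SEQUENCE" tag is the key=value payload.
    _, _, payload = line.partition("SEQUENCE")
    for pair in payload.split(";"):
        key, sep, value = pair.strip().partition("=")
        # Keep only keys that have a column in this schema.
        if sep and key.upper() in columns:
            row[key.upper()] = value.strip()
    return row


# Invented sample line; the identifiers are placeholders, not real accessions.
sample = "SEQUENCE    ACC=XX000001.1; NID=g0000001; CLONE=example-clone; END=5'; LID=123"
row = parse_sequence_line("Aga.1", sample)
print(row)
```

Keys without a matching column are ignored, so the same approach could be reused for the other SEQ_INFO variants by changing the column tuple.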

Schema for Homo sapiens (Hs) and Mus musculus (Mm) SEQ_INFO table

Column        Type                          Maps to
Cluster_ID    Varchar(10) Primary Unique    ID
ACC           Varchar(50)                   ACC
NID           Varchar(50)                   NID
PID           Varchar(50)                   PID
Clone         Varchar(100)                  Clone
END           Varchar(10)                   END

Schema for all other organisms SEQ_INFO table

Column        Type                          Maps to
Cluster_ID    Varchar(10) Primary Unique    ID
ACC           Varchar(50)                   ACC
NID           Varchar(50)                   NID
PID           Varchar(50)                   PID
Clone         Varchar(100)                  Clone
END           Varchar(10)                   END

LID           Varchar(10)                   LID

MGC...