Meta-data provision for a Life Science repository through federation of local databases
Original Publication Date: 2004-Jun-18
Included in the Prior Art Database: 2004-Jun-18
Disclosed is a mechanism for Life Science (LS) data archives. The mechanism allows metadata generated on an analysis device to be re-used for referencing and describing the archived data assets. It does so through a special kind of metadata federation, in which local metadata from different sources are mapped onto a common federated metadata model. This is achieved through a special way of indexing metadata when building such a federated LS archive.
The core idea is to use a federated metadata model to describe the assets in a central LS archive, thus leveraging federation over local analysis workstations for metadata supply. This means setting up a federation and applying a data loading algorithm so that the results of querying this federation describe the proper assets stored centrally.
An LS federated archive can be described by the architecture diagram in Fig. 1, which shows a real-life example implementation:
Fig. 1 A real-life life science archive system implemented with IBM* DB2* Content Manager and Information Integrator for Content.
Fig. 1 describes the architecture of a life science archive system using descriptive federation as proposed in this publication. The archive system consists of, from bottom to top, a storage area network, a set of relational databases for federated data and content metadata, a set of resource managers holding object data, a connector to these two types of servers, an instance of the Information Integrator for Content (II4C) product running on the WebSphere* Application Server, and a solution layer for a life science client which directly serves a number of browser-based end user clients. Attached to this core system are a set of databases from local analysis workstations, accessed for metadata through the database connector facilities of II4C, and a set of (possibly different) analysis workstations, each delivering data objects to the resource managers through a cache mechanism.
The core component in this architecture is the federated data model, which can be seen as the least upper bound of the data models of the underlying backend servers. This federated data model is used as a metadata description for digital objects imported from various devices through file system interfaces, which may be cached. The principle of using a federated conceptual data model (CDM), obtained through partial homomorphisms from the CDMs of various analytical devices or their attached servers, is shown in Fig. 2.
Fig. 2 Federated CDM and federated mapping from the CDMs of the device servers.
In Fig. 2, there are two data models for databases of analysis workstations (red and green), and one federated data model (blue). Concepts or entities from the analysis workstation databases are mapped to federated concepts so that the mapping is a partial homomorphism. In addition, whenever a mapped backend entity points to a binary object, the corresponding federated entity will point to the centrally stored copy of this file. Therefore, a federated entity may point to several binary objects originating from different analysis workstations.
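The mapping described for Fig. 2 can be illustrated with a minimal Python sketch. It is not the disclosed implementation and does not use the II4C API. It assumes that a CDM can be reduced to a set of entity names plus directed relationships, and that a lookup from (workstation, local file) to a central asset identifier already exists. All names in it (DataModel, is_partial_homomorphism, central_pointers, the entity names, the file paths and the "archive:" identifiers) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataModel:
    """Simplified conceptual data model (assumption: entities plus directed relations)."""
    entities: frozenset
    relations: frozenset  # set of (source_entity, target_entity) pairs


def is_partial_homomorphism(local, federated, mapping):
    """Return True if `mapping` (local entity -> federated entity) preserves structure.

    The mapping may leave local entities unmapped (it is partial), but every
    relationship between two mapped local entities must also hold between
    their images in the federated model.
    """
    if not set(mapping) <= local.entities:
        return False
    if not set(mapping.values()) <= federated.entities:
        return False
    for src, dst in local.relations:
        if src in mapping and dst in mapping:
            if (mapping[src], mapping[dst]) not in federated.relations:
                return False
    return True


def central_pointers(rows, mappings, central_index):
    """Collect, per federated entity, the central copies of mapped binary objects.

    rows:          (workstation, local_entity, local_file) tuples
    mappings:      workstation -> {local entity -> federated entity}
    central_index: (workstation, local_file) -> central asset id (assumed to exist)
    """
    pointers = defaultdict(list)
    for workstation, entity, local_file in rows:
        fed_entity = mappings[workstation].get(entity)
        if fed_entity is None:
            continue  # entity lies outside the domain of the partial mapping
        pointers[fed_entity].append(central_index[(workstation, local_file)])
    return dict(pointers)


# Hypothetical models: two workstation CDMs and one federated CDM.
ws1 = DataModel(frozenset({"Run", "Spectrum", "Calibration"}),
                frozenset({("Run", "Spectrum"), ("Run", "Calibration")}))
ws2 = DataModel(frozenset({"Experiment", "Image"}),
                frozenset({("Experiment", "Image")}))
fed = DataModel(frozenset({"Analysis", "Result"}),
                frozenset({("Analysis", "Result")}))

mappings = {
    "ws1": {"Run": "Analysis", "Spectrum": "Result"},  # "Calibration" stays unmapped
    "ws2": {"Experiment": "Analysis", "Image": "Result"},
}

rows = [
    ("ws1", "Spectrum", "/data/s1.raw"),
    ("ws2", "Image", "/img/a.tif"),
    ("ws1", "Calibration", "/data/c.cal"),
]
central_index = {
    ("ws1", "/data/s1.raw"): "archive:0001",
    ("ws2", "/img/a.tif"): "archive:0002",
    ("ws1", "/data/c.cal"): "archive:0003",
}

pointers = central_pointers(rows, mappings, central_index)
```

In this sketch both workstation mappings pass the structure check, and the federated entity "Result" ends up pointing to two central copies, one from each workstation. The unmapped "Calibration" entity contributes no pointer.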
The indexing algorithm de...