System for Warehousing Linked Data Based on High-Level Mapping Specification

IP.com Disclosure Number: IPCOM000240398D
Publication Date: 2015-Jan-29
Document File: 5 page(s) / 133K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a system for warehousing linked data based on a high-level mapping specification. The core idea is to create a model-driven data-warehousing tool based on a high-level mapping between relational databases (RDB) and the Resource Description Framework (RDF).

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 38% of the total text.

Linked Data [1] is an ideal data model for domains such as Application Lifecycle Management (ALM). Linked Data is based on the World Wide Web Consortium (W3C) Resource Description Framework (RDF) [2]. RDF data can be stored in special-purpose databases, known as triple stores [3], which can be queried using the SPARQL Protocol and RDF Query Language (SPARQL) [4]. Unfortunately, currently available triple stores have far less capacity than mature relational databases. This is a problem for domains such as ALM, in which the data is updated frequently and it is important to understand how the data changes over time (e.g., to determine whether the quality of software products is improving from release to release).

The limited capacity of triple stores leads to an architecture in which only the current data values are stored in the triple store. In order to perform analytics on how the data changes over time, the triple store must be periodically snapshotted into a data warehouse based on a traditional relational database. Data warehouses are very expensive to implement and maintain using traditional development methods.
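The snapshot architecture described above can be illustrated with a minimal sketch. It assumes the current triples are already available as Python tuples and uses an in-memory SQLite database as a stand-in for the relational warehouse. The table name `triple_history`, its columns, and the `ex:` identifiers are hypothetical and chosen only for illustration; the disclosure does not specify them.

```python
import sqlite3


def snapshot(conn, triples, snapshot_date):
    """Append the current triples to a history table, tagged with the snapshot date."""
    # Hypothetical generic history table; a real warehouse schema would be richer.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triple_history ("
        "snapshot_date TEXT, subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany(
        "INSERT INTO triple_history VALUES (?, ?, ?, ?)",
        [(snapshot_date, s, p, o) for (s, p, o) in triples],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")

# The triple store holds only current values, so each snapshot sees one value.
snapshot(conn, [("ex:product1", "ex:openDefects", "42")], "2015-01-01")
snapshot(conn, [("ex:product1", "ex:openDefects", "17")], "2015-02-01")

# The warehouse keeps every snapshot, so the trend over time can be queried.
rows = conn.execute(
    "SELECT snapshot_date, object FROM triple_history "
    "WHERE subject = ? AND predicate = ? ORDER BY snapshot_date",
    ("ex:product1", "ex:openDefects"),
).fetchall()
```

Because each snapshot is stored with its date, the relational side can answer questions about change over time that the triple store alone cannot.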

The traditional approach to data warehouse development requires the manual creation and maintenance of complex relational database schemas and so-called Extract-Transform-Load (ETL) processes. The creation of relational database schemas and ETLs requires developers with expert knowledge in order to achieve acceptable performance.

Existing higher-level approaches to data warehouse development are model-based: a data warehouse modeler creates a model, and the data warehouse schema and ETL code are then generated automatically. The disadvantage of this approach is that it does not allow the use of a pre-existing data warehouse schema. It also does not support RDF data sources, and is therefore not applicable to warehousing Linked Data, either in an existing data warehouse or in one that is generated by a tool.

A system or method is needed to facilitate a model-driven solution for warehousing Linked Data.

The novel contribution is a system for warehousing linked data based on a high-level mapping specification. This system provides a low-cost, maintainable, and performant solution to the problem of warehousing Linked Data.

The core idea is to create a model-driven data-warehousing tool based on a high-level mapping between relational databases (RDB) and RDF. Given such a mapping, it is possible to wholly or partially generate a relational schema, and either to generate static ETL code or to dynamically interpret the mapping at runtime to implement the ETL.
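As a rough illustration of this idea, the sketch below derives a relational schema from a mapping and then interprets the same mapping at runtime to load data. It is not the disclosed system's mapping format: the `MAPPING` dictionary layout, the function names `generate_ddl` and `run_etl`, and the `ex:` vocabulary are all assumptions made for this example, and SQLite stands in for the warehouse database.

```python
import sqlite3

# Hypothetical mapping: one RDF class maps to one table,
# and each predicate maps to a (column name, SQL type) pair.
MAPPING = {
    "table": "defect",
    "rdf_type": "ex:Defect",
    "columns": {
        "ex:severity": ("severity", "TEXT"),
        "ex:status": ("status", "TEXT"),
    },
}


def generate_ddl(mapping):
    """Generate a CREATE TABLE statement from the mapping."""
    cols = ["subject TEXT PRIMARY KEY"]
    cols += [f"{name} {sqltype}" for name, sqltype in mapping["columns"].values()]
    return f"CREATE TABLE {mapping['table']} ({', '.join(cols)})"


def run_etl(conn, mapping, triples):
    """Interpret the mapping at runtime: load one row per resource of the mapped type."""
    subjects = {s for s, p, o in triples
                if p == "rdf:type" and o == mapping["rdf_type"]}
    rows = {s: {} for s in subjects}
    for s, p, o in triples:
        if s in rows and p in mapping["columns"]:
            rows[s][mapping["columns"][p][0]] = o
    col_names = [name for name, _ in mapping["columns"].values()]
    placeholders = ", ".join("?" for _ in range(len(col_names) + 1))
    sql = (f"INSERT INTO {mapping['table']} "
           f"(subject, {', '.join(col_names)}) VALUES ({placeholders})")
    for s in sorted(rows):
        # Predicates missing for a resource become NULL columns.
        conn.execute(sql, [s] + [rows[s].get(c) for c in col_names])
    conn.commit()


triples = [
    ("ex:d1", "rdf:type", "ex:Defect"),
    ("ex:d1", "ex:severity", "high"),
    ("ex:d1", "ex:status", "open"),
    ("ex:p1", "rdf:type", "ex:Product"),  # not covered by the mapping, so skipped
]

conn = sqlite3.connect(":memory:")
ddl = generate_ddl(MAPPING)
conn.execute(ddl)
run_etl(conn, MAPPING, triples)
loaded = conn.execute("SELECT subject, severity, status FROM defect").fetchall()
```

Here the mapping is the single source for both the schema and the load logic, so changing the mapping changes both consistently. A static variant would emit the ETL as code ahead of time rather than interpreting the mapping on each run.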

This approach combines an existing database schema with newly generated portions of the schema based on the mapping. The ETL can be consistently optimized using best practices such as parallelization. The schema can be automatically evolved as the mapping changes. Other artifacts, s...