
Method for updating the correct data-source in a federated system using a classifier.

IP.com Disclosure Number: IPCOM000129854D
Original Publication Date: 2005-Oct-07
Included in the Prior Art Database: 2005-Oct-07
Document File: 3 page(s) / 72K

Publishing Venue

IBM

Abstract

Method for updating the correct data-source in a federated system using a classifier.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.



Information integration using federation enables a number of heterogeneous data sources to be connected and queried as if they were a single data source. In products such as WebSphere Information Integrator*, federation is performed using wrappers, which perform the translation needed for the various data sources to be queried using a single SQL statement.

    Where the query is against a single federated table, or is a join across federated data sources, it is possible to analyse the query to determine from which data source each component of the join has been derived. Provided no irreversible transformation is applied when creating the view (e.g. a column function such as SUM or AVG), and provided the integrity constraints of the remote data source allow the operation, it is possible to insert data into the view and, using mechanisms such as 'instead of' triggers, update the remote tables with the relevant information.
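To illustrate the join case, the sketch below uses SQLite (through Python's standard sqlite3 module) as a stand-in for a federated system: the two local tables play the role of remote federated tables, and the table, view and trigger names are hypothetical. Because every view column maps back to exactly one base column and no column function is applied, an 'instead of' trigger can split an inserted view row into its components and write each one to the table it came from.

```python
import sqlite3

# Two local tables stand in for remote federated tables (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE account  (acct_id INTEGER PRIMARY KEY, cust_id INTEGER, balance REAL);

-- Join view: each output column derives from exactly one base column and no
-- column function is applied, so the mapping back to the sources is reversible.
CREATE VIEW customer_account AS
  SELECT c.cust_id, c.name, a.acct_id, a.balance
  FROM customer c JOIN account a ON a.cust_id = c.cust_id;

-- Route each component of an inserted view row to the table it was derived from.
CREATE TRIGGER customer_account_ins INSTEAD OF INSERT ON customer_account
BEGIN
  INSERT OR IGNORE INTO customer (cust_id, name) VALUES (NEW.cust_id, NEW.name);
  INSERT INTO account (acct_id, cust_id, balance)
    VALUES (NEW.acct_id, NEW.cust_id, NEW.balance);
END;
""")

conn.execute("INSERT INTO customer_account VALUES (1, 'Ada', 100, 250.0)")
print(conn.execute("SELECT * FROM customer").fetchall())  # [(1, 'Ada')]
print(conn.execute("SELECT * FROM account").fetchall())   # [(100, 1, 250.0)]
```

No equivalent trigger can be written for a UNION or UNION ALL view without extra information, because the trigger body has no way to tell which branch a new row belongs to.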

    In many cases, however, the federated view is defined using a UNION or UNION ALL operation. In such cases it may not be possible to insert a row into the federated view, since there is no immediate way of knowing to which of the federated tables the data should be written.

    In some situations a specific value of a field within the view can be used to uniquely identify the original source. In such situations a rule-based mechanism can be used to update the federated source. In general, however, no such field will exist.
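A rule-based mechanism of this kind can be sketched as a lookup from a discriminating column value to a target table. The rule table, column name (`region`) and table names below are hypothetical; the function returns None when no rule identifies the source, which is the general case that motivates the classifier.

```python
from typing import Dict, Optional

# Hypothetical rules: column -> {discriminating value -> federated table}.
ROUTING_RULES: Dict[str, Dict[str, str]] = {
    "region": {"EU": "orders_eu", "US": "orders_us"},
}

def route_by_rule(row: dict, rules: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the table named by the first matching rule, or None if no rule applies."""
    for column, mapping in rules.items():
        target = mapping.get(row.get(column))
        if target is not None:
            return target
    return None  # source cannot be identified from a single field value

print(route_by_rule({"region": "EU", "amount": 10}, ROUTING_RULES))  # orders_eu
print(route_by_rule({"region": "APAC"}, ROUTING_RULES))              # None
```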

    A solution to this problem is to construct a classifier from the data in the view. The classifier learns the characteristics of the data from each data source that are unique to that source. Where such unique characteristics are found, new data entered into the view can be passed through the classifier to identify its source, and that source can then be updated.
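The disclosure does not name a particular classifier. As one possible choice, the sketch below uses categorical naive Bayes with Laplace smoothing. It assumes the training rows are already labelled with the federated table they came from (for example, by querying each branch of the UNION separately); the class `SourceClassifier`, the column names and the table names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]

class SourceClassifier:
    """Hypothetical categorical naive Bayes classifier whose labels are source tables."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                        # Laplace smoothing strength
        self.source_counts: Counter = Counter()   # training rows seen per source
        self.value_counts = defaultdict(Counter)  # (source, column) -> value frequencies
        self.column_values = defaultdict(set)     # column -> distinct values in training

    def fit(self, rows_by_source: Dict[str, List[Row]]) -> "SourceClassifier":
        for source, rows in rows_by_source.items():
            for row in rows:
                self.source_counts[source] += 1
                for column, value in row.items():
                    self.value_counts[(source, column)][value] += 1
                    self.column_values[column].add(value)
        return self

    def predict_proba(self, row: Row) -> Dict[str, float]:
        total = sum(self.source_counts.values())
        log_scores = {}
        for source, n in self.source_counts.items():
            score = math.log(n / total)  # log prior P(source)
            for column, value in row.items():
                counts = self.value_counts[(source, column)]
                vocab = len(self.column_values[column]) + 1  # +1 reserves mass for unseen values
                score += math.log((counts[value] + self.alpha) / (n + self.alpha * vocab))
            log_scores[source] = score
        # Normalise to posteriors, subtracting the maximum for numerical stability.
        peak = max(log_scores.values())
        weights = {s: math.exp(v - peak) for s, v in log_scores.items()}
        norm = sum(weights.values())
        return {s: w / norm for s, w in weights.items()}

    def predict(self, row: Row) -> Tuple[str, float]:
        proba = self.predict_proba(row)
        best = max(proba, key=proba.get)
        return best, proba[best]


# Training rows labelled by the UNION ALL branch (federated table) they came from.
training = {
    "orders_eu": [{"currency": "EUR", "country": "DE"},
                  {"currency": "EUR", "country": "FR"},
                  {"currency": "EUR", "country": "DE"}],
    "orders_us": [{"currency": "USD", "country": "US"},
                  {"currency": "USD", "country": "US"}],
}
clf = SourceClassifier().fit(training)
source, confidence = clf.predict({"currency": "EUR", "country": "FR"})
print(source, round(confidence, 2))  # orders_eu 0.9
```

Naive Bayes is chosen here only because it is simple and yields a probability per source, which the threshold check described next can use; any classifier that reports a confidence would fit the same role.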

    Where the confidence in assigning a record to a federated data source is below a user-defined threshold, the user can be informed.
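The threshold test can be sketched as a small decision function over per-source posterior probabilities, such as those a probabilistic classifier would return. The function name and the example threshold of 0.8 are assumptions for illustration; a None result means the row should not be written automatically and the user should be informed instead.

```python
from typing import Dict, Optional, Tuple

def choose_source(posteriors: Dict[str, float], threshold: float) -> Tuple[Optional[str], float]:
    """Hypothetical helper: return (source, confidence), with source None below threshold."""
    best = max(posteriors, key=posteriors.get)
    confidence = posteriors[best]
    if confidence < threshold:
        return None, confidence  # too uncertain: inform the user rather than update a source
    return best, confidence

print(choose_source({"orders_eu": 0.95, "orders_us": 0.05}, threshold=0.8))  # ('orders_eu', 0.95)
print(choose_source({"orders_eu": 0.55, "orders_us": 0.45}, threshold=0.8))  # (None, 0.55)
```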

The solution comprises two main steps:

STEP 1 Build the Classifier

This step is illustrated in Figure 1.

    When a view is created using a UNION or UNION ALL operation from a number of federated data sources, the view is automat...