
Mapping Attributes and their Values from Across Data Sources
Disclosure Number: IPCOM000235057D
Publication Date: 2014-Feb-26
Document File: 3 page(s) / 49K

Publishing Venue

The Prior Art Database


Data sources from different domains or different geographies often use different terminology to refer to the same attribute. To reason across such varied data sources, the differences in terminology and semantics need to be reconciled so that reasoning can be done at a level that encompasses all of them. Consider, for example, the subscriber databases of two mobile service providers serving the same area: one may use "sex" as a customer attribute, whereas the other uses "gender" for the same attribute. Similarly, one may draw education from a domain of values such as {"Bachelors", "Masters", ...}, whereas the other uses {"First Degree", "Advanced Degree", ...}. In this article, we discuss techniques for two problems that arise when integrating data from such varied sources: (a) Attribute Mapping: identifying which attributes correspond across the sources; in our example, attribute mapping techniques should uncover the mapping between "gender" and "sex". (b) Value Mapping: once an attribute mapping has been found, mapping the values of one attribute to those of the mapped attribute; in our example, "Masters" may be best mapped to "Advanced Degree". The techniques we propose do not require any lexical match between attributes or values to identify the mappings; in particular, we propose using value distributions and value correlations to identify mappings between attributes and between values. Previous techniques require approximate lexical matches between candidate mappings, assume that a foreign key exists on which the data sources can be joined, or both; our methods are a step towards enabling mappings even when these assumptions do not hold.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.



Consider two data sources such as the following:

There is very little chance of being able to reason across these two data sources without mappings, since there are no lexical matches between the attributes or their values, and there is no foreign key on which a join could be performed. We propose a mapping discovery engine that takes such varied data sources and uncovers mappings between their attributes and values. An example operation of our technology would be as follows:



As indicated above, our technology is expected to uncover attribute mappings (e.g., State and Province, Education and Qualification) as well as value mappings (e.g., TX and Texas, AZ and Arizona). These mappings make it possible to join the two data sources, so that subsequent reasoning can be done on a larger pool of data that encompasses entities from across such varied data sources.
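To make the last point concrete, the following Python sketch shows how mappings of this kind could be applied once they have been found: records from the second source are rewritten into the first source's vocabulary and pooled with the first source's records. The mapping dictionaries, the record layout, the toy records and the helper names translate_record and pool_records are all hypothetical; in practice the mappings would come from the discovery engine rather than being written by hand.

```python
from collections import Counter

# Hypothetical mappings of the kind the engine is meant to uncover:
# DS2 attribute name -> DS1 attribute name, plus per-attribute value maps.
ATTRIBUTE_MAP = {"Province": "State", "Qualification": "Education"}
VALUE_MAP = {"State": {"Texas": "TX", "Arizona": "AZ"}}


def translate_record(record, attribute_map, value_map):
    """Rewrite one DS2 record in DS1's vocabulary.

    Attributes or values without a mapping are passed through unchanged.
    """
    translated = {}
    for attr, value in record.items():
        target = attribute_map.get(attr, attr)
        translated[target] = value_map.get(target, {}).get(value, value)
    return translated


def pool_records(ds1_records, ds2_records, attribute_map, value_map):
    """Union of both sources, with DS2 records expressed in DS1's terms."""
    return list(ds1_records) + [
        translate_record(r, attribute_map, value_map) for r in ds2_records
    ]


# Toy records (hypothetical data)
ds1 = [{"State": "TX", "Education": "Masters"},
       {"State": "AZ", "Education": "Bachelors"}]
ds2 = [{"Province": "Texas", "Qualification": "Advanced Degree"},
       {"Province": "Arizona", "Qualification": "First Degree"}]

pooled = pool_records(ds1, ds2, ATTRIBUTE_MAP, VALUE_MAP)
state_counts = Counter(r["State"] for r in pooled)
print(state_counts["TX"], state_counts["AZ"])  # 2 2
```

After pooling, the Education values still differ ("Masters" versus "Advanced Degree"), so a value map for Education would be needed as well; that is exactly the value-mapping problem described above.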

Attribute Mappings

One simple method of discovering attribute mappings between data sources is as follows. Let DS1 be a data source with attributes {A1, A2, ...} whereas DS2 has attributes {B1, B2, ...}.

- For each attribute Ai in DS1:
  - Let n be the number of values of Ai.
  - Let Dist(Ai) be the distribution of values in Ai.
  - For every attribute Bj in DS2 that has not already been mapped to an attribute in DS1:
    - Let m be the number of values in Bj.
    - Let Dist(Bj) be the distribution of values in Bj.
    - Score(Ai, Bj) = a (1/|m-n|) + (1-a) DistSim(Dist(Ai), Dist(Bj)), where a is a weighting parameter, the first term favors attributes with similar numbers of values, and DistSim is a similarity measure between two distributions.
  - Associate Ai with the attribute B of DS2 for which Score(Ai, B) is maximized.
- Output the mappings obtained so far.
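The procedure above can be sketched in Python as follows. The sketch rests on several assumptions that the disclosure does not fix. "Number of values" is read as the number of distinct values. DistSim is taken to be one minus the total variation distance between the two attributes' sorted relative-frequency profiles; sorting makes the comparison independent of the value labels, which are not expected to match lexically. The count term is computed as 1/(1+|m-n|) rather than 1/|m-n|, so that attributes with equal numbers of values receive the highest score instead of causing a division by zero. The weight a defaults to 0.5. All function names and the toy data are hypothetical.

```python
from collections import Counter


def frequency_profile(values):
    """Relative frequencies of the distinct values, largest first.

    Assumption: Dist() is compared by shape only, since the value labels of
    the two sources are not expected to match lexically.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def dist_sim(p, q):
    """Hypothetical DistSim: 1 - total variation distance, in [0, 1].

    The shorter profile is padded with zeros so both have equal length.
    """
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def score(col_a, col_b, a=0.5):
    """Score(Ai, Bj) = a * count_term + (1 - a) * DistSim(Dist(Ai), Dist(Bj))."""
    n = len(set(col_a))  # distinct values in Ai (assumed reading of "number of values")
    m = len(set(col_b))  # distinct values in Bj
    # Assumed variant: 1/(1+|m-n|) avoids dividing by zero when m == n.
    count_term = 1.0 / (1 + abs(m - n))
    return a * count_term + (1 - a) * dist_sim(
        frequency_profile(col_a), frequency_profile(col_b))


def map_attributes(ds1, ds2, a=0.5):
    """Greedily map each DS1 attribute to its best-scoring unmapped DS2 attribute."""
    mappings = {}
    for name_a, col_a in ds1.items():
        candidates = [b for b in ds2 if b not in mappings.values()]
        if not candidates:  # every DS2 attribute is already mapped
            break
        mappings[name_a] = max(candidates, key=lambda b: score(col_a, ds2[b], a))
    return mappings


# Toy columns (hypothetical data): attribute name -> list of values
ds1 = {
    "Sex": ["M", "F", "F", "M", "F", "M"],
    "Education": ["Bachelors", "Masters", "Bachelors", "PhD", "Bachelors", "Masters"],
}
ds2 = {
    "Qualification": ["First Degree", "Advanced Degree", "First Degree",
                      "Doctorate", "First Degree", "Advanced Degree"],
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
}

print(map_attributes(ds1, ds2))
# {'Sex': 'Gender', 'Education': 'Qualification'}
```

With this toy data, Sex is mapped to Gender (score 1.0, versus about 0.67 for Qualification), and Education then takes the only remaining candidate, Qualification. Because already-mapped DS2 attributes are skipped, the result of this greedy procedure can depend on the order in which the DS1 attributes are visited.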

Once such attribute mappings have been arrived at, the values...