Browse Prior Art Database

An efficient system for data type transformations based on graph algorithms

IP.com Disclosure Number: IPCOM000131208D
Original Publication Date: 2005-Nov-09
Included in the Prior Art Database: 2005-Nov-09
Document File: 2 page(s) / 25K

Publishing Venue

IBM

Abstract

Application of graph algorithms to efficiently realizing data type transformations

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

Page 1 of 2

An efficient system for data type transformations based on graph algorithms

Our invention targets the problem of data type transformations in computer systems. Data transformations are typically required when a data producing entity and a data consuming entity use different formats for essentially describing similar data. For example, entity A may describe a Name using a structure with two string fields holding the first and last names whereas entity B may describe a Name using a single string with a comma delimiting the last name from the first name.

We model the type transformation problem as a weighted directed graph with nodes representing each distinct type and directed edges representing the existence of transformations from their source nodes to their target nodes. We add weights to the edges to represent the costs of the corresponding transformations, if such costs can be estimated. If costs of individual direct transforms cannot be estimated a-priori, the edges can remain unweighted - in this case, the algorithms discussed below will minimize the number of direct transforms necessary to achieve an end-to-end transform.

We use an algorithm for determining strongly connected components to identify equivalence classes. Within a strongly connected component of the graph, we are guaranteed the existence of at least one path between any pair of nodes - this path represents a sequence of transformations required to get from the source data type to the target data type. When there are multiple paths, we select the least expensive path among them to minimize the cost of transformation. There are well-known algorithms in graph theory for computing strongly connected components (SCC) as well as shortest (least expensive) paths between all pairs of nodes (APSP). For example, SCC can be computed in O(V+E) steps using two depth-first searches (Kosaraju's algorithm), while APSP can be computed in O(V^3) steps by simple deterministic algorithms (e.g Dijkstra's algorithm) and using asymptotically fewer steps (expected value) using randomized algorithms (Here V is the number of nodes and E the number of edges in the graph (Note, for the randomized version with improved running time, the edge weights, corresponding to the transformation costs, must be uniform, which limits its application in this case to situations where the transform cost...