Method and system for distributed reconciliations using data partitioning
Publication Date: 2012-Jul-05
The IP.com Prior Art Database
Identity Management (IdM) solutions provide a mechanism called reconciliation, a procedure that synchronises user identities from target systems with the IdM system. A known problem with existing reconciliation mechanisms is that they do not scale well as the number of identities increases. This article describes a solution that partitions the user identity data and reconciles the partitions in parallel to improve the performance and scalability of the reconciliation mechanism.
Introduction
IdM solutions usually provide a mechanism to retrieve user identities from target systems and compare them to their last recorded state. This process is referred to as reconciliation. Target systems often contain large numbers of identities, so reconciliations usually involve reading and processing large data sets. Reconciliation is commonly affected by the following problems:
it takes a long time to complete and cannot be run frequently;
it adversely impacts the performance of the IdM system as well as its target systems; and
it does not scale well as the number of users increases.
These problems are exacerbated by the fact that reconciliation does not typically make use of the redundancy in the system architecture. In typical IdM scenarios, the agents or adapters communicate with only a single server of the target system during the reconciliation or data synchronisation process, although the target system is quite often deployed over multiple servers for load balancing. Communicating with only a single server simplifies the problem, since it is not trivial to ensure that the load is evenly distributed across multiple servers. As a result, the other redundant servers in the system architecture are likely to be left idle.
Making use of the redundant servers is not enough on its own; an approach that spreads the reconciliation load evenly across those servers is also required. A simple approach that spreads reconciliation load based on user IDs would be to reconcile IDs A - M from one server and N - Z from another server. However, this approach does not evenly distribute the processing load, because there may be a significantly larger population of users whose user ID begins with A - M, or vice versa.
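The imbalance of a fixed alphabetical split can be illustrated with a minimal Python sketch. The function name `alphabetical_partition` and the sample user IDs are hypothetical and chosen only for illustration; they do not come from any IdM product or from this article's solution.

```python
from collections import Counter


def alphabetical_partition(user_ids):
    """Count how many IDs fall into the fixed ranges A-M and N-Z.

    Assumption: each ID is non-empty and is assigned by its first character.
    IDs that do not start with A-M (including digits) are counted under N-Z.
    """
    counts = Counter()
    for uid in user_ids:
        first = uid[0].upper()
        bucket = "A-M" if "A" <= first <= "M" else "N-Z"
        counts[bucket] += 1
    return counts


# Hypothetical population that happens to favour early letters.
sample_ids = [
    "adams", "baker", "chen", "davis", "evans",
    "garcia", "harris", "jones", "smith", "young",
]
split = alphabetical_partition(sample_ids)
print(dict(split))
```

With this sample, one server would reconcile eight identities while the other reconciles two, even though the alphabet itself is split evenly.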
In order to fully utilise the available resources in the IT system, the IdM system requires an intelligent and practical way of partitioning the data for reconciliation.
The solution aims to improve the overall performance of reconciliation by:
partitioning the set of user identities into "nearly" equal portions; and
performing parallel reconciliations using replicated target resources.
The solution makes use of multiple replicated target resources so that the reconciliation throughput can scale with an increasing number of users by using additional replicas of the target resource in the system architecture. Instead of requiring a system administrator to manually configure the data partitions, the solution presents a method to dynamically determine each data partition. The solution enables partitions to be adapted to changes in the user population as well as the system architecture without the need for human intervention.
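The two aims above (nearly equal partitions, reconciled in parallel against replicas) can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the method this article describes: it assumes the full list of user IDs is already known, and `partition_evenly`, `reconcile_partition`, `reconcile_in_parallel` and the replica names are all hypothetical. The placeholder `reconcile_partition` merely reports how many identities it was given.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_evenly(user_ids, n_parts):
    """Split sorted IDs into n_parts contiguous chunks whose sizes differ by at most one."""
    ordered = sorted(user_ids)
    base, extra = divmod(len(ordered), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional ID each.
        size = base + (1 if i < extra else 0)
        parts.append(ordered[start:start + size])
        start += size
    return parts


def reconcile_partition(replica, ids):
    """Hypothetical stand-in for reconciling one partition against one replica."""
    return replica, len(ids)


def reconcile_in_parallel(user_ids, replicas):
    """Assign one partition to each replica and process the partitions concurrently."""
    parts = partition_evenly(user_ids, len(replicas))
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        return list(pool.map(reconcile_partition, replicas, parts))


# Hypothetical usage: ten identities spread across three replicas.
ids = [f"user{i:03d}" for i in range(10)]
results = reconcile_in_parallel(ids, ["replica-1", "replica-2", "replica-3"])
print(results)
```

Because the chunk count is taken from the number of replicas, adding a replica changes the partitions without any manual reconfiguration, which is the property the paragraph above asks for.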
The following diagram illustrates the key components of the solution and the sequence of activities performed by these components.
Step 1: Create groups to categorise users