
Discovering partitioning schemes in distributed hierarchical data
Disclosure Number: IPCOM000174596D
Original Publication Date: 2008-Sep-16
Included in the Prior Art Database: 2008-Sep-16
Document File: 3 page(s) / 29K


How to detect the partitioning scheme of distributed hierarchical data given only the back-end servers that are involved.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.


Discovering partitioning schemes in distributed hierarchical data

In a distributed directory environment, a proxy server fronts back-end directory servers that contain partitioned data. Data is partitioned using a hashing algorithm, and the proxy server stores the configuration that records which servers hold which partitions. Within the back-end servers themselves there is no indication that the data is partitioned. If the proxy server's configuration is lost and no backup of it exists, the distributed environment is unusable, and as of today there is no easy way to recover.

So the problem is this:

How can the partitioning scheme of distributed hierarchical data be detected, given only the back-end servers that are involved? This solution assumes that the administrator can provide the LDAP (Lightweight Directory Access Protocol) URLs for the back-end servers involved.

A recovery process is configured with the LDAP URLs and bind credentials for the known back-end servers in the distributed directory. The process will bind to the back-end servers and perform a series of LDAP queries to determine the partitioning scheme.

The queries first detect the split points, then determine how many partitions exist and which back-end servers contain which data.

Advantages - there are currently no known solutions to this problem, other than combining and redistributing the data, or manually attempting to determine the partitioning scheme and reconfiguring the proxy server by hand.

In a disaster recovery environment a quick, automated solution to restore the environment is needed.

Step 1) Determine replication topology to remove duplicate servers.

    Perform a search to locate all replication agreements on all servers, then group the servers by replication relationship. Reduce the server list so that it contains only one server from each unique replicated context. This leaves exactly one server per partition.
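Step 1 can be sketched as a simple grouping problem once the replication agreements have been read from each server. The sketch below uses a union-find structure over illustrative server URLs and agreement pairs (none of these names come from the disclosure; the agreement data would in practice come from the LDAP searches described above).

```python
# Sketch of step 1: collapse replicas into one server per replicated
# context. The agreement pairs below stand in for replication
# agreements discovered by searching each server (illustrative data).
agreements = [
    ("ldap://a1", "ldap://a2"),   # a1 and a2 replicate the same partition
    ("ldap://b1", "ldap://b2"),
]
servers = ["ldap://a1", "ldap://a2", "ldap://b1", "ldap://b2", "ldap://c1"]

parent = {s: s for s in servers}

def find(s):
    # Follow parent links to the group representative, compressing paths.
    while parent[s] != s:
        parent[s] = parent[parent[s]]
        s = parent[s]
    return s

def union(a, b):
    parent[find(a)] = find(b)

for a, b in agreements:
    union(a, b)

# Keep exactly one server from each replication group.
groups = {}
for s in servers:
    groups.setdefault(find(s), s)
unique_servers = sorted(groups.values())
```

With the sample data above, `unique_servers` contains one server from each of the three replicated contexts, which is the reduced list the later steps operate on.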

Step 2) Determine all the split points and which servers are involved in each split point.

2a) Pick every unique suffix from all the servers and add it to a list of possible split points.
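Step 2a amounts to taking the union of all suffixes seen across the reduced server list. A minimal sketch, assuming each server's suffixes have already been retrieved (for example from its root DSE; the server names and suffixes below are illustrative):

```python
# Sketch of step 2a: every unique suffix across all servers becomes
# a candidate split point (illustrative data).
server_suffixes = {
    "ldap://serverA": ["ou=east,o=example", "o=example"],
    "ldap://serverB": ["ou=west,o=example", "o=example"],
}

candidate_split_points = sorted(
    {suffix for suffixes in server_suffixes.values() for suffix in suffixes}
)
```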

2b) For every possible split point, perform a one-level search on every server, requesting only the DNs of the entries. If there are X servers, this yields X lists of DNs (Distinguished Names) one level below the suffix. Each list is sorted to ease comparison, and the DNs are then compared across the X lists.

2c) If none of the DNs from different lists match, it can be concluded that the DN considered as a possible split point is in fact a split point. Based on the servers which have entries under this DN, the servers involved in this split point can be identified. This subtree does not need to be traversed any further, and step 2b should be repeated for the next DN in the list of possible split points.
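The comparison in steps 2b and 2c can be sketched as follows. The function below takes, for one candidate DN, the sorted child-DN lists that the per-server one-level searches would return; if no child DN appears on more than one server, the candidate is a split point and the servers holding children are the ones involved. All server URLs and DNs are illustrative.

```python
# Sketch of steps 2b-2c: a candidate DN is a split point when the
# one-level child DNs seen on the different servers do not overlap.
def is_split_point(children_by_server):
    """children_by_server maps server URL -> list of child DNs one
    level below the candidate DN. Returns (is_split, involved_servers)."""
    seen = {}
    for server, dns in children_by_server.items():
        for dn in dns:
            if dn in seen and seen[dn] != server:
                return False, []   # same child on two servers: not a clean split
            seen[dn] = server
    # Every server that holds any children participates in this split point.
    involved = sorted(s for s, dns in children_by_server.items() if dns)
    return True, involved

split, involved = is_split_point({
    "ldap://serverA": ["cn=alice,o=example", "cn=carol,o=example"],
    "ldap://serverB": ["cn=bob,o=example"],
})
```

Here the child DNs are disjoint, so the candidate is reported as a split point served by both servers; had `cn=bob,o=example` appeared on both servers, the function would report no split, matching the case described in step 2d.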

2d) If one or more DNs from the lists match, but some do not, it can be concluded that the DN consi...