Management of Distributed Computing Clusters

IP.com Disclosure Number: IPCOM000240446D
Publication Date: 2015-Jan-30

Publishing Venue

The IP.com Prior Art Database

Abstract

The disclosed idea concerns the cost-benefit assessment of function-shipping versus data-shipping in globally distributed systems, and the use of this assessment to optimize system performance and minimize costs.


Introduction

    Consider a system consisting of n globally distributed clusters of data sources or data collection points, X_i, i = 1, …, n. Each cluster X_i consists of n_i data collection points, x_{i,j}, j = 1, …, n_i. These data sources or data collection points have enough storage resources to store all the data that are collected near them and, in addition, may also have some computing resources. Typically, in each cluster, one of the data collection points is also a datacenter, that is, it has a much larger amount of computing resources compared to the other data collection points in that cluster. In such situations, let x_{i,1} denote the datacenter of cluster X_i.
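
    To make this notation concrete, the following Python sketch models clusters and data collection points. It is only an illustration: the class and attribute names (Cluster, DataCollectionPoint, compute_capacity) and the capacity values are assumptions made for this sketch, not part of the disclosure.

```python
# Illustrative model of the notation above. Class and attribute names and the
# capacity values are assumptions made for this sketch only.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DataCollectionPoint:
    """One data source x_{i,j}: stores its local data and may have some compute."""
    name: str
    local_data: List[Dict[str, float]] = field(default_factory=list)
    compute_capacity: float = 1.0  # relative units; a datacenter is much larger

@dataclass
class Cluster:
    """One cluster X_i of n_i points; points[0] is the datacenter x_{i,1}."""
    name: str
    points: List[DataCollectionPoint]

    @property
    def datacenter(self) -> DataCollectionPoint:
        return self.points[0]

# The overall system is simply a list of n such clusters.
system: List[Cluster] = [
    Cluster("X_1", [DataCollectionPoint("x_{1,1}", compute_capacity=100.0),
                    DataCollectionPoint("x_{1,2}"),
                    DataCollectionPoint("x_{1,3}")]),
    Cluster("X_2", [DataCollectionPoint("x_{2,1}", compute_capacity=100.0),
                    DataCollectionPoint("x_{2,2}")]),
]
```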

    We wish to execute a certain function, f, on the data stored in all of these data collection points. As an example, each of these data collection points could be a retail store belonging to a multinational retail company, where the data being collected is the sales of various items. The function f could then be a market segmentation of the customer base of this globally distributed retail company. Another example is one in which each data collection point is a local branch of a globally distributed financial firm, where the data being collected is the transactions that occur for accounts held at each branch. The function f in this context could be to identify anomalous sequences of transactions over a period of time across all the branches. A third example is one in which each data collection point is the location of a telescope and the function f is to identify galaxies of a certain type.
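
    Purely for illustration, a global function f of the kind described above takes the data from all collection points and returns a single global result. The signature, the transaction format, and the threshold in the following sketch are hypothetical and are not part of the disclosure.

```python
# Hypothetical shape of a global function f: it consumes the data held at every
# data collection point and produces one global result. The record format and
# the threshold are arbitrary choices for this sketch.
from typing import Dict, List

Transaction = Dict[str, float]

def f(data_by_point: Dict[str, List[Transaction]],
      threshold: float = 10_000.0) -> List[str]:
    """Toy check: flag points whose total transaction volume exceeds a threshold."""
    flagged: List[str] = []
    for point_name, transactions in data_by_point.items():
        total = sum(t.get("amount", 0.0) for t in transactions)
        if total > threshold:
            flagged.append(point_name)
    return flagged
```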

    The data on which we wish to perform the desired function is dispersed globally. We could, if we wished, move all the data to one location and execute the function on all the data at that location. However, this may involve transferring an extremely large amount of data over large distances, which in turn implies high data-transfer costs. Furthermore, in certain contexts, local regulations may not allow certain data to be transferred across borders. We may therefore wish to execute this function locally at each data collection point and send only the results. However, this may also not be feasible because of limited computing resources at these locations, limited availability of energy, and so on. Moreover, a thorough examination using subsequent, refined functions may be needed to obtain the desired results, and it may not be feasible to apply these refined functions to all the data at all the locations. It therefore becomes essential to isolate the relevant data using a filtering function at the local sites. The data selected by these filtering functions may then be transferred to a central location where it can be processed further.
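
    The filter-then-ship approach described above can be contrasted with shipping all the data in a short sketch. This is a minimal illustration under assumed names and record formats (ship_all_data, filter_then_ship, the amount field); the disclosure does not prescribe any particular filtering function.

```python
# Minimal sketch contrasting data-shipping (move everything) with
# function-shipping (filter locally, move only the relevant records).
# The record format, the site layout, and the filter predicate are assumptions.
from typing import Callable, Dict, List

Record = Dict[str, float]
SiteData = Dict[str, List[Record]]  # site name -> records held at that site

def ship_all_data(sites: SiteData) -> List[Record]:
    """Data-shipping: transfer every record from every site to the central location."""
    central: List[Record] = []
    for records in sites.values():
        central.extend(records)
    return central

def filter_then_ship(sites: SiteData,
                     is_relevant: Callable[[Record], bool]) -> List[Record]:
    """Function-shipping: apply a filtering function locally at each site and
    transfer only the records it selects for further central processing."""
    central: List[Record] = []
    for records in sites.values():
        central.extend(r for r in records if is_relevant(r))
    return central

# Example: ship only high-value transactions to the central location.
sites: SiteData = {
    "x_{1,1}": [{"amount": 120.0}, {"amount": 2_500.0}],
    "x_{1,2}": [{"amount": 75.0}],
}
selected = filter_then_ship(sites, lambda r: r["amount"] > 1_000.0)
```

    The fraction of data that survives the filter is the quantity that the cost-benefit assessment discussed next weighs against the cost of performing the filtering computation locally.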

    The benefit of such filtering and transferring only the filtered data, compared with transferring all the data to a central location, depends on the cost of computing resources...