
Graph Aware Caching Policy for Distributed Graph Stores

IP.com Disclosure Number: IPCOM000240727D
Publication Date: 2015-Feb-23

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for caching graph nodes based on their physical location and neighborhood vicinity in order to minimize both network latency and disk access latency. The proposed caching algorithm, the Clock Based Graph Aware (CBGA) algorithm, is suited for distributed graph platforms and considers multiple factors when evicting items from the cache.




Due to the increasing size of graphs, distributed graph stores are increasingly popular. However, graph queries require jumping across vertices, which entails back-and-forth message passing between distributed servers.

Cache Policies

The execution of graph algorithms typically requires passing state information between arbitrary graph nodes, which can be located on different physical machines. Even when a bulk message passing protocol is executed between the machines, as in current systems for large-scale graph processing, co-locating neighboring graph nodes on the same machine can bring significant performance advantages. Hence, the available memory space in distributed servers can be exploited for caching purposes.
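As a minimal sketch of exploiting that memory (all names below, such as owner_of and remote_servers, are illustrative stand-ins rather than anything specified in the disclosure), a vertex lookup can consult an in-memory cache before paying a disk read or a network round trip:

    # Sketch: spare server memory holds a vertex cache so that repeated
    # cross-machine lookups avoid extra rounds of message passing.
    # local_disk and remote_servers stand in for real storage and RPC stubs.

    def make_vertex_lookup(local_disk, remote_servers, owner_of, cache):
        def get_vertex(vid):
            if vid in cache:
                return cache[vid]                    # served from local memory
            owner = owner_of(vid)
            if owner == "local":
                vertex = local_disk[vid]             # local disk read
            else:
                vertex = remote_servers[owner][vid]  # cross-machine fetch
            cache[vid] = vertex                      # keep for future queries
            return vertex
        return get_vertex

    # Example: vertex 2 lives on server "s1"; the second lookup is a cache hit.
    lookup = make_vertex_lookup(
        local_disk={1: "v1"},
        remote_servers={"s1": {2: "v2"}},
        owner_of=lambda vid: "local" if vid == 1 else "s1",
        cache={})
    lookup(2)  # remote fetch, then cached
    lookup(2)  # served from memory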

Existing distributed file system caches can be leveraged to reduce the cost of back-and-forth communication between physical servers. However, a graph aware caching policy can perform much better than traditional caching policies. The main reasoning behind this claim is that a graph aware cache knows the graph access patterns of the executing queries. For instance, accessing the neighbors of a node is a strong indication that the neighbors of those neighbors will also be accessed soon. Another problem with distributed caches is that the cache policy may not know whether the graph nodes and edges residing in the cache were retrieved from the local machine's disk or from a remote machine in the cluster. Taking the access latency of each of these locations into account enables the caching algorithm to reduce access latency while executing graph queries.
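One way to read this claim, sketched below under assumed constants and names (keep_score, CacheEntry, and the relative latency weights are illustrations; this excerpt does not publish the CBGA internals), is an eviction score that favors keeping entries near the current query frontier as well as entries that would require a remote refetch:

    from dataclasses import dataclass

    # Assumed relative refetch costs; a real system would measure these.
    DISK_LATENCY = 1.0
    REMOTE_LATENCY = 5.0

    @dataclass
    class CacheEntry:
        node_id: int
        is_remote: bool  # True if the entry came from a remote server

    def keep_score(entry, frontier, adjacency):
        """Higher score = more worth keeping in the cache."""
        # Neighbors of recently accessed nodes are likely next targets.
        near_frontier = any(entry.node_id in adjacency.get(f, ())
                            for f in frontier)
        proximity = 2.0 if near_frontier else 0.0
        # Losing a remote entry costs a network round trip, not just a disk read.
        refetch = REMOTE_LATENCY if entry.is_remote else DISK_LATENCY
        return proximity + refetch

    def evict_one(cache, frontier, adjacency):
        """Evict the entry that is cheapest to lose."""
        victim = min(cache.values(),
                     key=lambda e: keep_score(e, frontier, adjacency))
        del cache[victim.node_id]

Under such a scoring, a node that is both far from the query frontier and refetchable from local disk is evicted first, matching the stated goal of minimizing both network latency and disk access latency.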

Existing cache systems, in general, aim to predict future data access requests and optimize resources accordingly. To accomplish that goal, two data access patterns are considered. The first is spatial locality, which indicates that future accesses will target data spatially close to the currently accessed data. The second is temporal locality, which means that future accesses will target the same data that is currently accessed. The spatial locality pattern is used to prefetch data into the cache before it is accessed for the first time. The temporal locality pattern aims to keep already-accessed data in the cache, under the cache size limitation, to serve possible future requests. These two techniques can be extended to improve the performance of graph processing in distributed graph infrastructures.
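As a sketch of how the two patterns could combine in a graph setting (the one-hop prefetch depth and the class shape are assumptions, not the disclosed algorithm), temporal locality can be captured with least-recently-used ordering and spatial locality by prefetching a node's neighbors on each access:

    from collections import OrderedDict

    class GraphLocalityCache:
        def __init__(self, capacity, adjacency, fetch):
            self.capacity = capacity
            self.adjacency = adjacency    # node id -> list of neighbor ids
            self.fetch = fetch            # backing-store loader
            self.entries = OrderedDict()  # LRU order: oldest first

        def get(self, node_id):
            if node_id in self.entries:
                self.entries.move_to_end(node_id)  # temporal locality:
                value = self.entries[node_id]      # refresh LRU position
            else:
                value = self.fetch(node_id)
                self._insert(node_id, value)
            for nbr in self.adjacency.get(node_id, ()):  # spatial locality:
                if nbr not in self.entries:              # prefetch 1-hop
                    self._insert(nbr, self.fetch(nbr))   # neighbors
            return value

        def _insert(self, node_id, value):
            while len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[node_id] = value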

Exploitation of Spatial Locality for Graph Processing

Generic cache algorithms assume that iteration over the logical data order is correlated with the physical data order in the lower layers of the cache hierarchy. For instance, consider an iteration over the elements of an array a with an index variable i. While iterating over a, an access to a[i] is followed by an access to a[i+1], where a[i] and a[i+1] are physically co-located in the lower-level storage medium. Thus, prefetching a[i+...
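The generic sequential pattern that the truncated sentence describes might be sketched as follows; PREFETCH_DEPTH, load_block, and the dictionary cache are illustrative names rather than anything stated in the text:

    PREFETCH_DEPTH = 1  # fetch one element ahead, matching the a[i+1] example

    def iterate_with_prefetch(n, load_block, cache):
        """Yield elements 0..n-1, prefetching ahead of the iteration."""
        for i in range(n):
            ahead = i + PREFETCH_DEPTH
            if ahead < n and ahead not in cache:
                cache[ahead] = load_block(ahead)  # physically adjacent block
            yield cache.pop(i) if i in cache else load_block(i)

    # Example: every element after the first is served from the prefetch cache.
    list(iterate_with_prefetch(5, lambda i: i * i, {}))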