
A Distributed Tail Framework Over RDMA

IP.com Disclosure Number: IPCOM000234666D
Publication Date: 2014-Jan-28
Document File: 6 page(s) / 130K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a massively parallel framework implementing one of the most important financial measures: tail, or nth element threshold selection. The framework makes state-of-the-art use of Remote Direct Memory Access (RDMA) hardware in the cluster, and devises a fully asynchronous communication pattern to implement the tail measure.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 31% of the total text.

Page 01 of 6

A Distributed Tail Framework Over RDMA

The computation and memory bandwidth requirements for performing real-time Credit Valuation Adjustment (CVA) and many other statistical measures have exploded with the dramatic increase in risk precision (i.e., up to 5,000 Monte-Carlo based scenarios and 250 time steps). The extremely high throughput and low latency demands have driven the technology solution toward a distributed/cluster solution, in which the compute and bandwidth of individual nodes in the cluster are aggregated to deliver the ultimate performance.

The novel contribution is a massively parallel framework (Figure 1) implementing one of the most important financial measures: tail, or nth element threshold selection. The framework makes state-of-the-art use of Remote Direct Memory Access (RDMA) hardware in the cluster, and devises a fully asynchronous communication pattern to implement the tail measure. The result is a framework that minimizes the number of required network communications and simultaneously overlaps the remaining communication overhead with useful computation.
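To make the tail measure concrete, the following sketch shows nth element threshold selection over a set of scenario values. The function name, data, and the choice of "nth largest" ordering are illustrative assumptions, not taken from the disclosure:

```python
# Hedged sketch: the "tail" measure as nth-element threshold selection.
# The name tail_threshold and the sample data are illustrative only.
def tail_threshold(values, n):
    """Return the n-th largest value (1-indexed) as the tail threshold."""
    return sorted(values, reverse=True)[n - 1]

# Example: per-scenario losses from a Monte-Carlo run (made-up numbers).
losses = [3.2, 9.1, 0.5, 7.7, 4.4, 8.8]
print(tail_threshold(losses, 2))  # 8.8
```

A naive full sort like this is exactly what the distributed framework avoids at the cluster level: each node sorts only its own sheet, and the threshold is found by a partial merge.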

This is the first distributed tail framework over RDMA. A previously known solution focuses on a single compute node (i.e., not a cluster) and, when it reaches the hardware limit, uses a collapsed time or scenario dimension. Patent US2012/0209849A1 presents a distributed merge sort, which does not require joining the sorted results from individual compute nodes.

Figure 1 presents an overview of the massively parallel compute framework, designed over an InfiniBand network, for the distributed tail measure.

Figure 1: Distributed Tail over InfiniBand




The compute nodes on the left are responsible for sorting the associated individual sheets. After sorting, the system stores the individual sheets in an RDMA-accessible (pinned) memory, organized as a circular buffer chain (labeled as RDMA Memory in Figure 4). A message is posted to the merge node specifying which buffer to use. The compute nodes carry on processing the next sheet in parallel.
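The compute-node protocol above can be modeled as a small buffer-management sketch. The class name, the slot-based API, and the buffer count are assumptions for illustration; a real implementation would pin these buffers for RDMA and signal the merge node over the network rather than through a local list:

```python
from collections import deque

class CircularBufferChain:
    """Illustrative model of the pinned, RDMA-accessible circular buffer
    chain that each compute node exposes. Names and API are assumptions."""

    def __init__(self, num_buffers):
        self.free = deque(range(num_buffers))  # slots available for reuse
        self.posted = []                       # (slot, block) visible to the merge node

    def publish(self, sorted_block):
        """Place a sorted sheet in the next free buffer and return its slot id.

        The returned slot id stands in for the message posted to the merge
        node specifying which buffer to read. A real system would block
        here if no slot were free."""
        slot = self.free.popleft()
        self.posted.append((slot, sorted_block))
        return slot

    def release(self, slot):
        """Merge node signals it has consumed the buffer; reuse the slot."""
        self.free.append(slot)
```

Because the compute node only publishes and moves on, it can start sorting the next sheet while the merge node reads earlier buffers asynchronously.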

The merge node computes tail by doing a partial merge. The merge node fetches the sorted data in blocks to amortize network latency. The merging continues until tail is complete, fetching more data from the compute nodes as required. All remote accesses to the compute nodes are done asynchronously using the RDMA hardware, resulting in no interrupts.
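The merge-node side can be sketched as a lazy k-way partial merge: blocks are "fetched" from each node's sorted partition only as the merge consumes them, and merging stops once the nth element is found. The function names, the block size, and the ascending ordering are illustrative assumptions (selecting the nth largest is symmetric):

```python
import heapq
import itertools

def fetch_blocks(sorted_data, block_size):
    """Simulate RDMA block reads from one compute node's sorted buffer.
    Yielding lazily mimics fetching more data only as required."""
    for i in range(0, len(sorted_data), block_size):
        yield sorted_data[i:i + block_size]

def tail_nth(sorted_partitions, n, block_size=4):
    """Partial k-way merge: return the n-th smallest element across all
    sorted partitions, fetching blocks lazily so later blocks of each
    partition are never read. block_size is an illustrative parameter."""
    streams = (itertools.chain.from_iterable(fetch_blocks(p, block_size))
               for p in sorted_partitions)
    merged = heapq.merge(*streams)  # lazy k-way merge over sorted inputs
    return next(itertools.islice(merged, n - 1, None))
```

Because `heapq.merge` is lazy, only about n elements ever cross the (simulated) network, which is the point of doing a partial rather than a full merge.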

The massively parallel framework is the first implementation of a distributed tail framework using RDMA over an InfiniBand* (IB) network. The data transfer in the distributed tail implementation is designed to minimize network traffic. This is a performance-oriented server architecture (i.e., servers being the compute nodes) based on a memory-efficient circular buffer that is managed independently by a communication management thread. The computation thread is never disturbed and focuses solely on computation. The asynch...