
Method for tracking association of tasks and their memory references in order to improve locality of reference in a NUMA system

IP.com Disclosure Number: IPCOM000249125D
Publication Date: 2017-Feb-08
Document File: 4 page(s) / 45K

Publishing Venue

The IP.com Prior Art Database

Abstract

Hypervisors and operating systems need to schedule tasks and their associated memory pages as close together as possible within a Symmetric Multi Processing (SMP) system topology. When tasks and their allocated physical memory reside within the same processor socket or Non-Uniform Memory Access (NUMA) node, they benefit from lower memory access latencies and also improve overall system performance by conserving SMP interconnect bus bandwidth.

Various memory allocation and migration algorithms are deployed to continuously optimize memory page placement within the system. These algorithms typically take input from memory access statistics available from various hardware units, and further from actual page fault statistics that can accurately associate tasks with their memory pages.

In this article we propose an enhancement to microprocessor hardware with which an accurate association of tasks and their memory references can be estimated for the purpose of memory placement optimizations, without injecting page faults. Further, the proposed technique could be customized to work for tasks, containers, and virtual machines.

Method for tracking association of tasks and their memory references in order to improve locality of reference in a NUMA system

A typical hypervisor, while scheduling virtual machines, or an operating system, while scheduling tasks, will try to keep tasks and their associated memory pages as close together as possible within the SMP system topology. For example, a task whose memory is located within the same NUMA node or processor socket benefits from local chip/node memory references, which is optimal for the task and also improves system-level throughput by conserving inter-chip SMP bandwidth. Placing tasks and their memory allocations within a processor socket or node is achieved by the memory allocation and task scheduling policies of modern operating systems and hypervisors.
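
To make this concrete, the following is a minimal user-space sketch, assuming a Linux system with libnuma installed, of co-locating a task and its working memory on one NUMA node. The node number and buffer size are arbitrary values chosen only for illustration; a scheduler or hypervisor applies the same principle internally through its placement policies.

/* Minimal sketch: co-locate a task and its memory on one NUMA node
 * using the Linux libnuma API.  Node 0 and the 64 MiB size are
 * arbitrary values chosen only for illustration.
 * Build with: gcc locality.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    int node = 0;                       /* target socket/node */
    size_t size = 64UL << 20;           /* 64 MiB working set */

    /* Restrict the task's CPUs to the target node ... */
    if (numa_run_on_node(node) != 0) {
        perror("numa_run_on_node");
        return EXIT_FAILURE;
    }

    /* ... and allocate its memory from the same node, so that accesses
     * stay node-local instead of crossing the SMP interconnect. */
    char *buf = numa_alloc_onnode(size, node);
    if (!buf) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return EXIT_FAILURE;
    }

    for (size_t i = 0; i < size; i += 4096)
        buf[i] = 1;                     /* touch pages so they are actually allocated on the bound node */

    numa_free(buf, size);
    return EXIT_SUCCESS;
}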

However, over the runtime of a task or virtual machine, memory allocations and CPU resource availability may become suboptimal as various other tasks and virtual machines arrive and modify or fragment the allocations. Modern operating systems like Linux have software algorithms, such as automatic NUMA balancing (autonuma), that track the affinity between tasks and their memory and try to bring the system back to an optimal state at runtime. However, the associativity between a task and its memory is discovered by injecting page faults and recording the source of each fault, which imposes excessive overhead. In this article we propose a hardware enhancement that would significantly reduce this overhead.
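
For context, the following is an illustrative user-space model of the bookkeeping behind this fault-driven approach; it is not the actual Linux NUMA-balancing code, and the structure and function names are assumptions made for illustration. The key point is that every sampled reference costs one deliberate page fault, whose handler credits the node that holds the touched page.

/* Illustrative model of fault-driven affinity sampling (not the real
 * Linux NUMA-balancing implementation): each sampled reference costs
 * one injected page fault, and the handler credits the node that
 * holds the page. */
#include <stdio.h>

#define MAX_NODES 8

struct task_numa_stats {
    int pid;
    unsigned long faults_per_node[MAX_NODES]; /* hint faults seen per node */
};

/* Called from the (injected) page-fault path: the cost of obtaining
 * this single data point is a full trap into the kernel. */
static void record_hint_fault(struct task_numa_stats *t, int page_node)
{
    if (page_node >= 0 && page_node < MAX_NODES)
        t->faults_per_node[page_node]++;
}

/* The node where most of the task's sampled references landed. */
static int preferred_node(const struct task_numa_stats *t)
{
    int best = 0;
    for (int n = 1; n < MAX_NODES; n++)
        if (t->faults_per_node[n] > t->faults_per_node[best])
            best = n;
    return best;
}

int main(void)
{
    struct task_numa_stats t = { .pid = 1234 };

    /* Simulate sampled faults: most of this task's pages live on node 1. */
    record_hint_fault(&t, 0);
    record_hint_fault(&t, 1);
    record_hint_fault(&t, 1);
    record_hint_fault(&t, 1);

    printf("pid %d prefers node %d\n", t.pid, preferred_node(&t));
    return 0;
}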

The proposed hardware enablement would help software algorithms like Linux autonuma discover the associativity between tasks and their memory with minimal overhead, so that task-to-memory affinity can be identified with hardware assist and tasks can then be migrated closer to their memory.

We propose that the microprocessor's Memory Management Unit (MMU) be enhanced to track which "cpu" (thread/core/socket/node) referenced a page, together with additional context information such as a Virtual Machine (VM) or Logical Partition (LPAR) ID or a container ID, and make this information available to the operating system or hypervisor for further scheduling and resource allocation optimizations. On a modern enterprise-class microprocessor, this information could be taken from a processor identification register that identifies the chip/core/thread, along with a software-loadable Special Purpose Register (SPR) that contains the context ID. For example, the LPAR ID and Process ID (PID) registers could be used as context information.
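
As one possible software view of such a facility, the sketch below shows a hypothetical reference record the enhanced MMU might export and how the operating system or hypervisor could decode it into a (cpu node, context, memory node) association. The record layout, field widths, and topology helper functions are illustrative assumptions, not an existing architecture definition.

/* Hypothetical layout of a reference record the enhanced MMU might
 * export; field names and widths are illustrative assumptions, not an
 * existing architecture definition. */
#include <stdint.h>
#include <stdio.h>

struct mmu_ref_record {
    uint16_t cpu_id;     /* identifies thread/core/chip of the accessor */
    uint16_t lpar_id;    /* context: LPAR / VM identifier               */
    uint32_t pid;        /* context: process or container identifier    */
    uint64_t real_pfn;   /* real (physical) page frame that was touched */
};

/* Assumed platform helpers mapping hardware IDs to the SMP topology;
 * real implementations are platform-specific.                        */
static int node_of_cpu(uint16_t cpu_id)    { return cpu_id >> 3; }   /* e.g. 8 threads per node */
static int node_of_page(uint64_t real_pfn) { return (int)(real_pfn >> 18) & 0x7; }

/* Decode one record into "context X running on node A referenced
 * memory on node B", which is the association the scheduler needs.   */
static void decode(const struct mmu_ref_record *r)
{
    printf("lpar %u pid %u: cpu on node %d referenced memory on node %d\n",
           (unsigned)r->lpar_id, (unsigned)r->pid,
           node_of_cpu(r->cpu_id), node_of_page(r->real_pfn));
}

int main(void)
{
    struct mmu_ref_record r = { .cpu_id = 9, .lpar_id = 2,
                                .pid = 4321, .real_pfn = 0x123456 };
    decode(&r);
    return 0;
}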

The operating system or hypervisor would use this information as input to task scheduling and memory migration policies, such that tasks and their memory are moved close to each other within the same socket/chip or node.
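
A minimal sketch of such a policy decision follows, assuming per-node reference counts gathered from the hardware records (or from fault statistics) and a hypothetical migrate_task() hook standing in for the scheduler's real migration mechanism.

/* Minimal policy sketch: move a task toward the node holding most of
 * its referenced memory.  migrate_task() is a hypothetical hook
 * standing in for the scheduler's real migration mechanism. */
#include <stdio.h>

#define MAX_NODES 8

struct task_info {
    int pid;
    int current_node;
    unsigned long refs_per_node[MAX_NODES]; /* from faults or HW records */
};

static void migrate_task(int pid, int node)
{
    printf("migrating task %d to node %d\n", pid, node);
}

static void rebalance(struct task_info *t)
{
    int best = 0;
    for (int n = 1; n < MAX_NODES; n++)
        if (t->refs_per_node[n] > t->refs_per_node[best])
            best = n;

    /* If the bulk of the task's references are remote, pull the task
     * to that node; alternatively, its pages could be migrated the
     * other way, whichever is cheaper. */
    if (best != t->current_node)
        migrate_task(t->pid, best);
}

int main(void)
{
    struct task_info t = { .pid = 77, .current_node = 0,
                           .refs_per_node = { [0] = 10, [2] = 90 } };
    rebalance(&t);
    return 0;
}

The symmetric choice, migrating the task's pages toward its current node instead, would follow the same comparison and is omitted here for brevity.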

Further, we also propose a statistical reference estimation method in which the hardware provides only relevant source information over a defined sampling period, such that the hardware...
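
One way to read this is that the hardware reports reference records only while a sampling window is open, and software folds each window into a decayed running estimate. The sketch below illustrates such window-based aggregation; the decay factor and per-node bookkeeping are arbitrary illustrative choices, not part of the proposal's definition.

/* Illustrative sketch of statistical reference estimation over sampling
 * windows: hardware supplies records only while a window is open, and
 * older windows are decayed so the estimate tracks recent behaviour.
 * Window handling and the decay factor are arbitrary choices. */
#include <stdio.h>

#define MAX_NODES 8

struct affinity_estimate {
    unsigned long window[MAX_NODES];  /* records seen in current window */
    double        score[MAX_NODES];   /* decayed long-run estimate      */
};

/* One hardware record arrives while the sampling window is open. */
static void account(struct affinity_estimate *e, int node)
{
    if (node >= 0 && node < MAX_NODES)
        e->window[node]++;
}

/* At the end of each window, fold the fresh counts into the decayed
 * estimate and reset the window counters. */
static void close_window(struct affinity_estimate *e)
{
    for (int n = 0; n < MAX_NODES; n++) {
        e->score[n] = 0.5 * e->score[n] + (double)e->window[n];
        e->window[n] = 0;
    }
}

int main(void)
{
    struct affinity_estimate e = { 0 };

    account(&e, 1); account(&e, 1); account(&e, 3);
    close_window(&e);

    for (int n = 0; n < MAX_NODES; n++)
        if (e.score[n] > 0)
            printf("node %d score %.1f\n", n, e.score[n]);
    return 0;
}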