
Instruction-Set Extendible GPU

IP.com Disclosure Number: IPCOM000248804D
Publication Date: 2017-Jan-12
Document File: 4 page(s) / 96K

Publishing Venue

The IP.com Prior Art Database

Abstract

To speed up kernel execution in a GPU, this disclosure proposes to extend the GPU architecture with reconfigurable computing units onto which custom instructions can be mapped.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 31% of the total text.

Instruction-Set Extendible GPU


GPUs are Single Instruction Multiple Data (SIMD) architectures in which a single control unit manages a large set of functional units, executing the same instruction in parallel over different data.
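As a minimal sketch of this execution model (the function and names below are illustrative, not a real GPU ISA), one instruction is issued once and every lane applies it to its own data element:

```python
# Toy model of SIMD execution: one control unit issues a single
# instruction that every lane applies to its own private data element.

def simd_execute(instruction, lanes):
    """Apply the same instruction to every lane's data in lock-step."""
    return [instruction(x) for x in lanes]

# One instruction ("multiply by 2"), many data elements (the lanes).
result = simd_execute(lambda x: 2 * x, [1, 2, 3, 4])
print(result)  # [2, 4, 6, 8]
```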

These architectures are used to speed up computationally intensive kernels by massively parallelizing loop iterations. The execution time of a parallel loop on a GPU depends mainly on:

1) How many iterations can be run in parallel (accounting for HW constraints)

2) How many iterations have to be executed

3) The average execution time of a loop iteration

The execution length of a loop iteration can be reduced by including in the GPU instructions customized for the specific kernel. The idea is to cluster together a set of instructions that the kernel repeats often and agglomerate this set of operations into a single custom instruction implemented in customized computing units (CCUs). The custom instruction is specialized for a given kernel, so it has to be configured when the kernel is loaded onto the GPU; reconfigurable HW is therefore particularly well suited for CCU implementation.
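The clustering idea can be sketched in software terms (a toy illustration, not the disclosure's actual toolchain): a kernel whose inner loop repeatedly performs a multiply followed by an add can have that pair fused into a single custom operation, halving the instruction count per iteration while producing identical results.

```python
# Baseline kernel: two instructions (multiply, then add) per iteration.
def kernel_baseline(a, b, c):
    out = []
    for i in range(len(a)):
        t = a[i] * b[i]       # instruction 1: multiply
        out.append(t + c[i])  # instruction 2: add
    return out

def fused_mac(x, y, z):
    """Stand-in for a custom multiply-accumulate instruction that a
    reconfigurable computing unit (CCU) would execute in one step."""
    return x * y + z

# Customized kernel: one (fused) instruction per iteration.
def kernel_custom(a, b, c):
    return [fused_mac(a[i], b[i], c[i]) for i in range(len(a))]

a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]
assert kernel_baseline(a, b, c) == kernel_custom(a, b, c)
```

In hardware, the fused operation would execute in fewer cycles than the original sequence; the functional result is unchanged.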

The idea of a customized instruction-set extension is not new (even considering reprogrammable CCUs). Application-specific instruction-set processor (ASIP) is the common term for a CPU-based architecture with application-specific instruction-set extension features [1, 2]. Nonetheless, the idea of customizable instruction-set extensions has never been applied before to accelerators such as GPUs. Several synergies make GPUs very well suited to reconfigurable instruction-set extensions:

- GPUs execute an amount of code that is significantly smaller than that executed by general-purpose processors. GPUs execute kernels, while general-purpose CPUs execute whole applications and, in many cases, the operating system too. Finding the best set of instructions to group together into a customized operation is an easier problem to solve for a small kernel than for a large code base.
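To make the identification step concrete, here is a toy stand-in for real instruction-set-extension identification algorithms (the opcode names and approach are illustrative assumptions): candidate custom instructions can be found by counting repeated adjacent opcode pairs in the kernel's instruction stream, a search that shrinks with kernel size.

```python
from collections import Counter

def frequent_pairs(opcodes):
    """Count adjacent opcode pairs; frequent ones are fusion candidates."""
    return Counter(zip(opcodes, opcodes[1:]))

# A short (hypothetical) kernel trace: "mul" followed by "add" dominates,
# so a fused mul-add is the natural custom-instruction candidate.
trace = ["mul", "add", "mul", "add", "shift", "mul", "add"]
print(frequent_pairs(trace).most_common(1))  # [(('mul', 'add'), 3)]
```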

- When multitasking is supported on a general-purpose computing system, context switches are needed, and they limit the ability to efficiently exploit application-specific instructions. On GPUs, instead, one kernel at a time is loaded onto the hardware and executed until its termination; only when the kernel exits can a new kernel be loaded. Kernel-specific custom operations can be programmed onto the GPU when the kernel is loaded and used without interruption throughout its execution.
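A hypothetical runtime sketch of this run-to-completion property (all class and method names are illustrative assumptions, not a real GPU API): the CCU is configured once at kernel-load time and stays fixed for the whole execution, since no context switch can swap it out.

```python
class ReconfigurableGPU:
    """Toy model of a GPU whose CCU is programmed at kernel-load time."""

    def __init__(self):
        self.ccu_op = None  # the currently programmed custom instruction
        self.kernel = None

    def load_kernel(self, kernel, custom_op):
        # Configuration happens at load time, before execution starts.
        self.ccu_op = custom_op
        self.kernel = kernel

    def run(self, data):
        # The kernel runs to termination with the CCU configuration fixed.
        return [self.kernel(x, self.ccu_op) for x in data]

gpu = ReconfigurableGPU()
gpu.load_kernel(lambda x, op: op(x), custom_op=lambda x: x * x + 1)
print(gpu.run([1, 2, 3]))  # [2, 5, 10]
```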

- Reconfiguration overhead can be hidden on GPUs by programming the custom operations when the kernel is loaded. Kernel execution is very predictable, and kernels are often loaded before their execution is actually required.
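A back-of-the-envelope sketch of this overlap (the function and timing figures are illustrative assumptions): if the next kernel's CCU configuration is programmed while the previous kernel is still running, only the portion of the reconfiguration that outlasts the previous kernel remains visible as overhead.

```python
def visible_reconfig_overhead(reconfig_time, remaining_prev_kernel_time):
    """Overhead not hidden by overlapping with the previous kernel (ms)."""
    return max(0.0, reconfig_time - remaining_prev_kernel_time)

# 0.5 ms of configuration fully hidden behind 2 ms of remaining execution:
print(visible_reconfig_overhead(0.5, 2.0))  # 0.0
# 1.2 ms of configuration overlapping only 0.9 ms: ~0.3 ms stays visible.
print(visible_reconfig_overhead(1.2, 0.9))
```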


- Following the SIMD paradigm, many ins...