
Method of performing delay annotation with accelerated processing units

IP.com Disclosure Number: IPCOM000244601D
Publication Date: 2015-Dec-26

Publishing Venue

The IP.com Prior Art Database

Abstract

This disclosure describes a method of performing delay annotation with accelerated processing units.

This text was extracted from a Microsoft PowerPoint presentation.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 22% of the total text.

Slide 1 of 25

Method of performing delay annotation with accelerated processing units


Slide 2 of 25

Executive Summary

Routing delay annotation is an exhaustive process for the CPU that contributes to long compile times.

The multithreading technique applied to resolve every child node branching from the routing network incurs observable thread-switching cost for a large die.

Repackage data-intensive but independent routing networks to suit SIMD-style execution, performing the delay annotation process in OpenCL kernel threads to maximize parallelism.

This provides a significant improvement in delay annotation process time: based on simulation results, the parallel process run on a GPU with OpenCL kernels outperformed the CPU by up to 1.82 times on a large device.



Slide 3 of 25

Background

Routing delay annotation is one of the most frequently called SPICE-like simulations, used to determine the propagation delay of the routing networks

  Intel VTune Amplifier reported that the function call for the delay annotation process was executed about 99 million times.

As the architecture of new FPGA device families has become more complex, the number of routing networks has increased exponentially

  This eventually induces heavy delay annotation workloads that lead to long compile times for new device families

The conventional software flow is not suited to processing delay annotation because:

  It is a data-intensive rather than a compute-intensive process

  The huge number of delay nodes branched from the routing tree causes expensive thread-switching overhead

Use of a heterogeneous system (CPU + GPU)

  Handling both compute-intensive and data-intensive processes could improve overall software performance, since each processor is suited to a different class of problem
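The CPU/GPU split described above can be sketched minimally in Python; the threshold, function names, and the stand-in "GPU path" are all assumptions for illustration, not part of the disclosure:

```python
# Sketch of the heterogeneous split: keep small or control-heavy work
# on CPU threads, and hand large, uniform, data-intensive batches to
# an accelerator-style path (a stand-in for an OpenCL device).
from concurrent.futures import ThreadPoolExecutor

GPU_BATCH_THRESHOLD = 1000  # assumed: below this, offload overhead dominates

def run_on_cpu(items, fn):
    # Conventional multithreaded path on CPU cores.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fn, items))

def run_on_gpu(items, fn):
    # Placeholder for enqueueing an OpenCL kernel over the whole batch.
    return [fn(x) for x in items]

def dispatch(items, fn):
    """Route data-intensive (large, uniform) batches to the GPU path,
    everything else to CPU threads."""
    target = run_on_gpu if len(items) >= GPU_BATCH_THRESHOLD else run_on_cpu
    return target(items, fn)

small = dispatch(list(range(10)), lambda x: x * 2)    # CPU path
large = dispatch(list(range(2000)), lambda x: x * 2)  # GPU-style path
print(small[:3], len(large))
```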



Slide 4 of 25

Prior Art – Performing Delay Annotation with Multithreading

Patent: US8661385

The die's timing corner information is loaded; this is usually large

A list of the die's routing networks is extracted from the user design netlist

A routing node tree is constructed in a root-child relationship

A multithreading technique performs delay annotation for each child node of the routing node tree

  The level of parallelism/concurrency depends on the number of CPU cores

  Thread-switching overhead becomes significant when the workload is huge

[Figure: all sampling paths of the routing tree; delay tree annotation is processed with the multithreading technique.]

Note: for more details, refer to the backup slides starting at Slide 13
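The prior-art flow above can be sketched as a minimal Python stand-in: build the routing tree in a root-child relationship, then annotate each child subtree on its own CPU thread. All class, field, and function names here are assumed for illustration and are not taken from the patent:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class RouteNode:
    delay_ps: float                      # intrinsic delay of this node
    children: list = field(default_factory=list)

def annotate(node, arrival_ps=0.0):
    """Propagate cumulative delay from root to every descendant --
    a stand-in for the SPICE-like per-node delay computation."""
    node.total_ps = arrival_ps + node.delay_ps
    for child in node.children:
        annotate(child, node.total_ps)
    return node.total_ps

# Toy routing tree: a root with two branches.
root = RouteNode(1.0, [RouteNode(2.0, [RouteNode(3.0)]), RouteNode(4.0)])

# Prior art: one thread per child subtree. Parallelism is capped by the
# CPU core count, and thread-switching overhead grows with the number
# of child nodes on a large die.
with ThreadPoolExecutor() as pool:
    list(pool.map(lambda c: annotate(c, root.delay_ps), root.children))

print([c.total_ps for c in root.children])  # cumulative delays per branch
```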



Slide 5 of 25

Invention of Accelerated Processing Technique

Overview



Slide 6 of 25

Invention of Accelerated Processing Technique (cont.)

Process node illustrations

[Figure: comparison of the two implementations. Existing implementation: CPU threads (t0–t3) use multithreading over large loop iterations of the whole die to process the routing nodes of the NLSPC circuit. Proposed implementation: routing delay networks are repackaged into groups of K size (1…K) and executed in parallel by GPU threads (t0, t4–t7). Legend: tx = CPU time; K size = workgroup size.]


Slide 7 of 25

Advantages

Provides a significant compile-time improvement

  Based on the simulation results in Slide 24, the parallel process run on a GPU with OpenCL kernels outperformed the CPU by up to 1.82...