
Method of performing delay annotation with accelerated processing units

IP.com Disclosure Number: IPCOM000244601D
Publication Date: 2015-Dec-26
Document File: 25 page(s) / 2M

Publishing Venue

The IP.com Prior Art Database


This disclosure describes a method of performing delay annotation with accelerated processing units.

This text was extracted from a Microsoft PowerPoint presentation.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 22% of the total text.

Slide 1 of 25

Method of performing delay annotation with accelerated processing units

Slide 2 of 25

Executive Summary

Routing delay annotation is an exhaustive process for the CPU that contributes to long compile times.

The multithreading technique applied to resolve every child node branched from the routing network incurs an observable thread-switching cost for a large DIE.

Repackage data-intensive but independent routing networks to suit SIMD-style execution, performing the delay annotation process in OpenCL kernel threads to maximize parallelism.

This provides a significant improvement in delay annotation process time. Based on simulation results, the parallel process run on a GPU with OpenCL kernels outperformed the CPU by up to 1.82 times on a large device.


Slide 3 of 25


Routing delay annotation is one of the most frequently called SPICE-like simulations, used to determine the propagation delay of the routing networks

  Intel VTune Amplifier reported that the function call for the delay annotation process was executed about 99 million times.

As the architecture of new FPGA device families becomes more complex, the number of routing networks has increased exponentially

  This will eventually induce heavy delay annotation workloads that lead to long compile times for new device families

The conventional software flow does not suit the delay annotation process because:

  It is a data-intensive rather than a compute-intensive process

  The huge number of delay nodes branched from the routing tree causes expensive thread-switching overhead

Use of a heterogeneous system (CPU + GPU)

  Handling both compute-intensive and data-intensive processes on the processor best suited to each could improve overall software performance
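The CPU/GPU split described above can be illustrated with a small dispatch sketch. The task records and the classification rule here are illustrative assumptions, not the disclosure's actual implementation.

```python
# Conceptual sketch: route each task to a CPU or GPU queue based on its
# workload character. All names and the heuristic are illustrative only.

def classify(task):
    """Data-intensive tasks (many independent items, little math per item)
    go to the GPU queue; compute-intensive ones stay on the CPU."""
    return "gpu" if task["items"] > task["ops_per_item"] else "cpu"

def dispatch(tasks):
    queues = {"cpu": [], "gpu": []}
    for task in tasks:
        queues[classify(task)].append(task)
    return queues

tasks = [
    # ~99 million lightweight, independent delay evaluations (data-intensive)
    {"name": "delay_annotation", "items": 99_000_000, "ops_per_item": 40},
    # a hypothetical compute-heavy stage that stays on the CPU
    {"name": "placement_opt", "items": 1_000, "ops_per_item": 500_000},
]
queues = dispatch(tasks)
```

The point of the sketch is only the conditional suitability argument: each class of work goes to the processor it fits.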


Slide 4 of 25

Prior Art – Performing Delay Annotation with Multithreads


The DIE's timing corner info is loaded; this is usually large

A list of the DIE's routing networks is extracted from the user design netlist

The routing nodes tree is constructed in a root-child relationship

The multithreading technique is used to perform delay annotation for each child node of the routing nodes tree

  The level of parallelism/concurrency depends on the number of CPU cores

  Thread-switching overhead becomes significant when the workload is huge
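The prior-art scheme above can be sketched as follows. The node structure and the per-node annotation are hypothetical placeholders for the disclosure's SPICE-like evaluation; the sketch only shows that every child node becomes a thread-pool task, so parallelism is capped by core count and per-task overhead grows with the node count.

```python
# Sketch of the prior-art multithreaded flow: one worker task per node of
# the routing tree, run on a CPU thread pool. Node contents and the
# annotation function are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor
import os

class Node:
    def __init__(self, delay, children=()):
        self.delay = delay
        self.children = list(children)
        self.annotated = None

def annotate(node):
    # Placeholder for the per-node SPICE-like delay computation.
    node.annotated = node.delay * 2
    return node.annotated

def annotate_tree(root):
    # Flatten the root-child tree, then annotate every node in parallel.
    # Parallelism is limited to os.cpu_count(); with millions of nodes the
    # task-scheduling cost (the "thread switching overhead") dominates.
    nodes, stack = [], [root]
    while stack:
        n = stack.pop()
        nodes.append(n)
        stack.extend(n.children)
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        list(pool.map(annotate, nodes))

root = Node(1.0, [Node(2.0), Node(3.0, [Node(4.0)])])
annotate_tree(root)
```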


Figure (suppressed): all sampling paths; delay tree annotation processed with the multithreading technique


Note: For more details, please refer to the backup slides (Slide 13)


Slide 5 of 25

Invention of Accelerated Processing Technique



Slide 6 of 25

Invention of Accelerated Processing Technique … cont

Figure (suppressed): process node illustrations comparing the proposed and existing implementations. Labels in the figure: large loop iterations of the whole DIE; group of K-size routing delay networks; routing nodes; NLSPC circuit; GPU threads; CPU threads. Legend: tx = CPU time; K size = workgroup size.
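The "group of K size routing delay networks" in the figure can be sketched as a host-side packing step: the flat list of independent networks is split into fixed K-sized workgroups so that each OpenCL work-item in a workgroup can annotate one network in SIMD style. The value of K and the network records are illustrative assumptions.

```python
# Sketch of repackaging independent routing delay networks into fixed
# workgroups of K networks each, ready for SIMD-style kernel dispatch.
# K and the network stand-ins are illustrative, not the disclosure's values.

K = 64  # workgroup size, matching "K size" in the figure legend

def pack_workgroups(networks, k=K):
    """Split the flat list of independent networks into k-sized groups;
    the final partial group would be padded on the device side if needed."""
    return [networks[i:i + k] for i in range(0, len(networks), k)]

networks = list(range(200))   # stand-ins for routing delay networks
groups = pack_workgroups(networks)
# 200 networks pack into groups of 64 + 64 + 64 + 8
```

Because the networks are independent, each group can be processed by one workgroup of GPU threads with no cross-group synchronization, which is what lets the kernel maximize parallelism.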


Slide 7 of 25


Provides a significant compile time improvement

  Based on the simulation results in Slide 24, the parallel process run on a GPU with OpenCL kernels outperformed the CPU by up to 1.82...