Custom Very Long Instruction Word Architecture with Multi-way Branch Support for Branch Intensive Applications
Publication Date: 2014-Feb-06
The IP.com Prior Art Database
This disclosure describes a customizable accelerator architecture for branch intensive applications based on a Very Long Instruction Word (VLIW) architecture with multi-way branch support. The instruction set of the VLIW processor and the number of branches and ALU instructions that can be processed per cycle can be customized for each application. IBM's Tree VLIW architecture can be used as an architectural template.
Page 01 of 4
Custom Very Long Instruction Word Architecture with Multi -way Branch Support for Branch Intensive Applications
The proposed architecture can be adapted, for instance, to act as a regular expression (regex) matching accelerator. A typical way of searching for a regex in an input stream is to apply the input stream to a deterministic finite state automaton (DFA) representation of the regex. Such a DFA can involve several state transitions stemming from each state although a general purpose processor is typically capable of evaluating only two branches (state transitions) per cycle and per state. To achieve higher processing rates than general purpose processors, a regex accelerator has to process multiple state transitions in parallel. In the below drawing, some of the state transitions of an example DFA are shown on the left. These state transitions (branches) can be modeled using "goto" statements in a C program and mapped to IBM's Tree VLIW instructions , as shown below. The Tree VLIW architecture evaluates all the branches encoded into its Tree VLIW instructions in parallel in a single clock cycle. If all the branches stemming from a single state cannot be allocated in a single Tree VLIW instruction, e.g., due to limitations on the number of branch slots supported per Tree VLIW instruction, evaluation of the branches is serialized into multiple Tree VLIW instructions, which can take multiple execution cycles to complete.
Experiments on a set of regexs from a network intrusion detection system (see the below plots) show that most of the time is spent on the default state (i.e., start state) of a DFA, and the start state typically has many more possible next states than the remaining states typically have. Therefore, we propose the use of a dedicated
Page 02 of 4
default rule table which encodes all possible next states for all possible inputs for the default state. The remaining states, on average, have fewer than four next states. Therefore, implementing support for three or four branch slots in a Tree VLIW architecture would be...