An FPGA Architecture for Accelerating Convolutional Neural Network in Speech Recognition
Publication Date: 2016-Aug-11
The IP.com Prior Art Database
Convolutional neural networks (CNNs) have been shown to perform well in computer vision, natural language processing, and speech recognition. Their computational complexity is very high, however, so real-time recognition usually requires additional hardware such as a graphics processing unit (GPU) or a field-programmable gate array (FPGA). GPUs are widely used to accelerate CNNs, especially in the training stage, because of their strong scientific computing capacity. However, their power consumption is relatively high, up to several hundred watts per device. FPGAs have the advantages of much lower power consumption and relatively small size, so they can be deployed at high density in data centers and are also suitable for embedded applications. This invention proposes a novel FPGA architecture to accelerate CNNs in speech recognition. It maps the CNN to a stream-based structure in which the frames of an utterance are streamed into the network one by one and the phoneme probabilities are streamed out one by one, which suits the pipelined nature of FPGA processing. This structure can greatly reduce the FPGA's RAM requirements and can support utterances of arbitrary length.
Details of network architecture
This figure shows the architecture of the FPGA accelerator for the convolutional layer and the dense layer.
In the convolutional layer, a shift buffer array stores the streamed-in frames of the utterance. The size of each element is determined by the size of each frame's input feature (such as Mel-frequency cepstral coefficients (MFCCs)), and the depth of the shift buffer array is set by the size of the convolutional kernel plus some buffer margin. Such a buffer can be implemented using the FPGA's internal block random access memory (RAM). Since the convolutional part is relatively small compared to the whole network, the weights of the convolution kernels can be pre-stored in the internal FPGA block RAM. The number of convolution kernels that can be computed simultaneously depends on the total FPGA digital signal processing (DSP) resources and on balancing the pipeline delay across the whole network.
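The shift-buffer behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the disclosed hardware design: the function name `stream_conv`, the `margin` parameter, and the choice of kernels that span the full feature width of each frame are all hypothetical, and the inner loop over kernels stands in for what an FPGA would compute in parallel on its DSP resources.

```python
from collections import deque


def stream_conv(frames, kernels, margin=0):
    """Yield one output vector per frame once the shift buffer is full.

    frames  : iterable of per-frame feature vectors (e.g. MFCCs)
    kernels : list of kernels; each kernel is a list of `depth` rows,
              each row as long as one frame (assumption: full-width kernels)
    margin  : extra buffer depth beyond the kernel size (hypothetical)
    """
    depth = len(kernels[0])             # kernel extent along the time axis
    buf = deque(maxlen=depth + margin)  # models the block-RAM shift buffer

    for frame in frames:
        buf.append(list(frame))         # oldest frame is shifted out when full
        if len(buf) < depth:
            continue                    # pipeline is still filling
        window = list(buf)[-depth:]     # the most recent `depth` frames
        # One multiply-accumulate result per kernel for this time step.
        yield [
            sum(w * x
                for k_row, f_row in zip(kernel, window)
                for w, x in zip(k_row, f_row))
            for kernel in kernels
        ]


if __name__ == "__main__":
    frames = [[1, 2], [3, 4], [5, 6], [7, 8]]
    kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
    for out in stream_conv(frames, kernels, margin=1):
        print(out)
```

Because the buffer never holds more than `depth + margin` frames, the memory used by this model is independent of utterance length, which mirrors the RAM saving claimed for the stream-based structure.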
For the dense layer (which can be viewed as a special convolutional layer), the number of weights is extremely large, so it is impossible to store all of them in the internal FPGA block RAM. We usually use an on-board double data rate synchronous dynamic random access memory (DDR SDRAM) to store all the weights and load only a very small part into the internal FPGA block RAM for computing. The critical problem of such operation is the I/O bandwidth betw...