Methods for Deep Learning Network Compression for Resource-constrained Devices
Publication Date: 2016-Jun-21
The IP.com Prior Art Database
Disclosed is a method to systematically reduce and compress a trained deep learning network in terms of its computation cost (number of floating-point operations) and/or its memory footprint, such that the reduced network achieves accuracy similar to that of the original deep learning network.
Deep learning is an effective technique for many big data applications, such as object recognition and detection, speech recognition, and text processing. Typically, deep learning relies on a multilevel (i.e., deep) network with millions of parameters; hence, a single inference on such a trained deep network requires millions to billions of floating-point operations (FLOPs).
Deploying a deep learning network on resource-constrained devices (e.g., mobile phones, drones, and Internet of Things (IoT) devices) is a major challenge because of limited computation resources, the small memory available on lightweight devices, and power capacity limited by battery size. Yet many of those applications require (nearly) real-time response, even at the expense of some reasonable loss of accuracy. A method is needed to solve the deep learning inference problem on those resource-constrained devices.
The novel solution is a method to systematically reduce and compress a trained deep learning network in terms of its computation cost (FLOPs) and/or its memory footprint, such that the reduced network achieves accuracy similar to that of the original deep learning network.
The method comprises a number of techniques that can be combined in many different ways to reduce the deep learning network, yielding different compression rates and different tradeoffs with inference accuracy. The method includes techniques to:
• Remove some internal nodes of the network based on its activation statistics
• Remove some internal nodes of the network based on reconstructing information for those removed nodes from kept nodes
• Reduce network connection edges based on reconstructing information for those removed edges from kept edges
• Train a smaller network by using various inference information obtained from the given deep learning network
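The first technique in the list above (removing nodes based on activation statistics) can be illustrated with a minimal sketch. The disclosure does not say which statistic or threshold is used, so everything below is an assumption made for illustration: a single hidden layer with ReLU activations, the mean activation over a data set as the statistic, and the hypothetical helper names hidden_activations, forward, and prune_by_activation.

```python
from statistics import fmean


def relu(v):
    return v if v > 0.0 else 0.0


def hidden_activations(W1, b1, x):
    """ReLU activations of the hidden layer for one input vector x."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]


def forward(W1, b1, W2, b2, x):
    """Output of a one-hidden-layer network (linear output layer)."""
    h = hidden_activations(W1, b1, x)
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(W2, b2)]


def prune_by_activation(W1, b1, W2, data, threshold):
    """Drop hidden nodes whose mean activation over `data` is <= threshold.

    Assumption: mean activation is the statistic; the source does not
    specify one. Returns the reduced weights and the kept node indices.
    """
    acts = [hidden_activations(W1, b1, x) for x in data]
    mean_act = [fmean(col) for col in zip(*acts)]  # one mean per hidden node
    keep = [j for j, m in enumerate(mean_act) if m > threshold]
    W1_small = [W1[j] for j in keep]                    # rows of kept nodes
    b1_small = [b1[j] for j in keep]
    W2_small = [[row[j] for j in keep] for row in W2]   # matching columns
    return W1_small, b1_small, W2_small, keep


# Toy network: 2 inputs, 3 hidden nodes, 1 output.
# Hidden node 1 has a large negative bias, so it never activates.
W1 = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
b1 = [0.0, -100.0, 0.0]
W2 = [[1.0, 2.0, 3.0]]
b2 = [0.5]
data = [[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]]

W1s, b1s, W2s, keep = prune_by_activation(W1, b1, W2, data, threshold=0.0)
print(keep)                                   # [0, 2]
print(forward(W1, b1, W2, b2, [1.0, 2.0]))    # [7.5]
print(forward(W1s, b1s, W2s, b2, [1.0, 2.0])) # [7.5]
```

In this toy case the removed node is always inactive, so the reduced network reproduces the original outputs exactly on the data. With a higher threshold, rarely active nodes would also be removed and some accuracy loss would be expected, which is the tradeoff the method describes.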
Given a trained deep learning network (denoted as A) with associated weight parameters at each layer i (Wi), together with training data and test data, find a smaller deep learning network (denoted as B) that meets either a reasonable loss of test accuracy or a given reduction ratio.
The method comprises the following steps:
1. Run inference on the network A using the training data of size N, and collect all the internal inference results of the network A on a layer-by-layer basi...