Cascade deep learning model training for visual analytics
Disclosure Number: IPCOM000245969D
Publication Date: 2016-Apr-21
Document File: 3 page(s) / 110K

Publishing Venue

The Prior Art Database


Many applications require training a multi-class model for visual analytics, e.g., image classification or object detection. However, the available samples are often unevenly distributed across classes. The distribution is frequently long-tailed, so the learned model struggles on classes with few samples. More importantly, different image categories may look very similar to each other, which calls for a more fine-grained model. This disclosure proposes decoupling the multi-class deep learning task into subgroups in order to better capture the fine-grained features of similar classes. The main advantage is improved model accuracy, especially for categories with similar appearances. Specifically, a cascade tree structure is designed in which a classifier is associated with each node. Accordingly, a cascade training procedure is designed that incrementally retrains each child model from its parent model, which further improves training efficiency.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

Cascade deep learning model training for visual analytics

Step 1: cascade decision into subsets by similarity. Cluster the images into subgroups in a hierarchical manner. Available hierarchical clustering methods include: 1) Visual similarity: graph degree linkage, i.e., agglomerative clustering on a directed graph; 2) Textual similarity: images with text descriptions or meta-information. Then define the pairwise visual similarity between two categories using a general-purpose feature representation. See the following chart for illustration.
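As a rough illustration of Step 1, the sketch below groups categories into a binary cascade tree. It assumes each image already has a non-zero general-purpose feature vector, takes the cosine similarity between class-mean features as the pairwise category similarity, and substitutes plain average-linkage agglomerative clustering for the graph degree linkage method named above. The function names and the toy data are hypothetical and are not taken from the disclosure.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_means(features_by_class):
    """Average the per-image feature vectors of each category."""
    means = {}
    for label, vectors in features_by_class.items():
        dim = len(vectors[0])
        means[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return means


def build_cascade(features_by_class):
    """Average-linkage agglomerative clustering over categories.

    Returns a nested tuple tree whose leaves are category labels.
    (Assumption: this stands in for graph degree linkage.)
    """
    means = class_means(features_by_class)

    def linkage(pair):
        # Mean pairwise similarity between the members of two clusters.
        (_, a), (_, b) = pair
        total = sum(cosine(means[x], means[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    # Each cluster is (subtree, member labels); start with one per category.
    clusters = [(label, [label]) for label in sorted(means)]
    while len(clusters) > 1:
        left, right = max(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not left and c is not right]
        clusters.append(((left[0], right[0]), left[1] + right[1]))
    return clusters[0][0]


# Hypothetical toy features: two animal-like and two vehicle-like categories.
toy_features = {
    "cat": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
    "dog": [[0.9, 0.3, 0.0], [1.0, 0.2, 0.1]],
    "car": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
    "truck": [[0.0, 0.2, 1.0], [0.1, 0.1, 1.0]],
}
print(build_cascade(toy_features))
```

On this toy input the two animal-like categories and the two vehicle-like categories end up in separate branches under the root, so each branch can later receive its own fine-grained classifier.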

Step 2: inherit parent model knowledge by fine-tuning. Train a deep neural network as a standard one-shot multi-class model. For each branch, copy this model as the initial parameters and fine-tune it using the images whose class labels fall in that subgroup. The advantages are: 1) Efficiency in the sample size involved: only the subset of samples in that branch is used in fine-tuning, not the samples of all classes; 2) Convergence speed: fine-tuning from the initialized model is more efficient because it converges faster. The benefit is that knowledge in the parent node can be inherited and reused in its offspring branch by fine-tuning the offspring model with the parent model parameters as initialization.
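The parent-to-child inheritance in Step 2 can be sketched as follows. This is a minimal illustration only: a tiny softmax classifier stands in for the deep neural network, and the names `LinearClassifier` and `fine_tune_branch`, the learning rate, and the toy data are all assumptions rather than details from the disclosure. What it shows is the mechanism described above: the child copies the parent's parameters for its subgroup and is then trained only on that branch's samples.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LinearClassifier:
    """Tiny softmax classifier standing in for a deep network (illustrative)."""

    def __init__(self, classes, dim):
        self.classes = list(classes)
        self.dim = dim
        self.weights = {c: [0.0] * dim for c in self.classes}

    def predict_proba(self, x):
        scores = [sum(w * xi for w, xi in zip(self.weights[c], x)) for c in self.classes]
        return dict(zip(self.classes, softmax(scores)))

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)

    def fit(self, samples, epochs, lr=0.5):
        """Plain stochastic gradient descent on the cross-entropy loss."""
        for _ in range(epochs):
            for x, label in samples:
                proba = self.predict_proba(x)
                for c in self.classes:
                    grad = proba[c] - (1.0 if c == label else 0.0)
                    self.weights[c] = [w - lr * grad * xi for w, xi in zip(self.weights[c], x)]
        return self


def fine_tune_branch(parent, subgroup, samples, epochs):
    """Create a child model for one branch, initialized from the parent."""
    child = LinearClassifier(subgroup, parent.dim)
    # Inherit the parent's parameters (copied, so the parent stays unchanged).
    child.weights = {c: list(parent.weights[c]) for c in child.classes}
    # Only samples whose labels belong to this branch are used.
    branch_samples = [(x, y) for x, y in samples if y in child.classes]
    return child.fit(branch_samples, epochs)


# Hypothetical toy data: the second feature separates look-alike classes.
samples = [
    ([1.0, 0.2, 0.0], "cat"), ([1.0, -0.2, 0.0], "dog"),
    ([0.0, 0.2, 1.0], "car"), ([0.0, -0.2, 1.0], "truck"),
]
parent = LinearClassifier(["car", "cat", "dog", "truck"], dim=3).fit(samples, epochs=20)
animal_branch = fine_tune_branch(parent, ["cat", "dog"], samples, epochs=20)
print(animal_branch.predict_proba([1.0, 0.2, 0.0]))
```

In a real deep network the copied parameters would be the shared layers of the network, with the output layer restricted to the branch's classes. The linear model here only makes the copy-then-fine-tune flow easy to follow.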

Offline cascade generation and branch model training: 1) Automatically build the cascade by hierarchical clustering based on visual appearance, text descriptions, rules, knowledge bases, etc.; 2) Incrementally train the branch scoring model by inheriting the parent model as the initial parameters for next la...