
DEEP LEARNING BASED OPTICAL CHARACTER RECOGNITION SYSTEM

IP.com Disclosure Number: IPCOM000249656D
Publication Date: 2017-Mar-15
Document File: 5 page(s) / 194K

Publishing Venue

The IP.com Prior Art Database

Abstract

A deep learning based text detection and recognition technique is disclosed. The technique comprises three steps to realize end-to-end text extraction from an image. In step one, a fully convolutional network (FCN) designed as a character detector identifies the precise location of each character in the image. In step two, a classic convolutional neural network (CNN) recognizes each character. In step three, a database constrained corrector retrieves the correct text even when a few characters are falsely predicted.


DEEP LEARNING BASED OPTICAL CHARACTER RECOGNITION SYSTEM

BACKGROUND

 

The present disclosure relates generally to optical character recognition systems, and more particularly to a deep learning based text detection and recognition system.

Text recognition is an important step towards information retrieval and autonomous systems. Numerous optical character recognition (OCR) tools are available for reading text, and several text recognition systems have also been developed in recent years. One conventional technique uses a sliding window with random ferns for character detection and pictorial structures for word detection. Another conventional technique uses one-dimensional (1D) over-segmentation to identify candidate character regions and then searches through the space of segmentations to maximize a score combining a character classifier and a grammar model. A further technique, presented at the European Conference on Computer Vision 2014 (ECCV 2014), uses a convolutional neural network (CNN) to train a character classifier. A sliding window assigns a confidence value for each character category at each sliding point, and geometric and bigram constraints are then combined using dynamic programming to find the optimal prediction of the text string.

A major shortcoming of the ECCV 2014 technique is its use of a sliding window: when the window slides to a location between two characters, it may falsely report a character category rather than background. Further, conventional techniques depend heavily on heuristic cues and require a significant number of parameters to be tuned for good performance. Therefore, reading text in unconstrained images remains a challenge.

It would be desirable to have an improved technique for text detection and recognition from images in uncontrolled conditions.

BRIEF DESCRIPTION OF DRAWINGS

Figure 1 depicts an input image containing a string of characters and the output generated by the deep learning based text detection and recognition technique described herein.

Figure 2 depicts an input image of a piece of equipment with a part number or serial number engraved on it.

Figure 3 depicts the network architecture, in which the character detector and the character classifier share the first three convolutional layers.

DETAILED DESCRIPTION

A deep learning based text detection and recognition technique is disclosed. As depicted in Figure 1, the technique comprises three steps to realize end-to-end text extraction from an image. In step one, a fully convolutional network (FCN) designed as a character detector identifies the precise location of each character in the image. In step two, a classic convolutional neural network (CNN) recognizes each character. In step three, a database constrained corrector retrieves the correct text even when a few characters are falsely predicted.

Figure 1
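The database constrained corrector of step three can be approximated as an edit-distance match of the recognized string against a database of known valid strings (for example, part numbers). The function names, the example database, and the noisy recognition below are illustrative assumptions, not details from the disclosure.

```python
# Database constrained correction: map a noisy character-level recognition
# to the closest entry in a database of valid strings, tolerating a small
# number of mis-recognized characters.
# The example database and OCR output are illustrative only.

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(recognized, database, max_errors=2):
    """Return the database entry closest to the recognized string,
    or None if every entry needs more than max_errors edits."""
    best = min(database, key=lambda entry: edit_distance(recognized, entry))
    return best if edit_distance(recognized, best) <= max_errors else None

database = ["PN-48213", "PN-48231", "SN-90417"]
print(correct("PN-4B213", database))  # "B" mis-read for "8" -> PN-48213
```

In this sketch a single character confusion ("B" for "8") is repaired because only one database entry lies within the allowed edit budget; a string far from every entry is rejected rather than forced to the nearest match.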

Figure 2 depicts an input image of a piece of equipment with a part number or serial number engraved on it.