Optical Character Recognition (OCR) aims to recognize text in natural images. Inspired by a recently proposed model for general image classification, Recurrent Convolution Neural Network (RCNN), we propose a new architecture named Gated RCNN (GRCNN) for solving this problem. Its critical component, Gated Recurrent Convolution Layer (GRCL), is constructed by adding a gate to the Recurrent Convolution Layer (RCL), the critical component of RCNN. The gate controls the context modulation in RCL and balances the feed-forward information and the recurrent information. In addition, an efficient Bidirectional Long Short-Term Memory (BLSTM) is built for sequence modeling. The GRCNN is combined with BLSTM to recognize text in natural images. The entire GRCNN-BLSTM model can be trained end-to-end. Experiments show that the proposed model outperforms existing methods on several benchmark datasets including the IIIT-5K, Street View Text (SVT) and ICDAR.
Jianfeng Wang (Beijing University of Posts and Telecommunications)
Xiaolin Hu (Tsinghua University)
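The gating idea described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the abstract does not give the layer's equations, so the sketch assumes a sigmoid gate computed from both the feed-forward input and the previous recurrent state, which then scales the recurrent term before it is added to the feed-forward term. It works on a 1-D signal with single-channel kernels and omits batch normalization; the names `grcl_1d`, `conv1d_same`, `w_f`, `w_r`, `wg_f`, and `wg_r` are hypothetical.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation; output has the same length as the input."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def relu(v):
    return max(0.0, v)


def grcl_1d(u, w_f, w_r, wg_f, wg_r, steps):
    """Toy gated recurrent convolution on a 1-D signal (assumed form, see lead-in).

    u      : input signal (list of floats)
    w_f    : feed-forward kernel
    w_r    : recurrent kernel
    wg_f   : gate kernel applied to the input
    wg_r   : gate kernel applied to the previous state
    steps  : number of recurrent iterations after the initial feed-forward pass
    """
    ff = conv1d_same(u, w_f)        # feed-forward term, constant across iterations
    gate_ff = conv1d_same(u, wg_f)  # input contribution to the gate
    x = [relu(v) for v in ff]       # initial state: no recurrent input yet
    for _ in range(steps):
        rec = conv1d_same(x, w_r)          # recurrent (context) term
        gate_rec = conv1d_same(x, wg_r)    # state contribution to the gate
        gate = [sigmoid(a + b) for a, b in zip(gate_ff, gate_rec)]
        # The gate, with values in (0, 1), scales how much context is mixed in.
        x = [relu(f + g * r) for f, g, r in zip(ff, gate, rec)]
    return x


if __name__ == "__main__":
    u = [1.0, -2.0, 3.0, 0.5]
    state = grcl_1d(u, [0.5, 1.0, -0.5], [0.2, 0.1, 0.2],
                    [0.1, 0.2, 0.1], [0.3, 0.3, 0.3], steps=2)
    print(state)
```

In this sketch, a gate value near 0 leaves only the feed-forward response, while a value near 1 recovers an ungated recurrent convolution, which is one way to read the abstract's statement that the gate balances the two sources of information.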