Poster
Muharaf: Manuscripts of Handwritten Arabic Dataset for Cursive Text Recognition
Mehreen Saeed · Adrian Chan · Anupam Mijar · Joseph Moukarzel · Gerges Habchi · Carlos Younes · Amin Elias · Chau-Wai Wong · Akram Khater
West Ballroom A-D #5110
We present the Manuscripts of Handwritten Arabic (Muharaf) Dataset, a machine learning dataset of more than 1,600 historic handwritten page images punctiliously transcribed by experts in archival Arabic. Each document image is accompanied by the spatial polygonal coordinates of its text lines as well as basic page elements. This dataset was compiled to advance the state of the art in handwritten text recognition (HTR) of not only Arabic manuscripts but also cursive text in general. The Muharaf Dataset consists of diverse handwriting styles and a wide range of document types, including personal letters, diaries, notes, poems, church records, and legal correspondence. In this paper, we describe the data acquisition pipeline as well as the notable dataset features and statistics. We also provide a preliminary baseline result achieved by training convolutional neural networks on this data.