Recognition of Off-line printed Arabic text Using Hidden Markov Models

(2008) Recognition of Off-line printed Arabic text Using Hidden Markov Models. SIGNAL PROCESSING, 88 (12). pp. 2902-2912.




This paper describes a technique for automatic recognition of off-line printed Arabic text using Hidden Markov Models. Hierarchical windows of different sizes, both overlapping and non-overlapping, are used to generate 16 features from each vertical sliding strip. Eight fonts were tested (viz. Arial, Tahoma, Akhbar, Thuluth, Naskh, Simplified Arabic, Andalus, and Traditional Arabic). The experiments showed that different fonts reach their highest recognition rates at different numbers of states (5 or 7) and codebook sizes (128 or 256). Arabic text is cursive, and each character may have up to 4 different shapes depending on its location in a word. Each shape is treated as a different class, resulting in a total of 126 classes. The average recognition rates achieved (using 126 classes and 16 features for each vertical strip of three pixels width) ranged from 98.08% for Thuluth to 99.89% for Arial. The main contributions of this work are the novel hierarchical sliding window technique and the use of only 16 features for each sliding window. Because each shape of an Arabic character is considered a separate class, the technique bypasses the need for segmenting Arabic text, and it is applicable to other languages.
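The abstract does not spell out the window layout, so the following is only a minimal sketch of how 16 features could be derived from each three-pixel-wide vertical strip. It rests on assumptions that are not taken from the paper: the strip is split into 8 equal non-overlapping cells, 7 overlapping windows each span two adjacent cells, one window covers the whole strip, each feature is an ink-pixel count, and strips advance without overlap. The names `strip_features`, `sliding_strips`, and `line_features` are hypothetical.

```python
def strip_features(strip, n_cells=8):
    """Return 16 ink-count features for one vertical strip.

    strip: list of rows, each row a list of 0/1 pixels (1 = ink).
    Layout (an assumption, not the paper's): 8 non-overlapping cells,
    7 overlapping two-cell windows, and 1 window over the whole strip.
    """
    height = len(strip)
    if height % n_cells != 0:
        raise ValueError("strip height must be a multiple of n_cells")
    cell_h = height // n_cells

    # Ink pixels per row, then per non-overlapping cell.
    row_ink = [sum(row) for row in strip]
    cells = [sum(row_ink[i * cell_h:(i + 1) * cell_h]) for i in range(n_cells)]

    # Overlapping windows: each covers two adjacent cells, shifted by one cell.
    pairs = [cells[i] + cells[i + 1] for i in range(n_cells - 1)]

    # Top of the hierarchy: the whole strip.
    whole = [sum(cells)]

    return cells + pairs + whole  # 8 + 7 + 1 = 16 features


def sliding_strips(image, width=3):
    """Yield vertical strips of the given width, left to right (no overlap assumed)."""
    n_cols = len(image[0])
    for x in range(0, n_cols - width + 1, width):
        yield [row[x:x + width] for row in image]


def line_features(image, width=3):
    """Turn a binarized text-line image into a sequence of 16-feature vectors."""
    return [strip_features(s) for s in sliding_strips(image, width)]
```

In a pipeline of the kind the abstract describes, each 16-feature vector would then be quantized against a codebook (128 or 256 entries) and the resulting symbol sequence passed to the HMMs. Arabic is read right to left, so a real system might scan the strips in the opposite direction.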

Item Type: Article
Subjects: Computer
Department: College of Computing and Mathematics > Information and Computer Science
Depositing User: SABRI MAHMMOUD
Date Deposited: 31 Aug 2008 08:01
Last Modified: 01 Nov 2019 14:09