Al-Muhtaseb, Husni A. and Mahmoud, Sabri A. and Qahwaji, Rami S. (2008) Recognition of Off-line printed Arabic text Using Hidden Markov Models. SIGNAL PROCESSING, 88 (12). pp. 2902-2912.
This paper describes a technique for automatic recognition of off-line printed Arabic text using Hidden Markov Models (HMMs). Overlapping and non-overlapping hierarchical windows of different sizes are used to generate 16 features from each vertical sliding strip. Experiments covered eight fonts: Arial, Tahoma, Akhbar, Thuluth, Naskh, Simplified Arabic, Andalus, and Traditional Arabic. The experiments showed that different fonts reach their highest recognition rates at different numbers of states (5 or 7) and codebook sizes (128 or 256). Arabic text is cursive, and each character may have up to four different shapes depending on its position in a word. Each shape is treated as a separate class, giving a total of 126 classes. The average recognition rates achieved (using 126 classes and 16 features for each vertical strip three pixels wide) ranged from 98.08% for Thuluth to 99.89% for Arial. The main contributions of this work are the novel hierarchical sliding-window technique and the use of only 16 features per sliding window. Because each shape of an Arabic character is treated as a separate class, the technique bypasses the need to segment Arabic text, and it is applicable to other languages.
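As a rough illustration of the sliding-strip idea, the Python sketch below cuts a binarized text-line image into vertical strips three pixels wide and computes 16 ink-count features per strip. The abstract does not specify the window layout, so the one used here is an assumption made only for illustration: 8 non-overlapping horizontal bands, 7 overlapping windows that each span two adjacent bands, and 1 window covering the whole strip. The function names `strip_features` and `sliding_features` are hypothetical, and the strips are taken without horizontal overlap, which is also an assumption.

```python
def strip_features(strip, segments=8):
    """Return ink counts for one vertical strip.

    strip: list of rows, each row a list of 0/1 pixels (1 = ink).
    The window layout (assumed, not taken from the paper):
    `segments` non-overlapping bands, `segments - 1` overlapping
    two-band windows, and one window over the whole strip.
    """
    height = len(strip)
    # Row boundaries of `segments` roughly equal horizontal bands.
    bounds = [i * height // segments for i in range(segments + 1)]
    row_ink = [sum(row) for row in strip]
    # Level 1: non-overlapping bands.
    bands = [sum(row_ink[bounds[i]:bounds[i + 1]]) for i in range(segments)]
    # Level 2: overlapping windows, each covering two adjacent bands.
    pairs = [bands[i] + bands[i + 1] for i in range(segments - 1)]
    # Level 3: the whole strip.
    total = [sum(bands)]
    return bands + pairs + total


def sliding_features(image, strip_width=3):
    """Slide a vertical strip across the image, one feature vector per strip."""
    width = len(image[0])
    vectors = []
    for x in range(0, width - strip_width + 1, strip_width):
        strip = [row[x:x + strip_width] for row in image]
        vectors.append(strip_features(strip))
    return vectors
```

With the default of 8 bands, each strip yields 8 + 7 + 1 = 16 values. In an HMM pipeline of the kind the abstract describes, such vectors would then be quantized against a codebook (128 or 256 entries in the reported experiments) to form the observation sequence; that step is not shown here.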
|Divisions:||College Of Computer Sciences and Engineering > Information and Computer Science Dept|
|Creators:||Al-Muhtaseb, Husni A. and Mahmoud, Sabri A. and Qahwaji, Rami S.|
|Email:||firstname.lastname@example.org, email@example.com, R.S.R.Qahwaji@brad.ac.uk|
|Deposited By:||SABRI MAHMMOUD|
|Deposited On:||31 Aug 2008 11:01|
|Last Modified:||12 Apr 2011 13:16|