ON OPTICAL CHARACTER RECOGNITION OF ARABIC TEXT. The 6th Saudi Engineering Conference, KFUPM, Dhahran, December 2002.



Although optical character recognition (OCR) has achieved a great deal in desktop publishing, much work remains to be done. Unlike languages written in Roman-like scripts, many languages have a large number of fonts, complicated character shapes, or both. Arabic is one such language, and its script is somewhat complicated in its construction. A reasonable amount of work on Arabic has been reported so far, but a good deal more is still needed. Many other languages also need considerable attention with regard to automatic recognition. Efficient, robust, and error-free methodologies are required to develop systems for such languages so that recent display and printing hardware can be utilized. This work is devoted to one way of addressing the problem of recognizing the Arabic alphabet. We give a brief survey of the state of the art in Arabic character recognition and of the different methods and approaches to this problem. We show that recognition can be achieved by simple matching against prebuilt prototypes of the entire Arabic character set. This segmentation-free approach proved efficient for the recognition of one font of Arabic. We treat Arabic as a well-structured language and base our prototype description on a method called “Minimum Covering Run Expression”. We also show that our database of prototypes is easily extendable to multifont recognition of Arabic, as a basis for a full Arabic OCR system.
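As a rough illustration of recognition by matching against prebuilt prototypes, the following minimal sketch describes each binary glyph by its horizontal ink runs and assigns an input glyph to the prototype with the fewest disagreeing pixels. It is not the Minimum Covering Run Expression itself: plain horizontal run-length encoding is swapped in as a simpler stand-in, and the toy bitmaps, the labels, and the names `horizontal_runs`, `mismatch`, `classify`, and `PROTOTYPES` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Bitmap = List[str]            # rows of '#' (ink) and '.' (background)
Run = Tuple[int, int, int]    # (row, start_col, length)


def horizontal_runs(bitmap: Bitmap) -> List[Run]:
    """Encode a binary glyph as its maximal horizontal ink runs."""
    runs: List[Run] = []
    for r, row in enumerate(bitmap):
        c = 0
        while c < len(row):
            if row[c] == "#":
                start = c
                while c < len(row) and row[c] == "#":
                    c += 1
                runs.append((r, start, c - start))
            else:
                c += 1
    return runs


def ink_pixels(runs: List[Run]) -> Set[Tuple[int, int]]:
    """Expand runs back into the set of ink pixel coordinates."""
    return {(r, s + k) for r, s, n in runs for k in range(n)}


def mismatch(a: List[Run], b: List[Run]) -> int:
    """Count pixels that are ink in exactly one of the two glyphs."""
    return len(ink_pixels(a) ^ ink_pixels(b))


def classify(glyph: Bitmap, prototypes: Dict[str, List[Run]]) -> str:
    """Return the label of the prototype with the smallest mismatch."""
    glyph_runs = horizontal_runs(glyph)
    return min(prototypes, key=lambda label: mismatch(glyph_runs, prototypes[label]))


# Toy prototype database (assumed shapes, not real Arabic characters).
PROTOTYPES: Dict[str, List[Run]] = {
    "vertical_stroke": horizontal_runs(["..#..", "..#..", "..#..", "..#.."]),
    "horizontal_stroke": horizontal_runs([".....", ".....", "#####", "....."]),
}

# A vertical stroke with one extra noisy ink pixel.
NOISY_GLYPH: Bitmap = ["..#..", "..#..", ".##..", "..#.."]

if __name__ == "__main__":
    print(classify(NOISY_GLYPH, PROTOTYPES))
```

In this sketch, supporting a further font would amount to adding more entries to the prototype dictionary, which mirrors the extendability claimed for the prototype database, though the actual description scheme used in this work differs.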

