O. Redkin
Tuesday 10 April 2018 by lib_admin

References: 5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2018, www.sgemvienna.org, SGEM2018 Vienna ART Conference Proceedings, ISBN 978-619-7408-32-4 / ISSN 2367-5659, 19 - 21 March, 2018, Vol. 5, Issue 3.1; 277-282 pp, DOI: 10.5593/sgemsocial2018H/31/S10.035


Any written text can be regarded as a sequence of symbols grouped in a certain order and assembled into a linear sequence, which may be arranged horizontally or vertically. Existing methods of optical character recognition (OCR) aim at linear and vertical segmentation of written text, using interword spaces and the intervals between letters as the demarcation markers of lexical units and characters, respectively. This strategy is effective for many languages, but building robust OCR techniques for Arabic remains an extremely challenging task. The problem lies primarily in the nature of the Arabic script, in which letters vary in shape depending on their position in a word and differ in length and height, to say nothing of the script's cursiveness. In this case, purely mathematical methods of OCR have limited efficiency. We suggest a methodology which, along with 'traditional' approaches, takes into consideration such linguistic data as the frequency index of character occurrences, the compatibility of characters within words, and the word frequency index for Arabic. These data are among the indicators that facilitate the interpretation and identification of written text, and they should be used as components in building a robust OCR method for Arabic-based scripts.

Keywords: language, Arabic, script, recognition, OCR.
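
The idea of combining recognizer output with linguistic indicators can be illustrated with a minimal sketch. It is not the author's implementation: the toy word list, the add-one smoothing, the equal weighting of the three indicators, and the names `linguistic_score` and `rescore` are all assumptions made for illustration. Each candidate reading from a hypothetical recognizer is rescored with character frequency, character-pair compatibility, and word frequency.

```python
import math
from collections import Counter

# Toy word list standing in for a real Arabic corpus (illustrative only).
CORPUS = ["كتاب", "كتب", "كاتب", "مكتب", "كتاب", "باب"]

word_counts = Counter(CORPUS)
char_counts = Counter(ch for w in CORPUS for ch in w)
pair_counts = Counter(w[i:i + 2] for w in CORPUS for i in range(len(w) - 1))


def smoothed_log_prob(item, counts):
    """Add-one smoothed log probability; one extra slot is reserved for unseen items."""
    total = sum(counts.values())
    vocab = len(counts) + 1
    return math.log((counts[item] + 1) / (total + vocab))


def mean_log_prob(items, counts):
    """Average log probability of the items, or 0.0 if there are none."""
    if not items:
        return 0.0
    return sum(smoothed_log_prob(it, counts) for it in items) / len(items)


def linguistic_score(word):
    """Sum of the three indicators: character frequency,
    compatibility of adjacent characters, and word frequency."""
    pairs = [word[i:i + 2] for i in range(len(word) - 1)]
    return (
        mean_log_prob(list(word), char_counts)
        + mean_log_prob(pairs, pair_counts)
        + smoothed_log_prob(word, word_counts)
    )


def rescore(candidates, weight=1.0):
    """Rank (word, ocr_confidence) pairs by log confidence plus weighted linguistic score."""
    scored = [
        (word, math.log(conf) + weight * linguistic_score(word))
        for word, conf in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Two readings that differ only in the dots of one letter (ت vs ث).
candidates = [("كثاب", 0.45), ("كتاب", 0.40)]
best_word, _ = rescore(candidates)[0]
print(best_word)
```

With `weight=0.0` the ranking follows the recognizer's confidence alone, so the misread form stays on top; with the linguistic indicators switched on, the attested word overtakes it. In practice the counts would come from a large corpus, and the weighting would need to be tuned.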

