Extract text from a Cyprus Personal ID using OCR

This project explores the existing Cyprus ID format, including its structure, the fields it contains, and the design of its distinctive features. The objective is to understand how these fields are structured and then to build an accurate and fast OCR tool in Python. OCR technology digitizes, reads, and interprets characters from physical documents. Successfully extracting and processing information from Cyprus IDs could significantly improve services in sectors such as airports, the police, and other industries. An image-based tool requires an algorithm that can handle imperfect character shapes, which calls for approximate matching, similar to an autocorrect function. Fields such as names and surnames must match their expected values to within a specified percentage to pass the unit tests. This is achieved with confidence scores that gauge how closely the algorithm's output reflects reality. Throughout the project, challenges, issues, and potential areas for future enhancement are identified in order to better understand the present state of Cyprus IDs and to expand their current applications and use cases.
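
The approximate matching and confidence threshold described above can be illustrated with a minimal sketch. It is not the project's actual implementation: it assumes Python's standard-library difflib as the similarity measure, and the function names, the 0.8 threshold, and the surname list are hypothetical values chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().upper(), b.strip().upper()).ratio()


def best_match(ocr_text: str, candidates: list[str], threshold: float = 0.8):
    """Return (candidate, score) for the closest candidate.

    If no candidate reaches the threshold (an assumed confidence cut-off),
    the candidate is None and the best score found is still returned.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similarity(ocr_text, candidate)
        if score > best_score:
            best, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical reference list; a real tool would use a much larger one.
KNOWN_SURNAMES = ["GEORGIOU", "GEORGIADES", "IOANNOU"]

if __name__ == "__main__":
    # The OCR engine misread the letter "O" as the digit "0".
    name, score = best_match("GE0RGIOU", KNOWN_SURNAMES)
    print(name, round(score, 3))
```

Here the misread string is corrected to GEORGIOU because its similarity to that entry exceeds the threshold, while input that resembles none of the known values is rejected instead of being forced onto the nearest entry.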

URI

http://gnosis.library.ucy.ac.cy/handle/7/66270

Collections

Τμήμα Πληροφορικής / Department of Computer Science

The following license files are associated with this item:

Creative Commons

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess