By Ashish Kasama
May 12, 2025 | 2 minute read
How to Extract Text from Images Using OCR in Python (With Tesseract & EasyOCR)

Looking for a way to turn photos or scanned documents into real, editable text? Welcome to the world of OCR — Optical Character Recognition.

What is OCR?

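OCR converts the text inside an image, such as a photo, a scanned page, or a screenshot, into machine-readable characters you can search, edit, and process. Typical jobs include pulling data from scanned invoices, digitizing receipts, and reading license plates.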


Best Python Libraries for OCR

  1. pytesseract – A Python wrapper for Tesseract, the open-source OCR engine originally developed at HP and long maintained by Google.

  2. EasyOCR – A deep-learning OCR library built on PyTorch, with ready-made models for 80+ languages.


How to Set Up OCR in Python

Install dependencies:

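pip install pytesseract easyocr pillow

pytesseract is only a wrapper, so you also need to install the Tesseract engine itself. For example, use sudo apt install tesseract-ocr on Debian/Ubuntu or brew install tesseract on macOS. EasyOCR pulls in PyTorch as a dependency and downloads its model files the first time you create a Reader.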
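A missing Tesseract binary is the most common reason pytesseract fails, so it can help to check for it up front. This is a minimal pre-flight sketch that uses only the standard library. It assumes the executable is called tesseract and is on your PATH. The helper names find_tesseract and tesseract_version are hypothetical. Once pytesseract is installed, its own get_tesseract_version() performs a similar check.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract(cmd: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(cmd)


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if the binary is missing."""
    path = find_tesseract(cmd)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Older Tesseract releases print the version to stderr rather than stdout
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if find_tesseract() is None:
    print("Tesseract not found on PATH; install the engine before using pytesseract")
else:
    print("Found", tesseract_version())
```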


Code Examples – Extracting Text from an Invoice and a Receipt

Using Tesseract:

 

from PIL import Image
import pytesseract

# Send the invoice image to the Tesseract engine and get the recognized text back as one string
text = pytesseract.image_to_string(Image.open('invoice.png'))
print(text)
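image_to_string returns one flat string. Sometimes you also need per-word confidence, for example to ignore smudged parts of an invoice. For that, pytesseract's image_to_data(..., output_type=pytesseract.Output.DICT) returns parallel lists such as text, conf, block_num, par_num and line_num. The sketch below regroups the confident words into lines. The function name and the threshold of 60 are arbitrary choices, not pytesseract defaults.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def lines_from_tesseract_data(data: Dict[str, list], min_conf: float = 60.0) -> List[str]:
    """Rebuild text lines from pytesseract.image_to_data(..., output_type=Output.DICT).

    Words are grouped by (page_num, block_num, par_num, line_num). Empty entries
    (Tesseract's page/block/paragraph/line rows, which carry conf -1) and words
    below min_conf are skipped.
    """
    lines: Dict[Tuple[int, int, int, int], List[str]] = defaultdict(list)
    for i, word in enumerate(data["text"]):
        # conf may be a number or a string depending on the pytesseract version
        if not word.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (data["page_num"][i], data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append(word)
    # Tesseract lists words in reading order, and dicts keep insertion order
    return [" ".join(words) for words in lines.values()]


# Usage (needs pytesseract, Pillow and the Tesseract engine):
# from PIL import Image
# import pytesseract
# data = pytesseract.image_to_data(Image.open('invoice.png'), output_type=pytesseract.Output.DICT)
# print("\n".join(lines_from_tesseract_data(data)))
```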


Using EasyOCR:

 

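import easyocr

# Load the English model once; the weights are downloaded on first use
reader = easyocr.Reader(['en'])

# Each result is a (bounding_box, text, confidence) tuple
results = reader.readtext('receipt.jpg')
for result in results:
    print(result[1])  # the recognized text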
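Each EasyOCR result also carries a bounding box and a confidence score between 0 and 1. You can use the score to drop low-confidence fragments such as stray marks on a receipt. This is a minimal sketch: confident_text and the 0.5 cutoff are hypothetical. Sorting by the top-left corner is a simple reading-order heuristic that assumes fairly straight, unrotated text.

```python
from typing import List, Sequence, Tuple

# One EasyOCR result: (bounding_box, text, confidence). The box is assumed to be
# four [x, y] corner points with the top-left corner first.
Result = Tuple[Sequence[Sequence[float]], str, float]


def confident_text(results: List[Result], min_conf: float = 0.5) -> List[str]:
    """Return the text of results scoring at least min_conf, top-to-bottom then left-to-right."""
    kept = [r for r in results if r[2] >= min_conf]
    # Sort on the top-left corner: y first (rows), then x (position within a row)
    kept.sort(key=lambda r: (r[0][0][1], r[0][0][0]))
    return [text for _, text, _ in kept]


# Usage (needs easyocr installed):
# import easyocr
# results = easyocr.Reader(['en']).readtext('receipt.jpg')
# print("\n".join(confident_text(results)))
```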


Detect Multilingual Text

Both tools support multiple languages.

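pytesseract: pass lang='eng+hin' to image_to_string to read English and Hindi together. Each language needs its Tesseract data file installed, for example the tesseract-ocr-hin package on Debian/Ubuntu.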

 

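easyocr: use Reader(['en', 'hi']). EasyOCR only combines languages that share a script, and English is compatible with all of them. Hindi and French therefore need separate readers, e.g. Reader(['en', 'fr']) for French.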
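If a Tesseract language pack is missing, the error only appears when OCR runs. The small guard below builds the lang string and fails early instead. It assumes you obtain the installed list from pytesseract.get_languages(), which is available in recent pytesseract releases. The names tesseract_lang and 'sign.png' are hypothetical.

```python
from typing import Iterable


def tesseract_lang(wanted: Iterable[str], installed: Iterable[str]) -> str:
    """Join Tesseract language codes into a lang string like 'eng+hin'.

    Raises ValueError if any requested language pack is not installed, so the
    problem shows up before OCR runs.
    """
    wanted = list(wanted)
    missing = sorted(set(wanted) - set(installed))
    if missing:
        raise ValueError("Tesseract language data not installed: " + ", ".join(missing))
    return "+".join(wanted)


# Usage (needs pytesseract, Pillow and the Tesseract engine):
# from PIL import Image
# import pytesseract
# lang = tesseract_lang(['eng', 'hin'], pytesseract.get_languages(config=''))
# print(pytesseract.image_to_string(Image.open('sign.png'), lang=lang))
```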


Use Cases

  • Scan invoices and extract payment details

  • Parse printed receipts for inventory apps

  • Read license plates from traffic cams

  • Pull text from foreign-language signage so it can be passed to a translator
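For the invoice use case, the raw OCR text still has to be turned into fields. One simple approach is to run regular expressions over the output of image_to_string. The patterns below are illustrative assumptions for an invoice labelled "Invoice No:", "Date:" and "Total Due:". Real layouts vary, so treat this as a starting point rather than a general parser. PATTERNS and extract_invoice_fields are hypothetical names.

```python
import re
from typing import Dict, Optional

# Illustrative patterns only: adjust them to the labels your invoices actually use
PATTERNS = {
    "invoice_no": re.compile(r"invoice\s*(?:no\.?|number|#)\s*[:#]?\s*([A-Z0-9-]+)", re.I),
    "date": re.compile(r"\bdate\s*:?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.I),
    # \b keeps "Subtotal" from matching as "total"
    "total": re.compile(r"\btotal\s*(?:due)?\s*:?\s*[$€£₹]?\s*([\d,]+\.\d{2})", re.I),
}


def extract_invoice_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field in the OCR text, or None if it is absent."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Usage with the Tesseract invoice example:
# text = pytesseract.image_to_string(Image.open('invoice.png'))
# print(extract_invoice_fields(text))
```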


Final Thoughts

OCR is a powerful tool for digitizing real-world content. Whether you’re automating backend tasks or building AI-based systems, tools like Tesseract and EasyOCR make it simple.

Want to build your own document reader or smart scanner? Start with these libraries and add AI for context-aware enhancements.

Ashish Kasama

Co-founder & Your Technology Partner

One-stop solution for next-gen tech.