Learn how to easily extract text from any image using Python with OpenCV – with bonus code examples

Table of contents

  1. Introduction
  2. Getting Started with OpenCV
  3. Understanding Image Processing Techniques
  4. Extracting Text from Images using Python
  5. Techniques to Improve Text Extraction Accuracy
  6. Bonus Code Examples for Text Extraction
  7. Conclusion


Introduction

In this tutorial, you will learn how to use OpenCV, a popular computer vision library with Python bindings, to extract text from images. This process, known as optical character recognition (OCR), is essential for many applications such as document digitization, automated data extraction, and image-based search.

With Python, OpenCV, and an OCR engine such as Tesseract, you can use pre-trained machine learning models to detect and recognize characters in images. This lets you replace hours of manual data entry with a few lines of code.

In this tutorial, you will learn the basics of how OCR works, how to set up OpenCV, how to load and preprocess images, and finally, how to extract text using machine learning models. Along the way, we'll also provide code examples to help you get started with your OCR project. Whether you're a beginner or an experienced Python programmer, this tutorial will give you the tools you need to start extracting text from images with ease.

Getting Started with OpenCV

OpenCV is a powerful computer vision library for working with images and videos. It is written in C++ but also has Python bindings, which makes it an accessible tool for Python developers.

To start using OpenCV, you need to install it on your system. You can install it using pip, a Python package manager. Open up your terminal and run the following command:

pip install opencv-python

Once installed, you can import OpenCV in your Python code by adding the following line at the top of your script:

import cv2

With OpenCV, you can perform operations such as loading images, converting them to grayscale, and applying filters to them. The following code snippet shows you how to load an image and display it in a window:

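import cv2

# Load the image (imread returns None if the file cannot be read)
img = cv2.imread('image.jpg')
cv2.imshow('image', img)

# Keep the window open until a key is pressed
cv2.waitKey(0)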

In this code, we first use the imread() function to load the image ‘image.jpg’ and store it in a variable called img. We then use the imshow() function to display the image in a window with the title ‘image’. Finally, we use the waitKey() function to wait for a keyboard event.

In short, OpenCV is a powerful library for image processing and computer vision. Once it is imported in Python, you can load, resize, and filter images with a few function calls.

Understanding Image Processing Techniques

Image processing techniques are essential for extracting text from images in Python with OpenCV. They consist of a series of operations that enhance and manipulate a digital image so that the desired information can be extracted accurately. Grayscale conversion, thresholding, and contour detection are the techniques most relevant to text extraction.

Grayscale conversion turns a color image into shades of gray. OpenCV provides a built-in function, cvtColor(), that performs this conversion when called with the COLOR_BGR2GRAY flag. Thresholding, on the other hand, sets each pixel to either black or white depending on whether its value exceeds a threshold. It is useful for separating the text pixels from the background. Besides a fixed global threshold, there are other thresholding techniques, such as adaptive thresholding and Otsu's thresholding, which are commonly used in text extraction.
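To make these two steps concrete, here is a minimal pure-Python sketch of what they do to individual pixels. It is an illustration, not OpenCV's implementation: the helper names to_gray and threshold and the 2x2 sample image are made up for this example, and the channel weights are the standard luma weights commonly used for grayscale conversion.

```python
# Pixels are (B, G, R) tuples, matching OpenCV's default channel order.

def to_gray(pixel):
    """Weighted sum of the three channels (standard luma weights)."""
    b, g, r = pixel
    return int(round(0.114 * b + 0.587 * g + 0.299 * r))

def threshold(gray_rows, thresh=127):
    """Map each gray value to 255 (white) if it exceeds thresh, else 0 (black)."""
    return [[255 if v > thresh else 0 for v in row] for row in gray_rows]

# Hypothetical 2x2 image: white, black, pure red, pure green
image = [[(255, 255, 255), (0, 0, 0)],
         [(0, 0, 255), (0, 255, 0)]]

gray = [[to_gray(p) for p in row] for row in image]
binary = threshold(gray)
print(gray)    # [[255, 0], [76, 150]]
print(binary)  # [[255, 0], [0, 255]]
```

Note how pure red ends up darker than pure green after conversion, so the two fall on opposite sides of the threshold.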

Contour detection is the process of identifying the boundaries of objects within an image. In text extraction, it is used to locate individual characters or words. OpenCV provides the findContours() function, which takes a binary image and returns the contours it finds together with their hierarchy.
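The idea of locating separate objects in a binary image can be sketched without OpenCV. The example below uses a simple flood fill (connected-component labeling), which is a stand-in for contour tracing rather than the algorithm behind findContours(); the function name bounding_boxes and the sample grid are assumptions made for illustration. It returns one (x, y, width, height) box per blob, the same layout cv2.boundingRect() uses.

```python
from collections import deque

def bounding_boxes(binary):
    """Return (x, y, w, h) for each 4-connected blob of nonzero pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for y in range(rows):
        for x in range(cols):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill over one blob, tracking its extent
            queue = deque([(y, x)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cy, cx = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes

# Hypothetical binary image with two separate "characters"
binary = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
print(bounding_boxes(binary))  # [(1, 0, 2, 2), (4, 1, 1, 3)]
```

In a real pipeline, boxes like these would be used to crop each character or word region before passing it to the OCR step.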

In summary, understanding image processing techniques such as grayscale conversion, thresholding, and contour detection is crucial for successful text extraction from images in Python with OpenCV. By understanding these techniques, you can write better code to accurately identify and retrieve text from images.

Extracting Text from Images using Python

To extract text from images in Python, we can use the OpenCV library, which provides various algorithms for image processing and analysis. One of the most common methods for text extraction is to use an optical character recognition (OCR) algorithm, which recognizes characters in an image and converts them into editable text.

Before applying OCR, we first preprocess the image with OpenCV to enhance the text regions and remove noise and other unwanted elements. This can be done with techniques such as thresholding and morphological operations like erosion and dilation.
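Erosion and dilation are easiest to understand on a tiny binary image. The following pure-Python sketch applies a 3x3 square neighborhood and ignores pixels outside the image; the helper names and the sample grid are hypothetical, and OpenCV offers the same operations through cv2.erode() and cv2.dilate(). Eroding and then dilating (an "opening") removes isolated noise pixels while keeping larger shapes.

```python
def _morph(binary, pick):
    """Apply pick (min or max) over each pixel's 3x3 neighborhood."""
    rows, cols = len(binary), len(binary[0])
    out = []
    for y in range(rows):
        row = []
        for x in range(cols):
            window = [binary[ny][nx]
                      for ny in range(max(0, y - 1), min(rows, y + 2))
                      for nx in range(max(0, x - 1), min(cols, x + 2))]
            row.append(pick(window))
        out.append(row)
    return out

def erode(binary):
    """A pixel stays on only if its whole neighborhood is on."""
    return _morph(binary, min)

def dilate(binary):
    """A pixel turns on if any pixel in its neighborhood is on."""
    return _morph(binary, max)

def opening(binary):
    """Erosion followed by dilation: removes specks smaller than the neighborhood."""
    return dilate(erode(binary))

# Hypothetical image: a 3x3 block plus one noise pixel in the top-right corner
noisy = [
    [0, 0, 0, 0, 0, 1],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
for row in opening(noisy):
    print(row)
```

The printed result contains the 3x3 block unchanged, while the isolated pixel in the corner is gone.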

Once the image is preprocessed, we can apply an OCR engine to the text regions to extract the text. Several OCR tools can be used from Python, such as Tesseract, PyOCR, and OCRopus; which one fits best depends on the requirements of the task.

To demonstrate how to extract text from an image using Python with OpenCV and Tesseract, we can write a simple code example that loads an image, preprocesses it, and then applies OCR to extract the text:

import cv2
import pytesseract

# Load the image
img = cv2.imread('image.jpg')

# Preprocess the image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Apply OCR to extract the text
text = pytesseract.image_to_string(gray)

# Print the extracted text
print(text)

In this code, we first load an image using the cv2.imread() function, which reads the image file into a NumPy array. We then preprocess the image by converting it to grayscale, applying a Gaussian blur filter to reduce noise, and thresholding the image to binarize it.
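The THRESH_OTSU flag tells OpenCV to pick the threshold automatically instead of using the value passed in. The sketch below shows the textbook version of Otsu's method in pure Python: it tries every threshold and keeps the one that maximizes the variance between the dark and bright classes. The function name otsu_threshold and the sample pixel values are assumptions for illustration, and OpenCV's implementation may differ in details.

```python
def otsu_threshold(values):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t the bright class.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no dark pixels yet
        if w1 == 0:
            break     # no bright pixels left
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Hypothetical image: dark text pixels around 20-30, bright background around 200-220
pixels = [20] * 4 + [30] * 4 + [200] * 4 + [220] * 4
t = otsu_threshold(pixels)
print(t)
```

For this sample, the chosen threshold falls between the dark and the bright group, so binarizing with it separates text from background cleanly.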

Finally, we use the pytesseract.image_to_string() function from the pytesseract library to apply OCR to the preprocessed image and extract the text. The extracted text is then printed to the console using the print() function.
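One practical point: pytesseract is a wrapper that calls the Tesseract executable, so the engine itself has to be installed separately from the Python package. A small check like the following, using only the standard library, can save some debugging time; the function name find_tesseract is made up for this sketch.

```python
import shutil

def find_tesseract():
    """Return the path of the tesseract executable on PATH, or None if it is missing."""
    return shutil.which("tesseract")

path = find_tesseract()
if path is None:
    print("Tesseract not found on PATH; install it separately from pytesseract.")
else:
    print(f"Using Tesseract at {path}")
```

If the executable lives outside your PATH, pytesseract also lets you point to it by setting pytesseract.pytesseract.tesseract_cmd.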

By using the above code and adjusting the preprocessing techniques and OCR algorithms to suit the specific requirements of the task, we can easily extract text from any image using Python with OpenCV.

Techniques to Improve Text Extraction Accuracy

One of the challenges when extracting text from images in Python with OpenCV is achieving high accuracy. Text extraction accuracy can be improved using various techniques that will help make image processing more robust and resistant to common errors.

One key technique is to preprocess the image before extracting text. This involves applying operations such as noise reduction, thresholding, contrast enhancement, and edge detection to improve the image's quality. Noise reduction filters remove disturbances in the image that can cause the extracted text to be distorted or incomplete. Thresholding helps to distinguish text from other elements within the image, and contrast enhancement increases the difference between background and foreground. Edge detection, on the other hand, finds changes in the intensity of the image and can help isolate text regions.
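As one concrete instance of contrast enhancement, the sketch below performs min-max contrast stretching in pure Python: the darkest pixel is mapped to 0, the brightest to 255, and everything else is scaled linearly in between. The function name stretch_contrast and the sample values are hypothetical; in OpenCV, cv2.normalize() with the NORM_MINMAX flag serves a similar purpose.

```python
def stretch_contrast(gray_rows):
    """Linearly rescale intensities so the darkest pixel becomes 0 and the brightest 255."""
    flat = [v for row in gray_rows for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Flat image: there is no contrast to stretch
        return [row[:] for row in gray_rows]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in gray_rows]

# Hypothetical faded scan whose values only span 100-150
faded = [[100, 110],
         [140, 150]]
print(stretch_contrast(faded))  # [[0, 51], [204, 255]]
```

After stretching, a fixed threshold such as 127 separates the two darker pixels from the two brighter ones, which it could not do on the faded input.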

Another technique that can help enhance text extraction accuracy is to use language models. A language model can be trained to recognize the words or phrases that commonly appear in the images being processed. This can help improve the accuracy of the extracted text by reducing the number of false positives or negatives that the system generates.
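A full language model is beyond the scope of this tutorial, but the underlying idea can be sketched with a plain vocabulary lookup. The example below is a simple stand-in, not a trained language model: it uses difflib from the standard library to replace each OCR word with its closest vocabulary entry when the two are similar enough. The function name correct_words, the vocabulary, and the cutoff of 0.75 are all assumptions chosen for illustration.

```python
import difflib

def correct_words(text, vocabulary, cutoff=0.75):
    """Replace each word by its closest vocabulary entry if similarity >= cutoff."""
    corrected = []
    for word in text.split():
        match = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)

# Hypothetical vocabulary for invoices, and an OCR result with typical confusions
vocabulary = ["invoice", "total", "amount", "date"]
ocr_output = "lnvoice tota1 amount 42"
print(correct_words(ocr_output, vocabulary))  # invoice total amount 42
```

Words without a close match, such as numbers, are left untouched, which keeps the correction from overwriting data it knows nothing about.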

Lastly, using deep learning and neural network-based approaches can immensely help in improving text extraction accuracy. For instance, training a deep learning model (such as a convolutional neural network) to identify characters or words can be an effective way to ensure accurate extraction of text from images.

Overall, by applying filters and preprocessing techniques, using language models and deep learning, and iterating on techniques until the desired accuracy level is achieved, text extraction accuracy can be greatly improved when working with images in Python using OpenCV.

Bonus Code Examples for Text Extraction


Now that you have a basic understanding of how to extract text from an image using Python with OpenCV, let's dive into some bonus code examples that will allow you to customize your text extraction even further.

Example 1: Extracting Text with a Specific Font

Often, you may only be interested in extracting text with a specific font. You can achieve this by training a machine learning model to recognize the font, and then invoking the same text extraction process as before:

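import cv2
import numpy as np
import pytesseract

# Read image and convert to grayscale
img = cv2.imread('image.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Train a machine learning model to recognize the font.
# Assumption: font_samples.png holds one row of 26 equal-width glyphs, A to Z.
font_samples = cv2.imread('font_samples.png')
gray_samples = cv2.cvtColor(font_samples, cv2.COLOR_BGR2GRAY)
cells = np.array_split(gray_samples, 26, axis=1)

# KNearest expects float32 samples (one flattened glyph per row) and one label per row
train_data = np.array([cv2.resize(c, (20, 20)).flatten() for c in cells], dtype=np.float32)
font_responses = np.arange(ord('A'), ord('Z') + 1, dtype=np.float32).reshape(-1, 1)
font_classifier = cv2.ml.KNearest_create()
font_classifier.train(train_data, cv2.ml.ROW_SAMPLE, font_responses)

# Extract the text with Tesseract (the classifier above is not used in this step)
text = pytesseract.image_to_string(gray, lang='eng', config='--psm 6')

# This only checks whether the word "Arial" appears in the recognized text
if 'Arial' in text:
    print('Found the word "Arial" in the text!')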

In this example, we first train a K-Nearest Neighbors (KNN) classifier on sample glyphs of the font we are interested in. We then invoke the same text extraction process as before. Note that the final if statement only checks whether the string "Arial" appears in the extracted text; it does not detect the font itself, which would require running the trained classifier on the individual character images.

Example 2: Extracting Text with a Specific Color

Similar to extracting text with a specific font, you may also want to extract text with a specific color. This can be done by converting the image to a specific color space, filtering for the desired color range, and then performing text extraction:

import cv2
import numpy as np
import pytesseract

# Read image and convert to HSV color space
img = cv2.imread('image.png')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Define the color range to filter for (in this example, we filter for blue)
lower_blue = np.array([110,50,50])
upper_blue = np.array([130,255,255])

# Filter for the blue color range
mask = cv2.inRange(hsv, lower_blue, upper_blue)
res = cv2.bitwise_and(img, img, mask=mask)

# Convert filtered image to grayscale
gray = cv2.cvtColor(res, cv2.COLOR_BGR2GRAY)

# Extract text
text = pytesseract.image_to_string(gray, lang='eng', config='--psm 6')

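# The mask already limits OCR to blue pixels; this only checks
# whether the word "blue" appears in the recognized text
if 'blue' in text:
    print('Found the word "blue" in the text!')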

In this example, we first convert the image to the HSV color space, which separates the color value from the brightness value, making it easier to filter for a specific color range. We then define our target color range (in this case, blue), filter the image to include only pixels in that range, and convert the resulting image to grayscale. Finally, we invoke the same text extraction process as before. The color filtering is done entirely by the mask; the closing if statement only checks whether the word "blue" appears in the extracted text.
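To see why the range 110 to 130 selects blue, it helps to convert a few pixels by hand. For 8-bit images, OpenCV stores hue as degrees divided by two (0 to 179) and scales saturation and value to 0 to 255. The sketch below reproduces that scaling with the standard library's colorsys module; the helper names bgr_to_opencv_hsv and in_range are made up for this example, and in_range mimics what cv2.inRange() does for a single pixel.

```python
import colorsys

def bgr_to_opencv_hsv(b, g, r):
    """Convert an 8-bit BGR pixel to OpenCV-style HSV (H in 0-179, S and V in 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 180) % 180, round(s * 255), round(v * 255)

def in_range(hsv, lower, upper):
    """True if every channel lies within [lower, upper]."""
    return all(lo <= c <= hi for c, lo, hi in zip(hsv, lower, upper))

lower_blue, upper_blue = (110, 50, 50), (130, 255, 255)

samples = {"blue": (255, 0, 0), "red": (0, 0, 255), "white": (255, 255, 255)}
for name, bgr in samples.items():
    hsv = bgr_to_opencv_hsv(*bgr)
    print(name, hsv, in_range(hsv, lower_blue, upper_blue))
```

Pure blue lands at hue 120 and passes the filter. Red is rejected because of its hue, and white is rejected because its saturation is zero, which is why the lower bound on saturation matters.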


Conclusion

In this tutorial, we have learned how to use OpenCV with Python to extract text from images. We have gone through the necessary steps of loading the image, preprocessing it to make the text easier to extract, and finally using OCR to extract the text.

In addition, the bonus examples showed how to filter an image by color before running OCR and how to use an if statement to check the extracted text for a given word.

By combining these tools and techniques, we can easily extract text from images using Python, opening up new possibilities for automation and data extraction in our projects.

We hope this tutorial has been helpful and informative, and we encourage you to experiment with the code and try it out on your own images. With some practice and experimentation, you can become proficient in using OpenCV and Python to extract text from any image.

