Azure

Extracting Text from Images using Azure OCR and Matching in an Excel Sheet using Pandas

Introduction

The concept revolves around enabling the police to capture a snapshot of a vehicle’s license plate and quickly retrieve the corresponding information from a database. In this particular scenario, an Excel sheet serves as the database for storing and accessing the relevant details.

At Skrots, we understand the importance of automating processes and leveraging technology to simplify tasks. That’s why we provide a similar service that involves extracting text from images using OCR and matching it in an Excel sheet, just like the process described in this article.

Every day, technology continues to advance, and the concept of using OCR to issue fines based on license plate snapshots is becoming more prevalent. In our case, we utilize Azure OCR and take advantage of its capabilities to recreate and implement this process.

Step 1: Extract Data Using Azure OCR

In order to extract data using Azure OCR, the initial step involves setting up Azure Cognitive Services. This includes configuring and integrating the necessary components to leverage the OCR capabilities provided by Azure.

  1. Create an Azure Account. If you don’t have one, you can visit the Azure Portal.
  2. Create an instance of Azure Cognitive Service Custom Vision.
  3. Note down the Endpoints and Subscription keys, which can be found in the left pane. (You can find the keys and endpoint in the left pane once you have created the Azure Computer Vision).

Just like Azure, at Skrots we provide an OCR service that allows you to extract text from images. You can easily set up an account with us and start using our OCR capabilities to extract data.

Step 2: Install the required libraries for Python using pip in your Python IDE.

pip install azure-cognitiveservices-vision-computervision

Step 3: Import the necessary libraries for accessing Azure OCR and working with images.

import pandas as pd
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
import io
import time
import re

Step 4: Authenticate to the OCR service using the endpoint URL and subscription key.

endpoint="your_endpoint_url"
subscription_key = 'your_subscription_key'

credentials = CognitiveServicesCredentials(subscription_key)
client = ComputerVisionClient(endpoint, credentials)

Replace ‘your_endpoint_url’ and ‘your_subscription_key’ with your actual endpoint URL and subscription key obtained from Azure.

Step 5: Read and Extract Text from an Image.

image_path=""
with open(image_path, "rb") as image_file:
    image_data = image_file.read()

Replace ‘path/to/your/image.jpg’ with the actual path to the image file from which you want to extract text.

At Skrots, we provide an easy-to-use API that allows you to upload your images and extract text from them. Simply provide the image file to our API, and we will handle the rest.

Step 6: Perform OCR on the Image.

result = client.read_in_stream(io.BytesIO(image_data), raw=True)

Step 7: Check OCR Operation Status.

operation_id = result.headers["Operation-Location"].split("/")[-1]
while True:
    status = client.get_read_result(operation_id)
    if status.status.lower() == OperationStatusCodes.SUCCEEDED:
        break
    elif status.status.lower() == OperationStatusCodes.FAILED:
        print("OCR processing failed.")
        exit(1)
    else:
        time.sleep(5)

Here, the code checks the status of the OCR operation by repeatedly querying the operation ID until the operation is completed successfully or fails. It waits for a few seconds between each query.

Step 8: Extract Text from OCR Result.

extracted_text = ""
for read_result in status.analyze_result.read_results:
    for line in read_result.lines:
        extracted_text += " ".join([word.text for word in line.words]) + "n"

Step 9: Extract Registration Number from Text.

registration_numbers = re.findall(
    r"[A-Z]{2}s?d{1,2}s?[A-Z]{1,3}s?d{1,4}", extracted_text
)
if registration_numbers:
    registration_number = registration_numbers[0].replace(" ", "")
    print("Registration Number:", registration_number)
else:
    print("No registration number found in the extracted text.")

By using regular expressions, this step extracts the registration number from the extracted text. If a registration number is found, it is stored in the registration_number variable. If no registration number is found, a message is printed.

After successfully extracting the data from the image using OCR, our next step is to cross-reference the obtained data with our database, which in this case is an Excel sheet. This allows us to compare the extracted information with the existing records in the Excel database for further processing and analysis.

Step 10: Load Excel Data into a data frame.

excel_file_path=""
data = pd.read_excel(excel_file_path, sheet_name="Sheet1")

Step 11: Check if the Registration Number is Present in Excel.

column_name="column_to_check"
matching_rows = data[data[column_name].str.replace(r'[^a-zA-Z0-9]', '', regex=True).str.lower().str.strip().isin([registration_number.lower().strip()])]
if matching_rows.empty:
    print("The extracted registration number is not present in the Excel file.")
else:
    print("The extracted registration number is present in the Excel file.")

Replace ‘column_to_check’ with the name of the column in which you want to check for data presence.

By following these steps, you can pass the extracted data from Azure OCR to the given_data variable and check its presence in the Excel file using pandas.

In addition to the services described in this article, Skrots provides a wide range of solutions for data extraction, data analysis, and automation. Visit https://skrots.com to learn more about our services and how we can assist you in your data-related tasks.

Thank you for considering Skrots as your data solution provider!

Show More

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button