Evaluating Automatic License Plate Recognition (ALPR) Systems

A general evaluation of commercial ALPR models.

Many license plates. Photo by Susan Kirsch on Unsplash


Anno.Ai’s data science team often conducts benchmark assessments of commercial and open-source AI/ML models to gauge their fit for our customers’ use cases (for example, see our earlier assessments of handwritten text recognition (HTR) models and named entity recognition (NER) models). Recently we looked at commercial providers of automatic license plate recognition (ALPR) models. Many of our customers hold large bodies of unstructured data, including images and videos, that are used for post-event analysis. They often need to triage that data quickly to identify objects of interest, such as vehicles and license plates. They may also need to search the data for a given plate number, perhaps knowing only some of its characters.

Overview of ALPR

ALPR is a pipeline of several tasks: detecting the vehicle and plate, capturing the plate’s text, segmenting the text into characters, and reading the characters. The results depend heavily on camera placement, camera resolution, the target’s distance and angle from the camera, and various environmental factors. In most use cases, such as parking lot monitoring or highway toll collection, ALPR systems operate on dedicated cameras placed specifically to capture and read license plates. In these use cases, operators can choose the camera specifications needed to optimize ALPR performance. Our customers, however, often need to run ALPR against media collected in unconstrained environments over which they have no physical control or access.

Source: Wikimedia Commons
Source: Wikimedia Commons; copyright 2006, Achim Raschka

Commercial Models

We tested several commercial vendor models on their ability to detect and read license plates from in-the-wild images and videos. We chose vendors to test based on the following criteria:

  • Ease of installation
  • Support across different operating systems
  • Wide adoption and ongoing support/development
  • Licensing
  • Compute requirements
  • Retraining options and process


Because our customers have such diverse use cases, we collected data from nearly 35 countries spanning North and South America, Europe, Africa, the Middle East, South Asia, and East Asia. We conducted two rounds of testing. The first used still images of relatively “easy” plates, meaning that the plates were visible, readable, in favorable lighting conditions, and not occluded. The second used more difficult dash-cam or body-cam footage, in which the plates were blurrier, more distant, skewed, or otherwise hard to read. While most of the countries we tested use only Latin characters on their plates, we also tested the vendors’ performance on non-Latin characters.

Photo by Tehreek Dawat e Faqr on Flickr, CC BY 2.0
Source: pxfuel

Evaluation Metrics

Although some of the vendors that we tested offer the option of running ALPR on video, we opted to test all of the data as still images or video frames so that accuracy metrics could be calculated precisely. We rated the models’ performance using F1 scores computed over the plates correctly detected in each image. The F1 score is the harmonic mean of precision and recall, with values ranging from 0 to 1; scores closer to 1 indicate better performance. We also recorded observations on the accuracy of the plate character strings each model returned, but based the metrics solely on plates detected.
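As a rough illustration of the metric, the sketch below computes an F1 score from detection counts. The per-image counts are invented for the example, and pooling counts across images before scoring (micro-averaging) is an assumption; the article does not say how scores were aggregated.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, written in terms of raw counts.

    Returns 0.0 when there are no detections and no ground-truth plates,
    where the score would otherwise be undefined.
    """
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


# Hypothetical per-image counts (not from the evaluation):
# (plates correctly detected, spurious detections, plates missed)
per_image_counts = [(2, 0, 1), (1, 1, 0), (0, 0, 2)]

# Assumption: pool the counts over all images, then compute a single score.
tp = sum(counts[0] for counts in per_image_counts)
fp = sum(counts[1] for counts in per_image_counts)
fn = sum(counts[2] for counts in per_image_counts)

micro_f1 = f1_score(tp, fp, fn)
print(f"precision={tp / (tp + fp):.2f} recall={tp / (tp + fn):.2f} F1={micro_f1:.2f}")
```

With these made-up counts, precision is 0.75 and recall is 0.50, so the F1 score is 0.60. Because the harmonic mean is pulled toward the lower of the two values, a model that misses many plates cannot earn a high score on precise detections alone.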


Detecting vehicle characteristics changes results

Some of the vendors offer an option to toggle the detection of vehicle characteristics on or off, while other vendors provide the characteristics automatically. In some cases, turning the option on returns all cars in the image with readable plates, while turning it off returns only one car/plate, even if there are several. In the image below, turning on vehicle characteristic detection returns the four cars whose plates are detectable. Turning it off returns only the car on the far right in the foreground.

Detecting vehicle characteristics and license plates. Source: https://www.goodfreephotos.com

Country-specific models offer better string-matching results

Some vendors provide lists of countries for which their models were trained. These typically include only countries using Latin characters on their plates. Model training specific to a country’s plate size, format, and font style appeared to yield more accurate or more complete results in reading the character strings. If a plate was partially obscured, the models often returned placeholder characters to ensure the plate string matched the standard number of characters for a given country.
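To make the idea of a country-specific format concrete, here is a minimal sketch that checks a returned plate string against a per-country template while tolerating placeholder characters. The templates, the country keys, and the `?` placeholder are all hypothetical. They do not describe any real country's plates or any vendor's output format.

```python
# Hypothetical templates: "L" marks a Latin letter, "D" marks a digit.
# Real plate formats vary widely and are not represented here.
PLATE_TEMPLATES = {
    "COUNTRY_A": "LLLDDDD",
    "COUNTRY_B": "LLDDLLL",
}


def fits_template(plate: str, template: str, placeholder: str = "?") -> bool:
    """Check that a plate string has the template's length and character classes.

    Placeholder characters are accepted at any position, since they stand in
    for characters the model could not read.
    """
    if len(plate) != len(template):
        return False
    for char, kind in zip(plate, template):
        if char == placeholder:
            continue
        if kind == "L" and not ("A" <= char <= "Z"):
            return False
        if kind == "D" and not ("0" <= char <= "9"):
            return False
    return True


print(fits_template("ABC1234", PLATE_TEMPLATES["COUNTRY_A"]))  # full read
print(fits_template("AB?12?4", PLATE_TEMPLATES["COUNTRY_A"]))  # partially obscured
print(fits_template("ABC12", PLATE_TEMPLATES["COUNTRY_A"]))    # too short
```

A check like this could be used downstream to flag strings that do not fit any expected format, which may help separate confident reads from ones that need review.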

Non-Latin characters are an obstacle for all vendors

None of the vendors tested offered support for non-Latin characters. The typical response was to return a similar-looking Latin character, if possible; however, in many cases the models returned a question mark or other symbol as a placeholder. On South Korean plates, for example, the models returned the numerals successfully but replaced the Korean characters in the middle of the plate with a question mark.
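Placeholder output of this kind can still support partial-plate search if the placeholder is treated as a wildcard. The sketch below is one simple way to do that. The function name, the choice of `?` as the wildcard, and the sample plate strings are assumptions for illustration, not any vendor's API or data from our tests.

```python
def plate_matches(query: str, ocr_result: str, wildcard: str = "?") -> bool:
    """Return True when two plate strings agree at every known position.

    A position counts as agreeing if the characters are equal or if either
    side holds the wildcard. Strings of different lengths never match.
    """
    if len(query) != len(ocr_result):
        return False
    return all(
        q == r or wildcard in (q, r)
        for q, r in zip(query.upper(), ocr_result.upper())
    )


# Hypothetical model outputs, with "?" standing in for unread characters.
ocr_results = ["12?3456", "98?7654", "ABC1234"]

# An analyst who knows the full plate can still find the partial read...
print([r for r in ocr_results if plate_matches("12가3456", r)])

# ...and a query may itself contain unknown characters.
print([r for r in ocr_results if plate_matches("ABC12??", r)])
```

This positional comparison assumes the model preserves the plate's character count. It would not handle dropped or inserted characters, for which an edit-distance approach would be a better fit.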

Fire hydrant and car with non-Latin characters on its plate. Photo by Chaewon Lee on Unsplash

Non-standard plate configurations confound the models

In some countries, cars may carry multiple plates or have multi-line plates, both of which produced less accurate results. In the case of multi-line plates, the models reliably detected the plates, but typically returned only the bottom string of characters. When a car carried more than one plate, the models did not return a plate string, even when they correctly detected the vehicle.

Car with multiple plates. Source: Wikipedia

What’s Next

Anno.Ai’s evaluation of ALPR systems is ongoing as we continue to test different types of data representing a variety of environmental conditions, such as low-light and adverse weather scenarios. In addition to determining the type and intensity of conditions that cause performance to degrade, we are also analyzing which image and video specifications, such as resolution, dimensions, and camera type, produce the best results.

