A narrative and helpful tips for using MLOps

Many organizations have been taking advantage of machine learning (ML) in recent years. From natural language processing (NLP) to image recognition, fields from advertising to zoology have benefited from this still-emerging technology. Humans are getting better at making sense of a glut of data. Recent technical advances have improved our ability to clean, engineer, and label our data, and new tools have helped us design and train ML models.

While these tools have helped data engineers and data scientists accelerate their work, most of them address only individual components of the machine learning pipeline. Data scientists…


A general evaluation of commercial ALPR models.

Many license plates. Photo by Susan Kirsch on Unsplash

Background

Anno.Ai’s data science team often conducts benchmark assessments of commercial and open-source AI/ML models to gauge their fit for our customers’ use cases (for example, see our assessments of handwritten text recognition (HTR) and named entity recognition (NER) models). Recently we looked at commercial providers of automatic license plate recognition (ALPR) models. Many of our customers have large bodies of unstructured data, including images and videos used for post-event analysis. Our customers often require the ability to triage their data quickly to identify objects…


An evaluation of Named Entity Recognition models for commercial NLP offerings.

Photo by Max Chen on Unsplash

Background

As part of our series on AI/ML model evaluations, the Anno.Ai data science team delved into the world of Natural Language Processing (NLP). Many of our customers have NLP needs, so we decided to explore a variety of online and offline NLP libraries and services. For this task, our team looked into Named Entity Recognition (NER); this article focuses specifically on online vendors.

What is NER?

Given a collection of documents or other unstructured text, it is useful to be able to identify and extract information that falls…


As previously published by Amina Al Sherif

As people occasionally do, I was asked to attend a conference that involved a lot of listening and not much speaking, which is generally what happens when your speaker list consists of more than thirty individuals on a Zoom call.

This Medium post offers more background on all of the types of Zoom participants we know so well…

Although I didn't get to speak at the conference, I thought the questions posed were worthwhile. As I settle into my new role as Chief Data Ethicist at Anno.Ai, I realize more than ever that data ethics is closely tied to AI security.

I will present my…


A review of open-source automated polygon segmentation approaches

Image created by Anno.Ai

Background

Labeling images in preparation for neural network training can be time-consuming and tedious, especially when working with polygons. The most common tools available require the user to place multiple points around the object they wish to segment. These tools either draw straight lines from point to point or try their best to snap the line around the edges of the object. Sometimes these tools require 10 or more points to create a decent segmentation of the object.

The Anno.Ai data science team regularly works with labeled data and was interested in finding a faster approach to segmenting objects in images…


… and a comparison with cloud provider translation services

Photo by Daniel Romero on Unsplash

Background

As we discussed in our previous post, the Anno.Ai data science team has continued evaluating machine learning model providers by testing machine translation offerings. In Part 1, we compared the Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure APIs for translating Arabic, Chinese, Persian, and Russian into English.

While the commercial cloud services are a great option for online use cases, some of our use cases require running models in an offline environment, the flexibility to retrain and tune these models for more specific data environments, or both. For…


How to say good-bye in several languages. Photo by JACQUELINE BRANDWAYN on Unsplash

Anno.Ai’s data science team has continued evaluating online machine learning model providers by testing machine translation offerings. This evaluation follows our previous benchmarking studies of handwriting recognition, named entity recognition, and automatic license plate recognition providers.

Background and Use Case

For this exercise, we envisioned a customer who needs to quickly grasp the meaning, tone, and intent of short text blocks. The type of text might range from a formal document to more colloquial communications containing idioms, abbreviations, or social media-style hashtags. This customer may not need publication-ready translated output, but understanding nuance and context from a variety of speaker/writer styles is key.

With…


A quick start guide to version control for machine learning data

Photo by Franki Chamaki on Unsplash

Background

As part of a larger effort to test and evaluate different MLOps frameworks, the data science team at Anno.Ai recently tested out DVC to improve integration between our model repos on GitHub and our data and model storage on Amazon S3. In this article, we provide a quick guide to getting set up with DVC and some tips we learned along the way.

What is DVC?

DVC (Data Version Control) is an open-source tool for version control of machine learning projects: think Git for data. …


A review of our favorite online resources for learning ML, as well as some DIY activities you can try at home

As with so many other families and companies during the COVID-19 pandemic, Anno.Ai team members have had to adjust to new patterns of working, child care, and schooling for our children. One of the many ways we’ve sought to support each other and our Anno.Ai family during this time is by sharing educational activities and resources with each other.

Artificial Intelligence for Kids (by Tinker Toddlers, 2018). Source.

Because we are an AI/ML company, many of these resources and ideas have included ways to involve our kids in fun machine learning and computer programming activities. Here, we share a few of our favorites as well as some DIY activities…


A benchmarking comparison of models provided by Google, Azure, and AWS, as well as open-source models (Tesseract, SimpleHTR, Kraken, OrigamiNet, tf2-crnn, and CTC Word Beam Search)

As we discussed in our previous post, Handwritten Text Recognition (HTR) involves the conversion of handwritten text into machine-encoded text, which can be challenging due to the variation in handwriting styles between different people. After an initial model review and down-selection, we evaluated APIs from three of the major cloud providers (Google, AWS, and Microsoft) and two open source models (Tesseract and SimpleHTR) for performance in processing handwritten text. In this post, we share the results from our evaluation and discuss the best HTR models to use for different use cases.

HTR Evaluation Results

Created by Anno.Ai

The Google Cloud Vision API was…

Anno.Ai

Operationalizing applied machine learning for the mission.
