Machine Learning Operations and Ethical ML are One and The Same (Part I- For the Non-Technical Types)
The premise of this post's title is pretty obvious: if you are an enterprise or organization consciously incorporating Machine Learning Operations into your data science practice, you are simultaneously practicing responsible AI. That said, what I want to stress is that having an ML Ops practice is not the full expression of everything you can do to build responsible AI systems- but it does tell me you are already thinking about ways to make your systems work at their top performance.
Because that is what ethical AI is about at the end of the day, right? There are the flashy news stories about large tech companies bungling their vision-based AI systems' handling of race and ethnicity. There are tales of companies using machine learning to screen job candidates in ways that eliminate minority applicants very early in the process. Or failed algorithm-enabled iBuying systems in real estate. The list goes on and on. Heated political discussions aside, what we technicians can agree upon is this: each of these examples shows a failure in the AI system itself- the AI did not do its job right. Which means it was architected and deployed incorrectly.
So while ethics in AI might be framed as a sociological or political issue, an ethical failure is fundamentally a technical flaw that must be fixed in order for AI systems to actually do their jobs right.
What is ML Operations?
I would like to borrow the definition and objectives of the new and budding field of Machine Learning Operations from ML-Ops.org to define what ML Ops means to us here. Check it out:
Great. Now if you scroll down on the ML-Ops website, you see a few curious sections. We will come back to these later after we have defined what Ethical/Responsible AI is. But take a mental note here:
Now that we have an understanding of what ML Ops as a field is, let’s define what Responsible/Ethical AI is to us.
How I define Ethical AI
I draw my definitions of Ethical or Responsible AI (RAI) practices from two documents in the national security space, as well as Google’s Responsible AI principles. Let’s take a look at how each of them defines Ethical AI:
DoD Joint AI Center Published Responsible AI Guidelines
- Responsible. DoD personnel will exercise appropriate levels of judgment and care, while remaining responsible for the development, deployment, and use of AI capabilities.
- Equitable. The Department will take deliberate steps to minimize unintended bias in AI capabilities.
- Traceable. The Department’s AI capabilities will be developed and deployed such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including with transparent and auditable methodologies, data sources, and design procedure and documentation.
- Reliable. The Department’s AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness of such capabilities will be subject to testing and assurance within those defined uses across their entire life-cycles.
- Governable. The Department will design and engineer AI capabilities to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences, and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior. [Source]
The Office of the Director of National Intelligence (ODNI) Responsible AI (RAI) Guidelines
Google Responsible AI Principles
How ML Ops and RAI Come Together (Non-Technical Musings)
Let’s take a look at all of these RAI concepts from different defense and non-defense organizations together- I have clumped them into similar categories with my musings.
Responsible, Objective and Equitable, Equitable, Respect the Law and Act with Integrity, Informed by Science and Technology
All three frameworks include something along the lines of building responsible systems, and all three use the word “equitable” in some way, shape, or form. Where data engineers and ML Ops engineers might come into focus here is the “objective” piece- this objective take on designing ML models is what will prevent (in theory) model bias and issues with models doing crazy things like mis-classifying objects in an offensive way. ML Ops engineers and data engineers will also appreciate creating systems informed by science and technology- all of us technicians would like to think this is where the root of all of our work comes from in the first place. If you have an AI that is not rooted in science and technology, then you don’t really have an AI system at all.
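To make the “equitable” piece a bit more concrete for the technically curious: one first-pass bias check teams can run is simply comparing a model's positive-decision rates across demographic groups. Here is a minimal sketch (the function names are my own, not from any of the frameworks above):

```python
from collections import defaultdict

def selection_rate_by_group(decisions, groups):
    """Positive-decision rate per demographic group.

    decisions: iterable of 0/1 model outcomes (1 = selected/approved)
    groups:    iterable of group labels, aligned with decisions
    """
    totals, positives = defaultdict(int), defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += int(decision)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(decisions, groups):
    """Lowest group rate divided by the highest group rate.

    Values well below 1.0 signal possible bias; the classic
    'four-fifths rule' treats 0.8 as a rough alarm line.
    """
    rates = selection_rate_by_group(decisions, groups)
    return min(rates.values()) / max(rates.values())
```

A check like this will not catch every kind of bias, but it is cheap to wire into a pipeline, which is exactly the ML Ops mindset this post is arguing for.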
Traceable, Identify Multiple Metrics to Assess Training and Monitoring, Transparent and Accountable, When possible, directly examine your raw data, Understand the limitations of your dataset and model
This section should be very relatable to both data engineers and ML Ops engineers. Creating traceable models means you know (as much as possible) everything from the data chain of custody all the way to model explainability metrics and insights (more on this in the technical section). All of the above hit the important high points: creating models that are closely monitored in all phases of the AI lifecycle, starting with examining the data to be used for training, all the way to examining model performance and its associated limitations. These are all technical tasks (as well as tasks that someone like an AI Product Manager might oversee), and they can all be traced to specific technical solutions that target the above-mentioned areas (more in the technical section on what types of solutions we could be looking at).
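Even without a dedicated lineage product, the core of traceability can be as simple as fingerprinting the training data and logging it alongside the run's parameters and metrics. This is a rough sketch, not any particular tool's API- the file names and record fields are my own assumptions:

```python
import hashlib
import json
from datetime import datetime, timezone

def fingerprint_dataset(path: str) -> str:
    """Hash the raw training file so the exact data version is traceable."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def record_training_run(dataset_path: str, params: dict, metrics: dict) -> dict:
    """Append one audit record: when we trained, on what data, with what results."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dataset_sha256": fingerprint_dataset(dataset_path),
        "hyperparameters": params,
        "evaluation_metrics": metrics,
    }
    with open("training_log.jsonl", "a") as log:  # hypothetical log location
        log.write(json.dumps(record) + "\n")
    return record
```

Real ML Ops platforms do far more than this, but the principle is the same: an auditor should be able to walk backward from a deployed model to the exact data and settings that produced it.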
Reliable, Secure and Resilient
What does “Reliable” translate to in technical terms for an ML Ops engineer or data engineer? Reliable means the model will perform with the utmost precision, recall, accuracy, AUC, and so on. The model will produce reasonably predictable results that are not all over the place, failing unpredictably on corner or edge cases, for example. These are technical tasks, and frankly technical debt, that the whole AI team takes on when training and creating a model, all the way down to long-term maintenance and monitoring. Resiliency is literally a technical term used to describe features- model resiliency and feature resiliency are now becoming hot topics in many technical publications.
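For readers who have never seen those reliability numbers computed, here is what precision, recall, and accuracy actually are for a binary classifier- a bare-bones sketch (production teams would typically reach for a library like scikit-learn instead):

```python
def classification_metrics(y_true, y_pred):
    """Basic reliability numbers for a binary (0/1) classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),            # how often it is right overall
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # of its "yes" calls, how many were right
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # of the true "yes" cases, how many it caught
    }
```

“Reliable” in the RAI sense means these numbers are not just measured once at training time, but tracked across the model's entire lifecycle.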
Security is something we will dive into a bit more in the technical post- but here is another budding field of what I will call “AI Security Engineers” or even “AI system pen-testers”. This focuses on things like making sure model pickling doesn’t go bad and expose the model to reverse-engineering or tampering. Ensuring that the whole pipeline (from start to finish) cannot be hacked. Ensuring the model endpoint (whether it be an API or any other way you serve your model) is secure and cannot be hacked or tampered with (for example, a DDoS attack on a model endpoint would render it inaccessible, and bombarding the endpoint with a bunch of irrelevant inference data may cause the model to skew or break down altogether).
Governable, Continue to monitor and update the system after deployment, Test, Test, Test
This section (I feel) is most focused on the technical debt and tasks a team takes on once a model is trained, tested, and finally deployed. There are a ton of technical packages, tools, and products that target this area, which I will go over in the next post- but long story short, this is where ML Ops engineers and data engineers really encounter the long-term maintenance tasks of making sure a model is deployed properly and continues to fulfill the function for which it was designed. Model monitoring and governance once a model is incorporated into a workflow expand far beyond the technical bits and bytes, however- members of the team such as the PM, product lead, and AI lead need to stay in tune with the larger business apparatus to ensure the model is still relevant to the mission it was originally assigned. Or that the model is still appropriately scoped for the product in which it is deployed- whether it is an application or something else. If a model is no longer needed, or its scope has changed (even ever so slightly), then it is the job of the AI leadership team to ensure the technical team is alerted and knows what to do next.
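What does “continue to monitor after deployment” look like in its simplest form? One common idea is drift detection: compare what the model sees in production against what it saw in training. Here is a deliberately crude z-score sketch of that idea (real monitoring tools use tests like PSI or Kolmogorov-Smirnov; the function name and threshold are my own):

```python
from statistics import mean, stdev

def drift_alert(training_values, live_values, threshold=3.0):
    """Flag a feature whose live mean has wandered far from training.

    Returns True when the live mean sits more than `threshold` training
    standard deviations away from the training mean.
    """
    mu, sigma = mean(training_values), stdev(training_values)
    live_mu = mean(live_values)
    if sigma == 0:
        # no variance at training time: any shift at all counts as drift
        return live_mu != mu
    return abs(live_mu - mu) / sigma > threshold
```

An alert like this is what lets the technical team tell the PM and AI lead, with evidence, that the world has shifted out from under the model- which is the governance conversation described above.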
Human-Centered Development and Use, User-Centered Design and Approach
Oddly enough, this aspect of RAI was not included or mentioned at all in the DoD AI Ethics Principles published by the JAIC, but it was mentioned by the ODNI and of course by Google, a longtime champion of user-centered design. This section is most important in the scoping phase of creating an AI system- ensuring the scope is built around a human need, with actual users who will benefit from its outputs. I do not think the traditional user-centered design approach entirely does this section justice- someone needs to develop a framework for user-centered design for AI systems, because the scoping is ever so slightly different and the technical underpinnings are somewhat different, as we are not talking about a traditional consumer product or full-stack application here.
In the next post, I will explore all of these sections in more detail from a technical perspective- diving deep into the technical solutions that are available out there, how they work, and how all of those technical tasks (normally tackled by ML Ops engineers and data engineers) ultimately lead to Responsible AI systems. Stay tuned!