Scalable Machine Learning for Packet Capture Data with Kubeflow

The lovely rainbows of PCAP on Wireshark…

Framing the Use Case

Our prototype machine learning (ML) pipeline derives a threshold on the root mean squared error (RMSE) of autoencoder reconstructions and uses that threshold to flag abnormal traffic for the chosen feature set.
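For reference, the standard RMSE between a feature vector and its reconstruction, together with the threshold-based decision rule, can be written as below. The symbols n (number of features), x (input), x̂ (reconstruction) and τ (threshold) are notation introduced here. The source does not say exactly how τ is picked from the training errors, so that rule is left open.

```latex
\mathrm{RMSE}(\mathbf{x}, \hat{\mathbf{x}}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2},
\qquad
\text{label}(\mathbf{x}) =
\begin{cases}
\text{abnormal} & \text{if } \mathrm{RMSE}(\mathbf{x}, \hat{\mathbf{x}}) > \tau \\
\text{normal}   & \text{otherwise}
\end{cases}
```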

So what is Kubeflow?

Before…
After!

Overview of our Kubeflow Implementation

Assumptions

Instance Prerequisites

The following components were used to support the deployment and configuration of Kubeflow on AWS.

Overview of our Machine Learning Approach to PCAP Baselining and Anomaly Detection

During training, the autoencoder computes the reconstruction error (RMSE) between each original input and its reconstruction from the learned representation. Source.
During prediction, the trained autoencoder computes the RMSE between each test sample and its reconstruction, then compares that error against the RMSE values obtained during training to classify the sample as normal or abnormal.
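A minimal sketch of this train-then-threshold loop in plain Python follows. It is not the project's model. The autoencoder here is a hypothetical toy (linear, tied weights, one-unit bottleneck, trained with stochastic gradient descent). The two features and the synthetic "normal" traffic are invented for illustration. Using the 99th percentile of training RMSEs as the threshold is also an assumed choice.

```python
import math
import random

# Hypothetical toy autoencoder: linear encoder/decoder with tied weights w and a
# one-unit bottleneck. It stands in for a real (deeper) autoencoder; the RMSE
# thresholding logic is the part this sketch illustrates.


def reconstruct(w, x):
    """Encode x to the scalar code z = w.x, then decode it as z * w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return [z * wi for wi in w]


def rmse(x, x_hat):
    """Root mean squared error between a feature vector and its reconstruction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))


def train(data, lr=0.02, epochs=50, seed=0):
    """Minimize ||x - (w.x) w||^2 over the training set with SGD."""
    rng = random.Random(seed)
    samples = list(data)
    w = [rng.uniform(0.1, 1.0) for _ in samples[0]]  # non-zero start
    for _ in range(epochs):
        rng.shuffle(samples)
        for x in samples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            r = [xi - z * wi for xi, wi in zip(x, w)]  # residual x - x_hat
            rw = sum(ri * wi for ri, wi in zip(r, w))
            # Gradient of the squared reconstruction error with respect to w
            grad = [-2.0 * (rw * xi + z * ri) for xi, ri in zip(x, r)]
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def fit_threshold(w, data, quantile=0.99):
    """Assumed rule: threshold = a high quantile of the training RMSEs."""
    errors = sorted(rmse(x, reconstruct(w, x)) for x in data)
    return errors[int(quantile * (len(errors) - 1))]


def classify(w, threshold, x):
    """Flag x as abnormal when its reconstruction RMSE exceeds the threshold."""
    return "abnormal" if rmse(x, reconstruct(w, x)) > threshold else "normal"


def make_toy_traffic(n=200, seed=42):
    """Hypothetical 'normal' traffic: two strongly correlated scaled features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        data.append([t + rng.gauss(0, 0.02), t + rng.gauss(0, 0.02)])
    return data


if __name__ == "__main__":
    normal = make_toy_traffic()
    w = train(normal)
    threshold = fit_threshold(w, normal)
    print(classify(w, threshold, [0.5, 0.5]))   # normal: follows the learned correlation
    print(classify(w, threshold, [1.0, -1.0]))  # abnormal: breaks it
```

A linear tied-weight autoencoder like this one effectively learns the principal direction of the training data. It therefore only catches samples that break the linear correlation between features. For real PCAP feature sets a deeper, nonlinear autoencoder would take its place, while `rmse`, `fit_threshold` and `classify` could stay the same.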

Why Kubeflow works for this Use Case

Data Ingestion & Parsing

Data Engineering

Model Building & Training

Model Operations

Operationalizing applied machine learning for the mission.