Employing Machine Learning

Pattern recognition — Identifying phishing emails based on content or sender info, identifying malware, etc.
Anomaly detection — Spotting unusual activity, data, or processes (e.g., fraud detection for online banking or gambling).
Natural language processing (NLP) — Converting unstructured text such as a webpage into structured intelligence.
Predictive analytics — Processing data and identifying patterns in order to make predictions and identify outliers.

Apply AI/ML techniques to observed trends and insights to build a probabilistic (stochastic, time-series) model that predicts adversary behavior.

Selection of Algorithms

Used an ensemble of unsupervised and supervised algorithms to improve the robustness of the results
Observed general agreement among the algorithms on the results
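
One way to quantify "general agreement" is sketched below, assuming each algorithm emits a label per record (1 = inlier, -1 = outlier). The detector names and label values are hypothetical, and majority voting is just one possible way to combine an ensemble; the source does not say how the results were merged.

```python
from itertools import combinations


def pairwise_agreement(labels_a, labels_b):
    """Fraction of records on which two detectors give the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def majority_vote(label_lists):
    """Combine per-detector labels (1 = inlier, -1 = outlier) by majority.

    A tie (vote sum of zero) is treated as an inlier in this sketch.
    """
    return [-1 if sum(votes) < 0 else 1 for votes in zip(*label_lists)]


# Hypothetical outputs from three detectors on six records.
detector_labels = {
    "detector_a": [1, 1, -1, 1, -1, 1],
    "detector_b": [1, 1, -1, 1, 1, 1],
    "detector_c": [1, -1, -1, 1, -1, 1],
}

# Agreement rate for every pair of detectors.
agreement = {
    (m, n): pairwise_agreement(detector_labels[m], detector_labels[n])
    for m, n in combinations(detector_labels, 2)
}

# Single ensemble label per record.
ensemble_labels = majority_vote(list(detector_labels.values()))
```

High pairwise agreement suggests the outlier set is not an artifact of any single algorithm.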

Dimension Reduction

Used Principal Component Analysis (PCA) to reduce the dimensionality of the output, improving algorithm speed and performance
Makes the outliers easier to view
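
The idea behind PCA can be illustrated with a minimal standard-library sketch: center the data, form the covariance matrix, and find its leading eigenvector by power iteration. The records are made up, and only the first component is computed. A real pipeline would more likely use a numerical library and keep several components.

```python
import math


def center(rows):
    """Subtract the column means so each feature has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iterations=200):
    """Leading eigenvector of cov via power iteration.

    Assumes the start vector is not orthogonal to that eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in any direction
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Coordinate of each centered row along the given component."""
    return [sum(a * b for a, b in zip(r, component)) for r in centered]


# Hypothetical two-feature records lying roughly along y = 2x.
records = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
centered = center(records)
pc1 = top_component(covariance(centered))
scores = project(centered, pc1)  # one number per record instead of two
```

Each record is now described by a single coordinate along the direction of greatest variance, which is what makes the reduced data faster to process and easier to plot.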

Fitness Values

A fitness value (FV) is a single number that summarizes how similar a record is to the other records in the dataset
Set a fitness value range (from a negative number to a positive number) to separate inliers from outliers; outliers have more negative fitness values
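
The source does not say how the fitness value is computed. Purely as an illustration, the sketch below assumes a hypothetical score based on k-nearest-neighbor distance: a record's mean distance to its nearest neighbors is subtracted from a baseline derived from the dataset median. The function names, the `k` and `tolerance` parameters, and the records are all assumptions.

```python
import math
from statistics import median


def knn_mean_distance(index, rows, k):
    """Mean Euclidean distance from rows[index] to its k nearest neighbors."""
    dists = sorted(
        math.dist(rows[index], other)
        for j, other in enumerate(rows)
        if j != index
    )
    return sum(dists[:k]) / k


def fitness_values(rows, k=2, tolerance=2.0):
    """Hypothetical fitness score, not the method used in the source.

    baseline = tolerance * median kNN distance over the dataset.
    A record scores above zero when its own kNN distance is below the
    baseline, and increasingly negative the more isolated it is.
    """
    raw = [knn_mean_distance(i, rows, k) for i in range(len(rows))]
    baseline = tolerance * median(raw)
    return [baseline - r for r in raw]


# Hypothetical records: four clustered points and one far-away point.
records = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2], [8.0, 8.0]]
fv = fitness_values(records)
```

With these records the four clustered points receive small positive values and the distant point a large negative one, matching the convention that outliers have more negative fitness values.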

Training the Algorithms

Assign a Threshold Value (TV) to separate inliers (FV > TV) from outliers.
Separated the rows into inliers and outliers using the unsupervised FV. These labeled rows are then used to train the supervised ML algorithms.
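
The two training steps can be sketched as follows, assuming fitness values are already available. The records, the fitness values, and the threshold of 0.0 are hypothetical. The nearest-centroid classifier is only a small stand-in for whichever supervised algorithms were actually used, which the source does not name.

```python
import math


def label_by_threshold(fitness, threshold):
    """Label a record as inlier (1) when FV > TV, otherwise outlier (-1)."""
    return [1 if fv > threshold else -1 for fv in fitness]


class NearestCentroid:
    """Minimal supervised stand-in: predict the class with the closest mean."""

    def fit(self, rows, labels):
        self.centroids = {}
        for cls in set(labels):
            members = [r for r, lab in zip(rows, labels) if lab == cls]
            # Column-wise mean of the rows belonging to this class.
            self.centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
        return self

    def predict(self, rows):
        return [
            min(self.centroids, key=lambda c: math.dist(r, self.centroids[c]))
            for r in rows
        ]


# Hypothetical records with fitness values from an unsupervised step.
records = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2], [8.0, 8.0], [7.5, 8.4]]
fitness = [0.26, 0.18, 0.26, 0.22, -4.1, -3.8]
TV = 0.0  # assumed threshold value

# Step 1: turn unsupervised scores into inlier/outlier labels.
labels = label_by_threshold(fitness, TV)

# Step 2: train a supervised model on the labeled rows, then score new rows.
model = NearestCentroid().fit(records, labels)
predictions = model.predict([[1.05, 1.0], [7.9, 7.7]])
```

Because the labels come from the unsupervised scores, the supervised model can only be as reliable as the chosen threshold, so the TV is worth validating against known cases where they exist.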