Employing Machine Learning

Pattern recognition — Identifying phishing emails based on content or sender info, identifying malware, etc.
Anomaly detection — Spotting unusual activity, data, or processes (e.g., fraud detection for online banking or gambling).
Natural language processing (NLP) — Converting unstructured text such as a webpage into structured intelligence.
Predictive analytics — Processing data and identifying patterns in order to make predictions and identify outliers.

AI/ML techniques are applied to observed trends and insights to build a probabilistic (stochastic, time-series) model that predicts adversary behavior.

Selection of Algorithms

  • Ensemble of unsupervised and supervised algorithms to improve robustness of results
  • Observed general agreement between algorithms on the results
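
One way to quantify "general agreement" is to compare the per-record labels produced by each algorithm. The sketch below is illustrative only: the model names and label lists are hypothetical, and the encoding (1 = inlier, -1 = outlier) is an assumption rather than something the original analysis specifies.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(labels_by_model):
    """Fraction of records on which each pair of models gives the same label."""
    rates = {}
    for (name_a, a), (name_b, b) in combinations(labels_by_model.items(), 2):
        matches = sum(1 for x, y in zip(a, b) if x == y)
        rates[(name_a, name_b)] = matches / len(a)
    return rates


def majority_vote(labels_by_model):
    """Label each record with the most common label across all models."""
    per_record = zip(*labels_by_model.values())
    return [Counter(votes).most_common(1)[0][0] for votes in per_record]


# Hypothetical outputs, one entry per record (1 = inlier, -1 = outlier).
labels_by_model = {
    "unsupervised_a": [1, 1, -1, 1, -1, 1],
    "unsupervised_b": [1, 1, -1, 1, 1, 1],
    "supervised_c": [1, 1, -1, 1, -1, 1],
}

agreement = pairwise_agreement(labels_by_model)
consensus = majority_vote(labels_by_model)
```

Using an odd number of models avoids ties in the vote. Records on which the models disagree (the fifth record here) are natural candidates for manual review.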

Dimension Reduction

  • Used Principal Component Analysis (PCA) to reduce the number of dimensions in the output, improving algorithm speed and performance
  • Enabled easier viewing of the outliers
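
A minimal sketch of the PCA step, assuming NumPy is available and using synthetic data (the original dataset and its dimensionality are not given). The function name `pca_project` and the choice of two components are illustrative assumptions.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained_ratio[:n_components]


rng = np.random.default_rng(0)

# Synthetic stand-in: 200 records, 10 features driven by 2 latent factors.
latent = rng.normal(size=(200, 2))
latent[0] = [8.0, -8.0]  # one planted unusual record
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

Z, explained = pca_project(X, n_components=2)
```

`Z` has two columns, so it can be drawn as a scatter plot in which unusual records tend to sit away from the main cloud. `explained` reports how much of the total variance each retained component keeps.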

Fitness Values

  • Fitness value is a single number that summarizes the degree to which a record is similar to other records in the dataset
  • Set a fitness value range (negative lower bound, positive upper bound) to separate inliers from outliers; outliers have the more negative fitness values
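
The source does not say how the fitness value is computed, so the following is only a sketch of one possible score with the same sign convention. It assumes a hypothetical definition: an offset minus the mean distance to the k nearest neighbours, so that isolated records come out negative. The `scale` parameter and the sample records are also assumptions.

```python
import math
from statistics import median


def knn_mean_distance(records, k=3):
    """Mean Euclidean distance from each record to its k nearest neighbours."""
    result = []
    for i, record in enumerate(records):
        dists = sorted(
            math.dist(record, other)
            for j, other in enumerate(records)
            if j != i
        )
        result.append(sum(dists[:k]) / k)
    return result


def fitness_values(records, k=3, scale=2.0):
    """Hypothetical fitness: positive for typical records, negative for isolated ones."""
    d = knn_mean_distance(records, k)
    # Assumed zero point: `scale` times the typical neighbour distance.
    offset = scale * median(d)
    return [offset - di for di in d]


# Five similar records and one record far from the rest.
records = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.2), (1.2, 1.1), (6.0, 6.0)]
fv = fitness_values(records)
```

With these sample records, the five clustered points receive positive values and the distant point receives the most negative one.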

Training the Algorithms

  • Assigned a Threshold Value (TV): rows whose fitness value (FV) exceeds TV are inliers (FV > TV), and the remaining rows are outliers
  • Separated the rows into inliers and outliers using the unsupervised FV; these labeled rows were then used to train the supervised ML algorithms
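
The labeling and training steps can be sketched as follows. The fitness values, the threshold `TV = 0.0`, and the tiny 1-nearest-neighbour classifier are all assumptions made for illustration; the source does not name the supervised algorithms that were actually used.

```python
import math


def label_by_threshold(fitness, tv):
    """Return 1 (inlier) where FV > TV, otherwise -1 (outlier)."""
    return [1 if fv > tv else -1 for fv in fitness]


class NearestNeighbourClassifier:
    """Minimal 1-nearest-neighbour model, standing in for any supervised algorithm."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Index of the training record closest to x.
            nearest = min(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
            preds.append(self.y[nearest])
        return preds


# Hypothetical records with fitness values from a prior unsupervised step.
records = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.2), (6.0, 6.0), (5.5, 6.2)]
fitness = [0.26, 0.22, 0.23, 0.20, -6.5, -5.9]
TV = 0.0  # assumed threshold value

labels = label_by_threshold(fitness, TV)
model = NearestNeighbourClassifier().fit(records, labels)
new_predictions = model.predict([(1.05, 1.0), (5.8, 6.1)])
```

Because the supervised model only ever sees labels derived from the unsupervised FV, its quality is bounded by the choice of TV, so the threshold is worth validating against known cases where they exist.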