Pattern recognition — Identifying phishing emails based on content or sender info, identifying malware, etc.
Anomaly detection — Spotting unusual activity, data, or processes (e.g., fraud detection for online banking or gambling).
Natural language processing (NLP) — Converting unstructured text such as a webpage into structured intelligence.
Predictive analytics — Processing data and identifying patterns in order to make predictions and identify outliers.
Applying AI/ML techniques to these trends and insights to build a probabilistic (stochastic) time-series model that predicts adversary behavior.
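The source does not say which probabilistic model was used. One minimal stochastic sequence model is a first-order Markov chain. The sketch below estimates P(next action | current action) from observed sequences of adversary actions and predicts the most likely next step. The action names and sequences are hypothetical placeholders, not real threat data. A full time-series model would likely also account for timing and context.

```python
from collections import Counter, defaultdict


def fit_transition_model(sequences):
    """Estimate first-order transition probabilities P(next | current)
    from observed sequences of (hypothetical) adversary actions."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    model = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        model[current] = {action: n / total for action, n in nexts.items()}
    return model


def predict_next(model, current):
    """Return (most probable next action, its probability), or None if the
    current action was never followed by anything in the training data."""
    if current not in model:
        return None
    return max(model[current].items(), key=lambda kv: kv[1])


# Hypothetical, illustrative event sequences (not real threat data).
history = [
    ["recon", "phish", "login", "exfil"],
    ["recon", "phish", "login", "persist"],
    ["recon", "scan", "exploit", "persist"],
]
model = fit_transition_model(history)
print(predict_next(model, "recon"))  # ('phish', 0.6666666666666666)
```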
Selection of Algorithms
- Used an ensemble of unsupervised and supervised algorithms to improve the robustness of the results
- Observed general agreement among the algorithms' results
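The sketch below shows one way to combine outlier flags from several detectors by majority vote and to measure how often each pair of detectors agrees. The detector names and flags are hypothetical. The source does not specify which algorithms were in the ensemble or how agreement was quantified.

```python
from itertools import combinations


def majority_vote(label_sets):
    """Combine per-record outlier flags (True = outlier) from several
    detectors by simple majority; ties count as inlier."""
    n_detectors = len(label_sets)
    return [sum(flags) * 2 > n_detectors for flags in zip(*label_sets)]


def pairwise_agreement(labels_by_detector):
    """Fraction of records on which each pair of detectors gives the same label."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(labels_by_detector.items(), 2):
        same = sum(x == y for x, y in zip(a, b))
        result[(name_a, name_b)] = same / len(a)
    return result


# Hypothetical outlier flags from three detectors over six records.
labels = {
    "detector_a": [False, False, True, False, True, False],
    "detector_b": [False, False, True, False, False, False],
    "detector_c": [False, True, True, False, True, False],
}
print(majority_vote(list(labels.values())))  # [False, False, True, False, True, False]
print(pairwise_agreement(labels))
```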
Dimension Reduction
- Used Principal Component Analysis (PCA) to reduce the dimensionality of the data, improving algorithm speed and performance
- Makes outliers easier to visualize
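The following is a minimal PCA sketch using NumPy's SVD. It assumes a numeric feature matrix whose columns are on comparable scales; standardize them first if they are not. Projecting onto two components gives coordinates that can be scatter-plotted to inspect outliers. The data is synthetic, and `pca_project` is a hypothetical helper. scikit-learn's `PCA` class provides the same projection.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto their top principal components.

    Centers each feature, takes the SVD of the centered matrix, and returns
    the scores along the first n_components axes plus the fraction of total
    variance each of those components explains.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic 5-feature data: 200 ordinary rows plus 3 shifted rows.
normal = rng.normal(0.0, 1.0, size=(200, 5))
shifted = rng.normal(6.0, 1.0, size=(3, 5))
X = np.vstack([normal, shifted])
scores, explained = pca_project(X, n_components=2)
print(scores.shape)  # (203, 2)
print(explained.round(3))
```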
Fitness Values
- The fitness value (FV) is a single number that summarizes how similar a record is to the other records in the dataset
- Defined the FV range to span negative to positive values so that inliers and outliers separate; outliers have more negative FVs
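The source does not define how the fitness value is computed. One hypothetical way to get a score with the same sign convention is to measure each record's mean distance to its k nearest neighbours and subtract that distance from the dataset's median. Records more isolated than typical then get negative values. This is similar in spirit to scikit-learn's `IsolationForest.decision_function`, where negative scores indicate outliers. It is an illustrative stand-in, not the authors' method.

```python
import math
import statistics


def knn_distance(data, i, k):
    """Mean Euclidean distance from row i to its k nearest other rows."""
    dists = sorted(math.dist(data[i], data[j]) for j in range(len(data)) if j != i)
    return sum(dists[:k]) / k


def fitness_values(data, k=3):
    """Hypothetical fitness value: median kNN distance minus the record's own
    kNN distance. Records more isolated than the typical record get a
    negative FV; densely surrounded records get a positive FV."""
    raw = [knn_distance(data, i, k) for i in range(len(data))]
    offset = statistics.median(raw)
    return [offset - r for r in raw]


# Hypothetical 2-D records: a tight cluster plus one distant record.
records = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (3.0, 3.0)]
fv = fitness_values(records, k=3)
most_negative = min(range(len(records)), key=fv.__getitem__)
print(records[most_negative], round(fv[most_negative], 3))
```

Using the median as the offset puts roughly half of the records at or below zero. A separate threshold still decides which negative values count as outliers.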
Training the Algorithms
- Assigned a threshold value (TV) to separate inliers (FV > TV) from outliers (FV ≤ TV)
- Labeled each row as an inlier or outlier using the FV from the unsupervised algorithms; these labeled rows were then used to train the supervised ML algorithms
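The next sketch covers both training steps. It assumes FVs are already available from an unsupervised step. Each row is labeled inlier (FV > TV) or outlier, and those labels train a supervised model. A nearest-centroid classifier stands in for the supervised algorithms, which the source does not name. The rows, FVs, and threshold value are hypothetical.

```python
import math


def label_rows(fitness, threshold):
    """Label each row by its fitness value: 1 = inlier (FV > TV), 0 = outlier."""
    return [1 if fv > threshold else 0 for fv in fitness]


def train_nearest_centroid(rows, labels):
    """Fit a minimal supervised model: one mean vector (centroid) per class."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


def predict(centroids, row):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda cls: math.dist(row, centroids[cls]))


# Hypothetical rows and fitness values from an unsupervised step.
rows = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.2), (4.8, 5.1)]
fitness = [0.30, 0.25, 0.28, -0.40, -0.35]
TV = 0.0  # hypothetical threshold value

labels = label_rows(fitness, TV)  # [1, 1, 1, 0, 0]
model = train_nearest_centroid(rows, labels)
print(predict(model, (0.15, 0.1)))  # 1 (inlier)
print(predict(model, (5.1, 4.9)))  # 0 (outlier)
```

A nearest-centroid model keeps the example dependency-free. Any supervised classifier could be trained on the same labels.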