Unsupervised machine learning
How Airlock Anomaly Shield calculates anomaly indicator values
Because there is no universal threshold for what is “normal” (e.g., for the TimingCluster indicator), Airlock Anomaly Shield uses unsupervised learning to establish baselines from previously observed sessions. Each session’s current measurements are compared against these baselines, and anomaly indicator values are derived from the degree of deviation.
Example: TimingCluster indicator
The following example illustrates how unsupervised learning is applied using the TimingCluster indicator:
- Airlock Anomaly Shield collects timing statistics from past sessions. These statistics describe how the intervals between requests (timestamp deltas) are distributed within each session.
- From these statistics, Anomaly Shield trains a k-means clustering model that groups sessions with similar timestamp delta distributions.
- For a new session, the timestamp delta distribution is classified using the trained model. The resulting anomaly indicator value depends on the size of the cluster into which the new session is classified:
  - If the session is classified into a very small cluster (= rare timing behavior), the indicator value is close to 1.0.
  - If the session is classified into a larger cluster (= typical timing behavior), the indicator value is closer to 0.0.
- If the indicator value is above the configured threshold, the TimingCluster indicator becomes active, which means it sets an active bit in the anomaly indicator pattern. The preconfigured thresholds are chosen to detect common session anomalies.
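
The steps above can be illustrated with a minimal Python sketch. It is not Airlock Anomaly Shield's implementation: the histogram bin edges, the deterministic centroid initialization, the scoring formula (1 minus the share of training sessions in the matched cluster), and the threshold of 0.9 are all assumptions chosen only to reproduce the described behavior. All function and variable names are hypothetical.

```python
import math
from itertools import accumulate

# Hypothetical bin edges (seconds) for the timestamp delta histogram.
EDGES = [0.5, 2.0, 10.0]
# Hypothetical threshold; the product ships its own preconfigured values.
THRESHOLD = 0.9


def delta_histogram(timestamps, edges=EDGES):
    """Normalized histogram of the gaps between consecutive request timestamps."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    counts = [0] * (len(edges) + 1)
    for d in deltas:
        counts[sum(1 for e in edges if d >= e)] += 1  # bin index = edges at or below d
    total = max(len(deltas), 1)
    return [c / total for c in counts]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic farthest-point initialization (assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    sizes = [0] * k
    for p in points:
        sizes[nearest(p, centroids)] += 1
    return centroids, sizes


def indicator_value(point, centroids, sizes):
    """Assumed scoring: small matched cluster -> close to 1.0, large -> close to 0.0."""
    share = sizes[nearest(point, centroids)] / sum(sizes)
    return 1.0 - share


def session(deltas):
    """Build request timestamps from a list of gaps, starting at t = 0."""
    return list(accumulate(deltas, initial=0.0))


# Made-up training data: 19 sessions with gaps of seconds, 1 with very short gaps.
past_sessions = [
    session([1.5 if i % 4 == 0 else 3.0, 5.0, 4.0, 12.0]) for i in range(19)
]
past_sessions.append(session([0.1, 0.1, 0.1, 0.1]))

training = [delta_histogram(s) for s in past_sessions]
centroids, sizes = kmeans(training, k=2)

rare_value = indicator_value(delta_histogram(session([0.2, 0.1, 0.1, 0.3])), centroids, sizes)
typical_value = indicator_value(delta_histogram(session([4.0, 6.0, 3.0, 15.0])), centroids, sizes)

rare_active = rare_value > THRESHOLD        # would set the bit in the indicator pattern
typical_active = typical_value > THRESHOLD  # bit stays inactive
print(rare_value, rare_active, typical_value, typical_active)
```

In this sketch, the session with very short request intervals falls into the cluster that contains only one training session, so its value is high and exceeds the assumed threshold. The session with intervals of several seconds falls into the large cluster and stays well below it.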