Unsupervised machine learning explained

How does Airlock Anomaly Shield calculate anomaly values in the first place?

Because there is no universal rule for what counts as a normal value (for example, an acceptable TimingCluster value), the only way to get an answer is to compare the current values of a session with values observed in the past. This is exactly what the unsupervised machine learning models inside Airlock Anomaly Shield do.

Example – Generation of TimingCluster indicator

  1. To illustrate how unsupervised machine learning works, consider an example using the single indicator TimingCluster:
  2. The timing statistics collected by Airlock Anomaly Shield describe how request intervals (timestamp deltas) were distributed within past sessions.
  3. The Anomaly Shield algorithms are trained to identify similar timestamp delta distributions (using a k-means clustering algorithm).
  4. For a new session, the timestamp delta distribution is classified using the trained model. The resulting anomaly indicator value depends on the size of the cluster into which the new session is classified:
    • if the session is classified into a very small cluster (= rare timing behavior), the indicator value is close to 1.0.
    • if the session is classified into a larger cluster (= typical timing behavior), the indicator value is closer to 0.0.
  5. If the indicator value is above the configured threshold, the TimingCluster indicator becomes active, which means it sets an active bit in the anomaly indicator pattern.
  6. The thresholds are preconfigured to detect common session anomalies.

    The thresholds may be tuned if the predefined settings do not meet your requirements.
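
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the implementation used inside Airlock Anomaly Shield. The bin edges, the deterministic farthest-point initialization of k-means, the formula `1 - cluster_size / total`, the threshold of 0.8, and all function names are assumptions made for this example.

```python
from collections import Counter

# Hypothetical bin edges (seconds) for request intervals.
BIN_EDGES = [0.5, 2.0, 10.0]

def delta_histogram(timestamps, edges=BIN_EDGES):
    """Normalized histogram of the intervals between consecutive requests."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    counts = [0] * (len(edges) + 1)
    for d in deltas:
        counts[sum(d >= e for e in edges)] += 1  # index = edges at or below d
    total = len(deltas) or 1
    return [c / total for c in counts]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Plain k-means; farthest-point initialization keeps the sketch deterministic."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def train(sessions, k):
    """Cluster past sessions and remember how many fell into each cluster."""
    points = [delta_histogram(s) for s in sessions]
    centroids = kmeans(points, k)
    sizes = Counter(nearest(p, centroids) for p in points)
    return centroids, sizes, len(points)

def timing_indicator(session, model):
    """Assumed mapping: small cluster -> value near 1.0, large cluster -> near 0.0."""
    centroids, sizes, total = model
    cluster = nearest(delta_histogram(session), centroids)
    return 1.0 - sizes[cluster] / total

# Hypothetical training data: 18 sessions with pauses of a few seconds
# and 2 sessions that send a request every 100 ms.
typical = [[0, 3, 8, 12, 19] for _ in range(18)]
rare = [[0.0, 0.1, 0.2, 0.3, 0.4] for _ in range(2)]
model = train(typical + rare, k=2)

THRESHOLD = 0.8  # hypothetical threshold
for name, session in [("typical", [0, 4, 9, 15]), ("rare", [0.0, 0.1, 0.2, 0.3])]:
    value = timing_indicator(session, model)
    print(name, round(value, 2), "active" if value > THRESHOLD else "inactive")
```

In this toy data set, the new session with pauses of several seconds falls into the large cluster and stays below the threshold, while the session with 100 ms intervals falls into the small cluster and activates the indicator. Raising or lowering `THRESHOLD` corresponds to tuning the threshold as described above.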