Model training on ColdDB data
BookShop example – Anomaly training based on ColdDB data collection

This brief instruction, based on our BookShop example, explains how session metrics can be collected and fed into Airlock Anomaly Shield to train its machine learning algorithms. It covers the basic configuration and the optional case of merging training data from different ColdDBs.

Step 1 – Create an application and enable data collection

  1. Create an application named BookShop to be used for data collection. For step-by-step instructions, see Configuration of Airlock Anomaly Shield applications.
  2. Enable training data collection.
  3. Choose a mapping and assign the BookShop application to it.

New requests for the BookShop application will now be collected in a ColdDB.

Note that only sessions with a minimum number of requests are collected. Very short sessions, e.g. with only a single request, are not collected because they have little to no value as training data.
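
To verify that collection has started, check whether the ColdDB file exists and grows over time. A minimal sketch, assuming the default ColdDB location used later in this instruction:

      # Check the ColdDB file; its size should grow as sessions are collected.
      ls -lh /var/airlock/ml-service/cold.db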

Step 2 – Data merging from multiple ColdDBs (optional)

In a setup with multiple Airlock Gateway instances, e.g. a failover cluster, the collected data may be scattered across different ColdDBs. In this case, merge the data into a single ColdDB to obtain one large training data collection:

  1. Copy all cold.db data to a single node. In our example, we copy the remote partner-cold.db into the same folder as our local cold.db:

      cd /var/tmp/
      scp root@${PARTNER}:/var/airlock/ml-service/cold.db ./partner-cold.db

  2. Merge both ColdDBs into one ColdDB:

      cd /opt/airlock/ml-service/bin
      ./airlock-ml-colddb-tool --cold-db /var/tmp/partner-cold.db --other-cold-db /var/airlock/ml-service/cold.db copy

  3. Clean up the folder and delete the copied database:

      rm /var/tmp/partner-cold.db

Now we have a single ColdDB containing all available session metrics.
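
With more than two instances, the copy-merge-cleanup cycle can simply be repeated per partner. A minimal sketch, assuming root SSH access and the default paths; the PARTNERS hostnames are placeholders for your own cluster nodes:

      #!/bin/sh
      # Merge the ColdDBs of several partner instances into the local cold.db.
      # PARTNERS is a placeholder list; replace it with your actual hostnames.
      PARTNERS="gw2.example.com gw3.example.com"

      cd /var/tmp/
      for P in ${PARTNERS}; do
          scp "root@${P}:/var/airlock/ml-service/cold.db" ./partner-cold.db
          /opt/airlock/ml-service/bin/airlock-ml-colddb-tool \
              --cold-db /var/tmp/partner-cold.db \
              --other-cold-db /var/airlock/ml-service/cold.db copy
          rm /var/tmp/partner-cold.db
      done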

Step 3 – Train the Anomaly Shield ML model

Once a few thousand sessions have been collected, it is time to start training the Airlock Anomaly Shield service:

  1. Start the training of the machine learning models. In our example, we parameterize the application and the time range:

      cd /opt/airlock/ml-service/bin
      ./airlock-ml-trainer --cold-db "/var/airlock/ml-service/cold.db" --model-dir "/var/airlock/ml-service/models/" --application "BookShop" --start "2021-06-01" --end "2021-07-01"

  2. After training, restart the airlock-ml-service:

      systemctl restart airlock-ml-service

Now the models are trained, and Airlock Anomaly Shield can be configured to evaluate new requests for the BookShop application.
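
If trainings are repeated regularly, the two commands can be wrapped into a small script. A minimal sketch, assuming the default paths; APP, START, and END are placeholder variables:

      #!/bin/sh
      # Train the models for one application and time range, then reload the service.
      APP="BookShop"
      START="2021-06-01"
      END="2021-07-01"

      cd /opt/airlock/ml-service/bin
      ./airlock-ml-trainer --cold-db "/var/airlock/ml-service/cold.db" \
          --model-dir "/var/airlock/ml-service/models/" \
          --application "${APP}" --start "${START}" --end "${END}" \
      && systemctl restart airlock-ml-service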

Note that in our example, we've used the default locations for cold.db and model-dir.

Change the paths accordingly if you want to:

  • Use a ColdDB from a specific Airlock Gateway instance.
  • Avoid overwriting the active models, e.g. if the trained models should only be used for analytics/tuning tests.
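
For the analytics/tuning case, the trainer can write into a separate model directory so the active models stay untouched. A minimal sketch; /var/tmp/test-models/ is an arbitrary example path:

      # Train into a separate directory; the active models in
      # /var/airlock/ml-service/models/ are not overwritten.
      cd /opt/airlock/ml-service/bin
      ./airlock-ml-trainer --cold-db "/var/airlock/ml-service/cold.db" --model-dir "/var/tmp/test-models/" --application "BookShop" --start "2021-06-01" --end "2021-07-01"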

Step 4 – Distribute trained models between Gateway instances (optional)

In a setup with multiple Airlock Gateway instances, e.g. a failover cluster, the trained models must be distributed to the other instances.

  1. Copy the models from the trained Gateway instance to a partner instance:

      scp -r /var/airlock/ml-service/models/* root@${PARTNER}:/var/airlock/ml-service/models/

  2. On the partner instance, restart the airlock-ml-service so the new models are used:

      systemctl restart airlock-ml-service

Now both Airlock Gateway instances use the same trained models.
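
In clusters with more than two instances, the copy and restart can be repeated for each partner. A minimal sketch, assuming root SSH access; the PARTNERS hostnames are placeholders:

      #!/bin/sh
      # Push the trained models to every partner instance and restart the service there.
      # PARTNERS is a placeholder list; replace it with your actual hostnames.
      PARTNERS="gw2.example.com gw3.example.com"

      for P in ${PARTNERS}; do
          scp -r /var/airlock/ml-service/models/* "root@${P}:/var/airlock/ml-service/models/"
          ssh "root@${P}" systemctl restart airlock-ml-service
      done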