Microgateway as a data source for Prometheus metrics

Our gateway components can act as a data source for Prometheus metrics, enabling time-series-based monitoring of real-time events and alerts.

Note that Grafana supports querying Prometheus metrics, e.g. as described in the Airlock Minikube Example.

This section describes the configuration to expose the metrics and gives an overview of the set of Airlock-specific metrics.

Configuring the deployment

Prometheus can scrape the metrics statically from configured Kubernetes resources or retrieve the scrape targets with service discovery. Since Microgateway can be scaled horizontally, a static configuration is often impractical. The instructions below show what must be configured for the Microgateway deployment. For the Prometheus configuration, consult the Prometheus documentation or have a look at the configuration of the Airlock Minikube Example.

The default port for Prometheus metrics on Microgateway containers is 9102.

  1. In the Microgateway deployment configuration, port 9102 must be exposed.
  2. Ensure that Prometheus is configured with service discovery and that the annotations in the next step correspond to the Prometheus configuration.
  3. In addition, annotations for Prometheus must be added, informing Prometheus that this container offers metrics to be scraped:
    # deployment.yaml
    apiVersion: apps/v1 
    kind: Deployment 
    metadata: 
      name: microgateway
    spec:
      ... 
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "9102"
        spec:
          ... 
          containers:
            - name: microgateway
              ...
              ports:
                - name: metrics
                  containerPort: 9102
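
On the Prometheus side, the annotations above only take effect if the scrape configuration evaluates them. The following is a minimal sketch of such a configuration, assuming pod-based Kubernetes service discovery and the widely used prometheus.io/* annotation convention. The file name and the job name are assumptions and not taken from the Airlock documentation; the configuration in the Airlock Minikube Example may differ.

```yaml
# prometheus.yml (fragment, assumed file name)
scrape_configs:
  - job_name: kubernetes-pods          # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                      # discover all pods as potential scrape targets
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Replace the target port with the value of the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

With this approach, newly scaled Microgateway pods are picked up automatically as long as they carry the annotations, so no static target list has to be maintained.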
    

Airlock-specific metrics in Prometheus format

The following table lists Airlock-specific metrics that are exposed, e.g. for monitoring licensed throughput and common gateway indicator values. Prometheus supports different metric types such as counter, gauge, histogram, and summary.

Read more about metric types at https://prometheus.io/docs/concepts/metric_types/.

Low-level metrics, such as CPU and RAM usage, are typically provided by the container platform.

A metric is only published once at least one sample point has been recorded, because a sample is required before any output can be generated.
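
One practical consequence is that queries on a metric that has not yet been published return no data instead of 0. The following is a minimal sketch of a Prometheus recording rule that falls back to 0 in that case, using the http_requests_blocked_total counter from the table below. The file name, group name, rule name, and the 5-minute window are assumptions chosen for illustration.

```yaml
# rules.yaml (assumed file name)
groups:
  - name: airlock-microgateway-requests              # hypothetical group name
    rules:
      # Per-second rate of blocked requests over 5 minutes, summed over all pods.
      # "or vector(0)" yields 0 as long as no pod has published the counter yet.
      - record: airlock:http_requests_blocked:rate5m # hypothetical rule name
        expr: sum(rate(http_requests_blocked_total[5m])) or vector(0)
```

Dashboards and alerts built on the recorded series then show 0 instead of a gap until the first request has been blocked.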

| Metric name | Metric type | Unit | Description |
| --- | --- | --- | --- |
| http_requests_current | Gauge | integer | The number of currently processed requests. |
| http_requests_duration_seconds_sum | Gauge | integer | The duration of request processing in seconds. |
| http_requests_duration_seconds_count | Counter | integer | The number of requests that are used for http_requests_duration_seconds_sum. This counter is also used as the base for the timing statistics for http_requests_duration_seconds. |
| http_requests_duration_seconds | Histogram | floating point | Timing statistics (percentiles) for request processing durations. The histogram is calculated in quantiles, as described here: https://prometheus.io/docs/practices/histograms/#quantiles. |
| http_requests_allowed_total | Counter | integer | The number of allowed requests. |
| http_requests_blocked_total | Counter | integer | The number of blocked requests. |
| http_requests_rejected_total | Counter | integer | The number of rejected requests. |
| http_sessions_current | Gauge | integer | The number of currently active sessions. |
| http_sessions_authenticated_current | Gauge | integer | The number of currently active authenticated sessions. |
| airlock_workload_ratio | Gauge | floating point | Ratio indicating the workload of the pod/system. Values range from 0 (no load) to 1 (maximum load) and can be interpreted as a load percentage between 0 and 100%. |
| airlock_throughput | Gauge | integer | Throughput used for licensing. The value denotes the rate of processed calls in calls/s, averaged over one-minute time windows. Only valid calls (requests) forwarded to protected services are counted. Corresponds to the calculated throughput logged in the message with log_id "WR-SG-TIME-200". |
| airlock_throughput_licensed | Gauge | integer | Licensed aggregate throughput. Must be larger than the sum of the airlock_throughput outputs of all gateways in the same environment. |
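
The licensing condition and the workload ratio lend themselves to alerting. The following is a minimal sketch of a Prometheus alerting rule file. It assumes that this Prometheus instance scrapes exactly the gateways of one environment, so that summing airlock_throughput over all series matches the licensed scope. The file name, group name, alert names, thresholds (90% and 0.8), durations, and severity labels are assumptions and should be adapted.

```yaml
# alerts.yaml (assumed file name)
groups:
  - name: airlock-microgateway-alerts            # hypothetical group name
    rules:
      # Fires when the summed throughput of all scraped gateways stays
      # above 90% of the licensed aggregate throughput for 10 minutes.
      - alert: AirlockThroughputNearLicenseLimit # hypothetical alert name
        expr: sum(airlock_throughput) > 0.9 * max(airlock_throughput_licensed)
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Aggregate Airlock throughput exceeds 90% of the licensed throughput."
      # Fires per pod when the workload ratio (0 = no load, 1 = maximum load)
      # stays above 0.8 for 5 minutes.
      - alert: AirlockWorkloadHigh               # hypothetical alert name
        expr: airlock_workload_ratio > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Airlock Microgateway workload ratio is above 80%."
```

Alerting below the licensed limit rather than at it leaves time to scale or extend the license before the sum of airlock_throughput reaches airlock_throughput_licensed.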