Our gateway components can serve as a data source of Prometheus metrics for time-series-based real-time event and alert monitoring.
Note that Grafana supports querying Prometheus metrics, e.g. as described in the Airlock Minikube Example.
This section describes the configuration to expose the metrics and gives an overview of the set of Airlock-specific metrics.
Configuring the deployment
Prometheus can scrape the metrics statically from configured Kubernetes resources or retrieve the scrape targets with service discovery. Since Microgateway can be scaled horizontally, a static configuration is usually impractical. The following instructions show what must be configured for the Microgateway deployment. For the Prometheus configuration, consult the Prometheus documentation or have a look at the configuration of the Airlock Minikube Example.
The default port for Prometheus metrics on Microgateway containers is 9102.
1. In the Microgateway deployment configuration, port 9102 must be exposed.
2. Ensure that Prometheus is configured with service discovery and that the annotations in the next step correspond to the Prometheus configuration.
3. In addition, annotations for Prometheus must be added to inform Prometheus that this container offers metrics to be scraped:
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: microgateway
spec:
  ...
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9102"
    spec:
      ...
      containers:
        - name: microgateway
          ...
          ports:
            - name: metrics
              containerPort: 9102
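On the Prometheus side, the annotations above only take effect if the scrape configuration evaluates them. The following excerpt is a minimal sketch of a pod-based service discovery job that does so. It is not taken from the Airlock Minikube Example; the file name and the job name `kubernetes-pods` are assumptions, and an existing Prometheus setup may already contain an equivalent job.

```yaml
# prometheus.yml (excerpt) - illustrative sketch, adapt to your own setup
scrape_configs:
  - job_name: kubernetes-pods   # hypothetical job name
    kubernetes_sd_configs:
      - role: pod               # discover every pod as a potential scrape target
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Scrape the port given in the prometheus.io/port annotation (9102 for Microgateway).
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

Because the targets are discovered per pod, newly scaled Microgateway replicas are picked up without changing the Prometheus configuration.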
Airlock-specific metrics in Prometheus format
The following table lists Airlock-specific metrics that are exposed, e.g. for monitoring licensed throughput and common gateway indicator values. Prometheus supports different metric types such as counter, gauge, histogram, and summary.
Read more about metric types here: https://prometheus.io/docs/concepts/metric_types/
Low-level metrics, such as CPU and RAM usage, are typically provided by the container platform.
A metric is published only after at least one sample has been recorded, because a sample is required before any output can be generated.
Note the following:
- Statistically relevant statements always require a sufficient basis, such as a certain number of requests, e.g. http_requests_duration_seconds_count > 1000.
- Some metrics provide mapping names as metadata in the form of labels, which can be used to break down metrics per mapping.
- Read more about labels here: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
- Read more about how to query metadata here: https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metadata
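The advice about a sufficient statistical basis can be applied directly in a Prometheus alerting rule. The sketch below alerts on the mean request duration (sum divided by count) only once more than 1000 requests have been recorded. The group name, alert name, the 0.5 second threshold, and the `for` duration are hypothetical values chosen for illustration.

```yaml
# rules.yaml (excerpt) - illustrative sketch, thresholds are assumptions
groups:
  - name: airlock-request-duration   # hypothetical group name
    rules:
      - alert: HighMeanRequestDuration   # hypothetical alert name
        # Mean duration per request, evaluated only when the sample basis is large enough.
        expr: |
          (
            http_requests_duration_seconds_sum
            / http_requests_duration_seconds_count
          ) > 0.5
          and
          http_requests_duration_seconds_count > 1000
        for: 5m
        labels:
          severity: warning
```

Both sides of the division and of the `and` operator are matched on identical label sets, so the rule is evaluated separately for each exposed time series.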
Metric name | Metric type | Unit | Description |
http_requests_current | Gauge | integer | The number of currently processed requests. |
http_requests_duration_seconds_sum | Gauge | integer | The duration of request processing in seconds. |
http_requests_duration_seconds_count | Counter | integer | The number of requests that are used for http_requests_duration_seconds_sum. This counter is also used as the base for the timing statistics for http_requests_duration_seconds. |
http_requests_duration_seconds | Histogram | floating point | Timing statistics (percentiles) for request processing durations. Quantiles are calculated from the histogram as described here: https://prometheus.io/docs/practices/histograms/#quantiles |
http_requests_allowed_total | Counter | integer | The number of allowed requests. |
http_requests_blocked_total | Counter | integer | The number of blocked requests. |
http_requests_rejected_total | Counter | integer | The number of rejected requests. |
http_sessions_current | Gauge | integer | The number of currently active sessions. |
http_sessions_authenticated_current | Gauge | integer | The number of currently active authenticated sessions. |
airlock_workload_ratio | Gauge | floating point | Ratio indicating the workload of the pod/system. The load values can be interpreted as a percentage between 0–100% load. |
airlock_throughput | Gauge | integer | Throughput used for licensing. The value denotes the rate of processed calls in calls/s, averaged over one-minute time windows. Only valid calls (requests) forwarded to protected services are counted. Corresponds to the calculated throughput logged in the message with log_id "WR-SG-TIME-200". |
airlock_throughput_licensed | Gauge | integer | Licensed aggregate throughput. Must be larger than the sum of the airlock_throughput outputs of all gateways in the same environment. |
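The relationship between airlock_throughput and airlock_throughput_licensed can be monitored with an alerting rule. The following sketch fires when the summed throughput of all scraped gateways exceeds 80% of the licensed throughput. It assumes that every gateway scraped by this Prometheus instance belongs to the same environment and reports the same licensed value; the group name, alert name, the 80% margin, and the `for` duration are hypothetical.

```yaml
# rules.yaml (excerpt) - illustrative sketch, names and margin are assumptions
groups:
  - name: airlock-license   # hypothetical group name
    rules:
      - alert: AirlockThroughputNearLicenseLimit   # hypothetical alert name
        # Aggregate throughput of all gateways compared with the licensed throughput.
        expr: sum(airlock_throughput) > 0.8 * max(airlock_throughput_licensed)
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Aggregate Airlock throughput exceeds 80% of the licensed throughput.
```

If several environments are scraped by the same Prometheus instance, both aggregations would need to be grouped by a label that identifies the environment.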