Selection and parametrization of hash functions

Affects product

Airlock IAM

Configuration

Choosing the right hash algorithms, and especially tuning them to fit the entropy of the data, the security needs, and the performance requirements, is an important part of configuring an Airlock IAM system.

Secrets and types of protection requirements

Secrets that have to be checked by a server should be stored as salted hash values rather than as plaintext. There are other approaches, such as storing secrets or hashes in HSMs or distributed hash databases, but they are not the subject of this section.
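As a minimal sketch of what storing a salted hash instead of the plaintext can look like, the following Python snippet uses only the standard library. The function names and the (salt, digest) record layout are hypothetical and do not reflect how Airlock IAM stores secrets. SHA-256 is used here only to show the role of the salt.

```python
import hashlib
import hmac
import os


def make_record(secret: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the plaintext itself is never stored."""
    salt = os.urandom(16)  # fresh random salt for every stored secret
    digest = hashlib.sha256(salt + secret.encode("utf-8")).digest()
    return salt, digest


def check_secret(candidate: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the salted hash of the candidate and compare in constant time."""
    recomputed = hashlib.sha256(salt + candidate.encode("utf-8")).digest()
    return hmac.compare_digest(recomputed, digest)
```

Because each record gets its own salt, two users with the same secret end up with different stored values, so an attacker cannot invert many records with a single precomputed table.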

There are two goals that hashing of secrets can achieve:
(1) Protection against attackers who steal the stored values. In recent years, numerous databases containing hashed password values have been stolen. In some cases, the attackers inverted all hashes and published a database of plaintext secrets. A good hashing algorithm acts as a last line of defense to hinder access to the underlying secrets.
(2) Protection against casual disclosure to authorized persons. An authorized human who has to deal with the stored values, e.g. an administrator who operates directly on DB records, should not see the passwords in plaintext, so that he or she cannot easily read and remember them.

Hash Functions

A hash function maps strings of arbitrary size to strings of a fixed size. We are interested in so-called cryptographic hash functions, which are designed to be one-way, i.e. infeasible to invert: there is no practical way to find an input that leads to a given output other than trying candidate inputs one by one. Ideally, the function should behave like a random function, which in particular implies that it is difficult to find collisions.
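The fixed-size property can be illustrated with a short Python sketch using the standard library's SHA-256. The helper name sha256_hex is chosen here for illustration only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"a")              # 1 byte of input
long_digest = sha256_hex(b"a" * 1_000_000)   # 1 MB of input

# Both digests are 256 bits long, i.e. 64 hexadecimal characters.
print(len(short_digest), len(long_digest))  # 64 64

# Different inputs yield unrelated-looking outputs.
print(short_digest == long_digest)  # False
```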

We distinguish between three classes of these functions:
broken – an attacker can construct two different input values that lead to the very same hash value (hash collision). MD5 belongs to that class.
fast – functions that are constructed for efficiency in time and memory. They are not broken in the sense that it would be possible to deliberately construct a hash collision. SHA-256 is an example.
costly – these functions are designed to require significant CPU time and memory for computation. Scrypt is a representative of this class; it was specifically designed for password hashing.

Goal (2) is fulfilled by any hashing function. For further evaluation, we concentrate on goal (1).

The entropy of the secret

If the secret to protect has so much entropy that it is infeasible even to list all probable values, then the sheer number of possibilities already protects against attacks of type (1). This allows efficient automated handling of technical keys, e.g. those used with OAuth or OIDC. Fast hash algorithms are well suited for this scenario.
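The following Python sketch illustrates the high-entropy case. It assumes a randomly generated 256-bit token as the technical key; the function names and the choice to store a hex digest are hypothetical and not taken from Airlock IAM.

```python
import hashlib
import hmac
import secrets


def issue_token() -> tuple[str, str]:
    """Create a random token and the value the server would store for it."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, i.e. 256 bits of entropy
    stored = hashlib.sha256(token.encode("ascii")).hexdigest()
    return token, stored


def token_matches(presented: str, stored: str) -> bool:
    """Hash the presented token with the fast function and compare in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored)
```

With 2^256 possible tokens, enumerating candidates is out of reach even though each SHA-256 evaluation is very cheap, so the check stays fast without weakening protection against inversion.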

If the secret has too little entropy, then no hashing method will protect that specific secret against an inversion attack (1). An attacker can simply try all possible inputs. In particular, there is no protection for a badly chosen password.
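To make this concrete, the sketch below inverts the salted hash of a 4-digit PIN by trying all 10,000 candidates. The PIN scenario and the function names are assumptions chosen for illustration; the salt is assumed to be stolen together with the hash, as it is stored alongside it.

```python
import hashlib
from typing import Optional


def hash_pin(pin: str, salt: bytes) -> bytes:
    """Salted SHA-256 of a numeric PIN."""
    return hashlib.sha256(salt + pin.encode("ascii")).digest()


def invert_pin(target: bytes, salt: bytes) -> Optional[str]:
    """Recover a 4-digit PIN from its salted hash by exhaustive search."""
    for n in range(10_000):
        candidate = f"{n:04d}"
        if hash_pin(candidate, salt) == target:
            return candidate
    return None  # the hashed value was not a 4-digit PIN
```

Replacing SHA-256 with a costly function only multiplies the attacker's effort by a constant factor per candidate; with so few candidates, the search remains feasible.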

We now consider secrets of low to intermediate entropy. These are typically passwords and the like. Attacks of type (1), where all hashes are inverted and published, are particularly severe because users tend to use the same passwords for multiple accounts. To protect against such attacks, a costly hash function has to be tuned so that it requires as much computational effort as is acceptable for the user and the service provider, while making attacks as costly as possible. The authors of Scrypt propose that, for password hashing, the parameters should be chosen such that a hash calculation takes approximately 100 ms on a single CPU core [1]. Airlock IAM uses the recommended Scrypt parameters by default. Depending on the application scenario, a higher or lower effort may be acceptable or reasonable.
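One way to check whether a parameter set meets such a time budget is to measure it on the target hardware. The Python sketch below does this with the standard library's hashlib.scrypt. The values of N, R, and P are assumed example parameters, not the defaults configured in Airlock IAM, and the helper names are hypothetical.

```python
import hashlib
import os
import time

# Assumed example parameters (CPU/memory cost, block size, parallelism).
N, R, P = 2**14, 8, 1


def scrypt_hash(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte scrypt hash; memory use is roughly 128 * N * R bytes."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=N,
        r=R,
        p=P,
        maxmem=64 * 1024 * 1024,  # allow up to 64 MiB for the computation
        dklen=32,
    )


def measure_ms(runs: int = 3) -> float:
    """Average wall-clock time of one hash calculation in milliseconds."""
    salt = os.urandom(16)
    start = time.perf_counter()
    for _ in range(runs):
        scrypt_hash("correct horse battery staple", salt)
    return (time.perf_counter() - start) * 1000 / runs


if __name__ == "__main__":
    print(f"scrypt with N={N}, r={R}, p={P}: {measure_ms():.1f} ms per hash")
```

If the measured time is well below the budget, increasing N (in powers of two) raises both the CPU and the memory cost; maxmem may have to be raised accordingly.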

Recommendations

Recommended hashing functions for specific scenarios:
SHA-256 for OAuth, OIDC, and matrix cards
Scrypt for passwords, IAK letters, and secret questions

Matrix cards typically have low entropy. The inversion of a specific value is feasible regardless of the hashing algorithm. For this reason, a service provider has to lock all matrix cards if the hashed values are stolen. Since matrix cards are not reused for other services, there is no benefit for an attacker in inverting and publishing the whole database of secrets.