Regular Expressions - Basics page

Rewrite variables

Rewrite variables can be used in regular expression patterns to define rewrite operations that are independent of the specific mapping strings.

Regex syntax

Airlock Gateway uses Perl Compatible Regular Expressions (PCRE).

Basic operators

ab|xy           Alternation, either 'ab' or 'xy'
(str)           The string 'str' as a logical and captured group
(?:str)         The string 'str' as a logical uncaptured group
a?              The letter 'a' once or not at all
a*              The letter 'a' in unrestricted quantity (including zero)
a+              The letter 'a' at least once
[abc]           Character class: a, b or c
[^abc]          Negative character class: any character except a, b or c
[a-c]           Range: a, b or c
^               The beginning of the string
$               The end of the string
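The operators above can be tried out in any engine with a similar syntax. The following is a minimal sketch using Python's `re` module. Python's engine is not PCRE, but it behaves the same way for these basic operators. All sample strings and variable names are made up for illustration.

```python
import re

# Alternation: either 'ab' or 'xy'
alternation = [bool(re.fullmatch(r"ab|xy", s)) for s in ("ab", "xy", "ay")]

# (abc) is captured, (?:def) only groups without capturing
m = re.fullmatch(r"(abc)-(?:def)", "abc-def")
captured = m.groups()

# Quantifiers tested against '', 'a' and 'aa'
optional = [bool(re.fullmatch(r"a?", s)) for s in ("", "a", "aa")]
star = [bool(re.fullmatch(r"a*", s)) for s in ("", "a", "aa")]
plus = [bool(re.fullmatch(r"a+", s)) for s in ("", "a", "aa")]

# Character class, negative character class and range
in_class = re.findall(r"[abc]", "abcd")
not_in_class = re.findall(r"[^abc]", "abcd")
in_range = re.findall(r"[a-c]", "abcd")

# Anchors: the pattern must span the whole string
anchored = [bool(re.search(r"^ab$", s)) for s in ("ab", "cab", "abc")]
```

Only the first group shows up in `captured`, which is the difference between `(str)` and `(?:str)` that matters for rewrite rules.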

Characters

The backslash character '\' is used for escaping.
An ASCII letter typically gets a special meaning if it is preceded by a backslash '\'.

.               Any character (including CR or LF)
\x{hhh..}       Character with unicode codepoint U+hhh.. (1 to 6 hex digits)
\n              Line feed character (U+000A)
\r              Carriage return character (U+000D)
\t              Tab character (U+0009)

Airlock Gateway does not allow a backslash before an alphabetic character unless the combination denotes an escaped construct; all other combinations are reserved for future extensions to the regular-expression language.

Newlines are treated as ordinary characters. They do not have any special meaning in the processed string.
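As a rough illustration of the newline handling described above, here is a sketch in Python. It is an approximation only: Python's `re` module is not PCRE, and its `.` skips line feeds unless the `re.DOTALL` flag is set, so the flag is used here to mimic the documented behaviour. The sample text is an assumption.

```python
import re

text = "line1\r\nline2"

# With re.DOTALL, '.' matches CR and LF like any other character
dot_matches_newlines = re.fullmatch(r".+", text, flags=re.DOTALL) is not None

# Python's default differs: '.' stops at the line feed
python_default = re.fullmatch(r".+", text) is not None

# \r, \n and \t match the corresponding control characters
has_crlf = re.search(r"\r\n", text) is not None
tab_fields = re.split(r"\t", "a\tb\tc")
```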

Escaping

A non-alphanumeric character loses its special meaning if it is preceded by a backslash.
A left parenthesis '(' is the start of a group. If it is escaped '\(', it matches just a left parenthesis.

\\              the backslash character
\?              Escaped character (for any non-alphanumeric character)
\Q .. \E        Literal-text span: treat enclosed characters as literal
                   until the first appearance of \E (no escaping possible)

A backslash may be used prior to any non-alphabetic character regardless of whether the character has a special meaning in that context or not.
Escaping is normally needed for these characters: [({.*?+^$\|
Depending on the context, the closing bracket and the closing parenthesis also have to be escaped: ])
Inside brackets, escaping is only needed for the right bracket character (escaping other characters is allowed as well): ]
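The escaping rules above can be checked with a short Python sketch. Note the assumptions: Python's `re` module has no `\Q .. \E` span, so `re.escape` is used here as a stand-in that produces an equivalent escaped pattern, and the sample strings are invented.

```python
import re

# An unescaped '(' opens a group; '\(' matches a literal parenthesis
call_args = re.search(r"\(\d+\)", "f(42)").group(0)

# Unescaped, '+' and '?' act as operators, so the text does not match itself
literal = "1+1=2?"
unescaped_matches = re.fullmatch(literal, literal) is not None

# Escaping every special character makes the pattern match the literal text
escaped_matches = re.fullmatch(re.escape(literal), literal) is not None

# Inside brackets only ']' needs the backslash; '.', '*' and '?' are literal
bracket_chars = re.findall(r"[.*?\]]", "a.b*c?d]")
```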

Generic character types

ASCII character types

\d              Any ASCII decimal digit - equals [0123456789]
\D              Any character that is not an ASCII decimal digit
\s              Any ASCII white space character - equals ' ', HT, LF, FF, CR
\S              Any character that is not an ASCII whitespace character
\w              Any ASCII "word" character [a-zA-Z0-9_]
\W              Any "non-word" character
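The ASCII character types can be explored with the following Python sketch. One assumption to keep in mind: in Python, `\d`, `\s` and `\w` are Unicode-aware by default, so the `re.ASCII` flag is passed to approximate the ASCII-only behaviour listed above. The sample strings are illustrative.

```python
import re

sample = "id_7 =\t42\n"

digits = re.findall(r"\d+", sample, flags=re.ASCII)
words = re.findall(r"\w+", sample, flags=re.ASCII)
spaces = re.findall(r"\s", sample, flags=re.ASCII)
non_digits = re.findall(r"\D+", "a1b22c", flags=re.ASCII)
non_words = re.findall(r"\W+", "a-b c", flags=re.ASCII)

# A non-ASCII digit (ARABIC-INDIC DIGIT THREE) is not matched by ASCII \d
non_ascii_digit_rejected = re.fullmatch(r"\d", "\u0663", flags=re.ASCII) is None
```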

Non-ASCII character types

\h              Any horizontal white space character (including non-ASCII U+2000, U+00a0, U+180e, ...)
\H              Any character that is not a horizontal whitespace character
\v              Any vertical white space character (including non-ASCII U+2028, U+0085)
\V              Any character that is not a vertical white space character

Comments

(?#...)         comment (not nestable)
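Inline comments do not change what a pattern matches. A small sketch in Python, whose `re` module also accepts the `(?#...)` syntax; the version-number pattern is a made-up example:

```python
import re

# The same pattern with and without inline comments
commented = re.compile(r"\d+(?#major)\.\d+(?#minor)")
plain = re.compile(r"\d+\.\d+")

with_comments = commented.fullmatch("1.20") is not None
without_comments = plain.fullmatch("1.20") is not None
```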

Simple examples

Pattern         Matches
https?          'http' or 'https'
(abc|xyz)       'abc' or 'xyz'. The found variant can be referenced as $1 in rewrite rules (capturing)
(?:this|that)   'this' or 'that'. The value is not captured and cannot be referenced in rewrite rules
^\d+$           Any number of digits (at least one) - but just digits.
\d+             Any string containing at least one digit. This could be achieved also by just using \d
[\$]            Just the dollar sign - escaping is supported in bracket expressions
[\\\$]          Backslash or dollar sign
[^\t]           Any character but a tabulator
\x{a0}          The non-breaking-space character
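The examples above can be reproduced with the following Python sketch. It rests on a few assumptions: Python's `re` module stands in for PCRE, the captured group is referenced as `\1` in Python where a rewrite rule would use $1, and Python writes the non-breaking space as `\xa0` because it does not accept the `\x{a0}` form. All input strings are invented.

```python
import re

# https? matches 'http' or 'https'
schemes = [bool(re.fullmatch(r"https?", s)) for s in ("http", "https", "httpss")]

# (abc|xyz) captures the variant that was found
rewritten = re.sub(r"(abc|xyz)", r"[\1]", "abc-xyz")

# (?:this|that) matches but captures nothing
group_count = len(re.search(r"(?:this|that)", "take that").groups())

# ^\d+$ requires digits only; \d+ is satisfied by a single digit anywhere
only_digits = [bool(re.search(r"^\d+$", s)) for s in ("123", "a1")]
contains_digit = [bool(re.search(r"\d+", s)) for s in ("123", "a1", "abc")]

# Bracket expressions with escaped characters
dollar = re.findall(r"[\$]", "a$b")
backslash_or_dollar = re.findall(r"[\\\$]", "a\\b$")
not_tab = re.findall(r"[^\t]", "a\tb")

# Non-breaking space, written as \xa0 in Python
has_nbsp = re.search(r"\xa0", "1\u00a0kg") is not None
```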
