Regex Cheatsheet

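A backtracking regular expression engine starts matching at the earliest position it can, lets each greedy quantifier grab as much as it can, then gives characters back only as far as needed to finish the match, and at every choice point takes the first alternative that works.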
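
The three behaviours described above can be observed directly. This is a minimal sketch using Python's `re` module, chosen here only for illustration; the sample strings are made up, and other engines may differ in details.

```python
import re

# Leftmost wins: the engine tries every alternative at position 0, then at
# position 1, and so on. "ab" starts earlier than "b", so it is reported
# even though "b" is listed first in the pattern.
leftmost = re.search(r"b|ab", "cab").group()

# Greedy: .+ first consumes everything it can, then backs off one character
# at a time until the closing ">" can match.
greedy = re.search(r"<.+>", "<a><b>").group()

# First decision available: at the same starting position, the first
# alternative that succeeds is kept, even if a later one would match more.
first_choice = re.search(r"a|ab", "ab").group()

print(leftmost, greedy, first_choice)
```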


Anchors

char - - usage
^ Start of string, or start of line in multi-line pattern
\A Start of string
$ End of string, or end of line in multi-line pattern
\Z End of string
\b Word boundary
\B Not word boundary
\< Start of word
\> End of word
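
A short sketch of the anchors in practice, assuming Python's `re` module. Python has no `\<` and `\>`, so the example uses `\b` for word boundaries; the sample text is invented for illustration.

```python
import re

text = "first line\nsecond line"

# Without the multi-line flag, ^ matches only at the start of the string.
starts_default = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches immediately after each newline.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# \A keeps its meaning regardless of the flag: start of string only.
starts_absolute = re.findall(r"\A\w+", text, re.MULTILINE)

# \b requires a word boundary on both sides, so "inline" and "lines"
# are skipped and only the standalone words are found.
whole_words = re.findall(r"\bline\b", "line inline lines line")

print(starts_default, starts_multiline, starts_absolute, whole_words)
```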


Character Classes

char - - usage
\c Control character
\s White space (space, tab, newline and similar characters)
\S Not white space
\d Digit, same as [0-9]
\D Not digit, same as [^0-9]
\w Alphanumeric (letters, numbers, underscore)
\W Not alphanumeric
\x Hexadecimal digit
\O Octal digit
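
The shorthand classes can be tried out as follows. This sketch assumes Python's `re` module, where `\d`, `\w`, `\s` and their negations exist, but a bare `\x` for "hexadecimal digit" does not, so an explicit bracket expression stands in for it. The sample strings are made up.

```python
import re

sample = "id_42\tok\n"

digits = re.findall(r"\d", sample)        # each digit separately
words = re.findall(r"\w+", sample)        # letters, digits and underscore
whitespace = re.findall(r"\s", sample)    # tab and newline both count
non_word = re.findall(r"\W", sample)      # everything \w does not match

# Stand-in for a "hexadecimal digit" class in engines that lack one.
hex_runs = re.findall(r"[0-9A-Fa-f]+", "ff-zz-10")

print(digits, words, whitespace, non_word, hex_runs)
```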


POSIX Classes

char - - usage
[:upper:] Upper case letters
[:lower:] Lower case letters
[:alpha:] All letters
[:alnum:] Digits and letters
[:digit:] Digits
[:xdigit:] Hexadecimal digits
[:punct:] Punctuation
[:blank:] Space and tab
[:space:] Whitespace characters (space, tab, newline and similar)
[:cntrl:] Control characters
[:graph:] Visible characters (printable, excluding space)
[:print:] Printable characters, including space
[:word:] Digits, letters and underscore
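
POSIX classes are written inside a bracket expression, for example `[[:digit:]]`. Not every engine supports them; Python's `re` module does not. The sketch below shows one way to approximate some of them for ASCII input. The `POSIX_ASCII` table and the `posix_class` helper are hypothetical names introduced for this example only, not part of any library.

```python
import re
import string

# Hypothetical lookup table: ASCII-only stand-ins for some POSIX classes.
POSIX_ASCII = {
    "upper": "A-Z",
    "lower": "a-z",
    "alpha": "A-Za-z",
    "alnum": "A-Za-z0-9",
    "digit": "0-9",
    "xdigit": "0-9A-Fa-f",
    "blank": r" \t",
    "space": r" \t\n\r\f\v",
    "punct": re.escape(string.punctuation),
    "word": "A-Za-z0-9_",
}

def posix_class(name, negate=False):
    """Return a bracket expression approximating [[:name:]] for ASCII input."""
    return "[" + ("^" if negate else "") + POSIX_ASCII[name] + "]"

punct_runs = re.findall(posix_class("punct") + "+", "a,b!?c")
hex_ok = re.fullmatch(posix_class("xdigit") + "+", "1aF") is not None
non_digits = re.findall(posix_class("digit", negate=True) + "+", "ab12cd")

print(punct_runs, hex_ok, non_digits)
```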


Assertions

char - - usage
?= Lookahead assertion
?! Negative lookahead
?<= Lookbehind assertion
?<! Negative lookbehind
?> Once-only subexpression (atomic group)
?() Condition [if then]
?()| Condition [if then else]
?# Comment
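
Assertions match a position without consuming any text. A minimal sketch of the four lookaround forms, assuming Python's `re` module and an invented sample string:

```python
import re

prices = "cost: $40, discount: 15%, tax: $3"

# Lookbehind: digits that are preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", prices)

# Lookahead: digits that are followed by a percent sign.
percents = re.findall(r"\d+(?=%)", prices)

# Negative lookahead: whole numbers NOT followed by a percent sign.
# The \b anchors stop the engine from backtracking to a partial number.
not_percent = re.findall(r"\b\d+\b(?!%)", prices)

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
not_dollars = re.findall(r"(?<!\$)\b\d+", prices)

print(dollars, percents, not_percent, not_dollars)
```

Note that the lookaround text itself never appears in the results: the `$` and `%` are checked but not captured.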


Quantifiers

char - - usage
* 0 or more
+ 1 or more
? 0 or 1
{3} Exactly 3
{3,} 3 or more
{,5} At most 5 (0 to 5)
{3,5} 3, 4 or 5
Tip Add a ? after a quantifier to make it ungreedy (lazy), e.g. *? or +?
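
The difference between greedy and ungreedy quantifiers, and the counted forms, can be sketched like this (Python's `re` module assumed; the sample strings are made up):

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" in the string, giving one long match.
greedy = re.findall(r"<.+>", html)

# Ungreedy: .+? stops at the first ">" it can, giving one match per tag.
lazy = re.findall(r"<.+?>", html)

# Counted quantifiers.
exactly_three = re.fullmatch(r"\d{3}", "123") is not None
three_to_five = re.findall(r"\b\d{3,5}\b", "12 123 12345 123456")
at_most_two = re.fullmatch(r"a{,2}", "aaa") is None   # three a's is too many

print(greedy, lazy, exactly_three, three_to_five, at_most_two)
```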


Escape Sequences

char - - usage
\ Escape following character
\Q Begin literal sequence
\E End literal sequence
Tip Within a literal sequence (\Q...\E), metacharacters do not need to be escaped


Metacharacters have a special meaning in a pattern. To match one of them literally, escape it with a backslash.

Common Metacharacters: ^ [ . $ { * ( \ + ) | ? < >
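
A brief sketch of escaping in practice, assuming Python's `re` module. Python does not support `\Q...\E`; `re.escape` plays a similar role by escaping a whole string at once. The sample text is invented.

```python
import re

literal = "1+1=2 (true?)"

# Unescaped, "+" is a quantifier, so this pattern means "one or more 1s,
# then 1, then =2" and does not match the text "1+1=2".
unescaped = re.search(r"1+1=2", "1+1=2")

# Escaping the metacharacter makes it match itself.
escaped = re.search(r"1\+1=2", "1+1=2")

# re.escape escapes every metacharacter in an arbitrary string.
auto_pattern = re.escape(literal)
found = re.search(auto_pattern, "is 1+1=2 (true?) ok").group()

print(unescaped, escaped.group(), found)
```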


Special Characters

char - - usage
\n New line
\r Carriage return
\t Tab
\v Vertical tab
\f Form feed
\xxx Octal character xxx
\xhh Hex character hh


To use the sequences above when the pattern is written as an ordinary string literal, escape the backslash itself, for example \\n, so that the regex engine receives \n.
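
How many backslashes are needed depends on how the host language treats string literals. The following sketch assumes Python, where raw strings (`r"..."`) pass backslashes through to the regex engine unchanged; other languages and tools have their own rules.

```python
import re

two_lines = "a\nb"        # contains a real newline character
literal_text = r"a\nb"    # contains a backslash followed by the letter n

# In a raw string, the engine receives \n and treats it as "new line".
newline_hit = re.search(r"a\nb", two_lines) is not None

# In an ordinary string literal, the backslash is doubled to get the same pattern.
same_pattern = ("a\\nb" == r"a\nb")

# To match a literal backslash followed by n, escape the backslash in the pattern.
literal_hit = re.search(r"a\\nb", literal_text) is not None
literal_miss = re.search(r"a\\nb", two_lines) is None

# Hex and octal escapes name a character by its code: both are "A".
hex_hit = re.search(r"\x41", "A") is not None
octal_hit = re.search(r"\101", "A") is not None

print(newline_hit, same_pattern, literal_hit, literal_miss, hex_hit, octal_hit)
```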


Groups and Ranges

char - - usage
. Any character except new line (\n)
(a|b) a or b
(...) Group
(?:...) Passive (non-capturing) group
[abc] Range (a or b or c)
[^abc] Not (a or b or c)
[a-q] Lower case letter from a to q
[A-Q] Upper case letter from A to Q
[0-7] Digit from 0 to 7
\x Backreference to group/subpattern number "x" (e.g. \1)
Tip Ranges are inclusive
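
Groups, ranges and backreferences in use. This is a minimal sketch assuming Python's `re` module, with made-up sample strings:

```python
import re

# (?:...) groups the alternation without capturing it, so only the two
# (\w+) groups are numbered.
m = re.match(r"(?:Mr|Ms)\. (\w+) (\w+)", "Ms. Ada Lovelace")
names = m.groups()

# \1 refers back to whatever group 1 matched: this finds doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "it is is what it it is")

# Ranges are inclusive: "q" itself is matched, "r", "u" and "y" are not.
in_range = re.findall(r"[a-q]", "query")

# A negated set matches any character not listed.
not_abc = re.findall(r"[^abc]", "abcd")

# The dot skips the newline by default.
dots = re.findall(r".", "a\nb")

print(names, doubled, in_range, not_abc, dots)
```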


Pattern Modifiers

char - - usage
g Global match
i * Case-insensitive
m * Multiple lines (^ and $ also match at line breaks)
s * Treat string as single line (. also matches new line)
x * Allow comments and whitespace in pattern
e * Evaluate replacement
U * Ungreedy pattern
Tip Modifiers marked with * are Perl-Compatible Regular Expressions (PCRE) modifiers
https://www.pcre.org/original/doc/html/index.html
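
How modifiers are spelled varies by engine. The sketch below assumes Python's `re` module, where they are passed as flags: `re.IGNORECASE` for i, `re.MULTILINE` for m, `re.DOTALL` for s and `re.VERBOSE` for x. Python has no g, e or U flags; `re.findall` returns all matches, which is roughly what g does elsewhere.

```python
import re

text = "Start\nstart again"

# i: case-insensitive matching.
case_insensitive = re.findall(r"start", text, re.IGNORECASE)

# i + m: ^ now matches at the start of every line.
line_starts = re.findall(r"^start", text, re.IGNORECASE | re.MULTILINE)

# s: without it the dot refuses to cross the newline; with it the match succeeds.
dot_default = re.search(r"Start.start", text)
dot_all = re.search(r"Start.start", text, re.DOTALL)

# x: whitespace and comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
""", re.VERBOSE)
year_month = date.match("2024-05").groups()

# Flags can also be written inline at the start of the pattern.
inline = re.findall(r"(?i)start", text)

print(case_insensitive, line_starts, dot_default, dot_all.group(), year_month, inline)
```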


String Replacement

char - - usage
$n nth capturing (non-passive) group
$2 "xyz" in /^(abc(xyz))$/
$1 "xyz" in /^(?:abc)(xyz)$/
$` Before matched string
$' After matched string
$+ Last matched group
$& Entire matched string
Tip Some regex implementations use '\' instead of '$' (e.g. the regex used by Splunk)
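
Python's `re` module is one of the implementations that use `\` rather than `$` in replacement strings, so the sketch below writes `\1` and `\2`. It also shows how the group numbering in the two examples above works out, and how the "before", "after" and "entire match" pieces can be obtained from a match object. The sample strings are invented.

```python
import re

# Groups are numbered by their opening parenthesis, left to right.
nested = re.match(r"^(abc(xyz))$", "abcxyz")
outer, inner = nested.group(1), nested.group(2)

# A passive (?:...) group is skipped when numbering.
passive = re.match(r"^(?:abc)(xyz)$", "abcxyz")
only_group = passive.group(1)

# Backreferences in the replacement string swap the two captured parts.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")

# \g<0> stands for the entire matched string.
bracketed = re.sub(r"b+", r"[\g<0>]", "aabbcc")

# Before / after the match, via the match object's start and end offsets.
source = "aabbcc"
hit = re.search(r"b+", source)
before, whole, after = source[:hit.start()], hit.group(0), source[hit.end():]

print(outer, inner, only_group, swapped, bracketed, before, whole, after)
```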