RegEx Cheatsheet
Study Reference: learn-regex/translations at master · ziishaned/learn-regex · GitHub
RegEx Online Test Tool: https://regex101.com/
What is RegEx?
A regular expression is a sequence of characters and symbols that describes a pattern to find in text.
Regular expressions are used for tasks such as replacing words in a sentence, checking that a form field is filled out correctly, or extracting the part of a string that follows a certain pattern. The name “regular expression” is often shortened to “regex” or “regexp”.
1. Basic Matchers
A regular expression is just a pattern of characters that we use to perform a search in a text. For example, the regular expression cat means: the letter c, followed by the letter a, followed by the letter t.
Regular expressions are normally case-sensitive, so the regular expression Cat would not match the string cat.
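As a minimal sketch of literal matching, the snippet below uses Python's built-in re module (the choice of language and the sample sentence are assumptions made for illustration). It also shows the re.IGNORECASE flag, which is one way to turn off the default case-sensitivity.

```python
import re

text = "The cat sat on the mat."

# A literal pattern matches exactly the characters it spells out.
lower = re.search(r"cat", text)
# Matching is case-sensitive by default, so "Cat" is not found in this text.
upper = re.search(r"Cat", text)
# re.IGNORECASE makes the same pattern match regardless of case.
ignore_case = re.search(r"Cat", text, re.IGNORECASE)

print(lower.group())        # cat
print(upper)                # None
print(ignore_case.group())  # cat
```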
2. Meta Characters
Meta characters are the building blocks of regular expressions. They do not stand for themselves; instead, each one has a special meaning.
2.1 Quantifiers
Meta Characters | Descriptions |
? | Makes the preceding symbol optional. |
E.g., the regular expression used? matches strings that contain either use or used. The question mark ? specifies that the preceding character or group (in this case, d) is optional, meaning it can occur zero or one time.
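The optional quantifier can be checked with a short Python sketch. The word list is made up for illustration, and re.fullmatch is used so that the whole word has to fit the pattern.

```python
import re

pattern = re.compile(r"used?")

# fullmatch requires the entire word to match, not just a part of it.
words = ["use", "used", "usedd", "us"]
optional_results = {word: bool(pattern.fullmatch(word)) for word in words}

print(optional_results)
# {'use': True, 'used': True, 'usedd': False, 'us': False}
```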
Meta Characters | Descriptions |
* | Matches 0 or more repetitions of the preceding symbol. |
E.g., the regular expression ca*t matches the character c, followed by zero or more occurrences of the character a, followed by the character t, such as ct, cat, or caaat. In this regular expression a can be absent or repeated any number of times.
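A small Python sketch of the star quantifier, run against an invented sample string, could look like this:

```python
import re

sample = "ct cat caaat cbt"

# "a*" allows zero or more "a" characters between "c" and "t".
star_matches = re.findall(r"ca*t", sample)

print(star_matches)  # ['ct', 'cat', 'caaat']
```

The last word, cbt, is skipped because b is not one of the characters the pattern allows between c and t.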
Meta Characters | Descriptions |
+ | Matches 1 or more repetitions of the preceding symbol. |
E.g., the regular expression ca+t matches the character c, followed by one or more occurrences of the character a, followed by the character t, such as cat or caaat. It will not match ct, which has no a character between c and t.
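For comparison, here is the plus quantifier applied in Python to an illustrative sample string that includes ct:

```python
import re

sample = "ct cat caaat"

# "a+" requires at least one "a", so "ct" is no longer a match.
plus_matches = re.findall(r"ca+t", sample)

print(plus_matches)  # ['cat', 'caaat']
```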
Meta Characters | Descriptions |
{n,m} | Braces. Matches at least “n” but not more than “m” repetitions of the preceding symbol. |
E.g., the regular expression ca{6,7}t will match strings like caaaaaat (6 a characters) and caaaaaaat (7 a characters). It will not match strings with fewer than 6 or more than 7 a characters between c and t.
Written as ca{n}t, it matches exactly “n” repetitions of the preceding symbol.
Written as ca{n,}t, it matches “n” or more repetitions of the preceding symbol.
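The three brace forms can be compared side by side. The following Python sketch uses made-up test strings and picks n = 2 for the {n} and {n,} forms purely as an example.

```python
import re

# Candidate strings with 5, 6, 7 and 8 "a" characters between "c" and "t".
candidates = ["c" + "a" * n + "t" for n in range(5, 9)]

# {6,7}: at least 6 and at most 7 repetitions of the preceding "a".
bounded = [w.count("a") for w in candidates if re.fullmatch(r"ca{6,7}t", w)]

# {2}: exactly 2 repetitions.
exact = [w for w in ["cat", "caat", "caaat"] if re.fullmatch(r"ca{2}t", w)]

# {2,}: 2 or more repetitions, with no upper bound.
at_least = [w for w in ["cat", "caat", "caaaaat"] if re.fullmatch(r"ca{2,}t", w)]

print(bounded)   # [6, 7]
print(exact)     # ['caat']
print(at_least)  # ['caat', 'caaaaat']
```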
2.2 Group
Meta Characters | Descriptions |
(xyz) | Character group. Matches the characters xyz in that exact order. |
E.g., the regular expression (ca){2}t matches the group ca repeated exactly two times in a row, followed by t, i.e. cacat. Without the parentheses, ca{2}t would repeat only the a and match caat instead.
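To see what the parentheses change, this Python sketch applies the grouped pattern and an ungrouped variant (added here only for contrast) to the same illustrative words:

```python
import re

words = ["cacat", "caat", "cat"]

# With a group, {2} repeats the whole sequence "ca".
grouped = [w for w in words if re.fullmatch(r"(ca){2}t", w)]
# Without a group, {2} repeats only the single preceding character "a".
ungrouped = [w for w in words if re.fullmatch(r"ca{2}t", w)]

print(grouped)    # ['cacat']
print(ungrouped)  # ['caat']
```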
2.3 Alternation (OR Operator)
Meta Characters | Descriptions |
| | Alternation. Matches either the characters before or the characters after the symbol. |
E.g., the regular expression a (cat|dog) matches the letter a, followed by a space, followed by either the word cat or the word dog.
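A short Python sketch of alternation follows; the sentence is invented for illustration. It uses re.finditer with group(0) because re.findall would return only the text captured by the parentheses rather than the whole match.

```python
import re

sentence = "I have a cat and a dog, not a bird."

# group(0) is the full match; group(1) would be just "cat" or "dog".
alternation_matches = [m.group(0) for m in re.finditer(r"a (cat|dog)", sentence)]

print(alternation_matches)  # ['a cat', 'a dog']
```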
2.4 Character Classes
Character classes are also called character sets. Square brackets [] are used to specify a character set. Use a hyphen inside a character set to specify a range of characters.
The order of the characters inside the square brackets doesn’t matter: [Tt] and [tT] are equivalent. Within a range, however, the start must come before the end, so write [a-z], not [z-a].
E.g., [Tt]he means: an uppercase T or lowercase t, followed by the letter h, followed by the letter e.
The expression [a-z] matches a single lowercase character in the range between a and z.
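The sketch below tries these character sets in Python on an illustrative sentence. It also checks two details of how sets behave in Python's re module: listing the members of a set in a different order gives the same result, while a reversed range such as [z-a] is rejected as an invalid pattern.

```python
import re

text = "The car parked in the garage."

# A set matches exactly one character out of the listed alternatives.
the_words = re.findall(r"[Tt]he", text)
# Listing the same members in another order does not change the result.
same_set = re.findall(r"[tT]he", text)
# A range matches one character between its endpoints.
lowercase = re.findall(r"[a-z]", "aB3z")

# A range whose endpoints are reversed is not a valid pattern in Python.
try:
    re.compile(r"[z-a]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False

print(the_words)          # ['The', 'the']
print(lowercase)          # ['a', 'z']
print(reversed_range_ok)  # False
```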
2.4.1 Negated Character Classes
In general, the caret symbol ^ represents the start of the string, but when it is typed directly after the opening square bracket it negates the character class.
E.g., the expression [^c]ar means: any single character except c, followed by ar. In other words, it matches three-character sequences such as bar or far, or the par in park, but not car. A character must be present before ar, so ar on its own does not match.
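Both roles of the caret can be illustrated with a short Python sketch; the sample sentence and variable names are assumptions chosen for the example.

```python
import re

text = "The car parked in the garage."

# Inside brackets, ^ negates the set: any character except "c", then "ar".
negated_matches = re.findall(r"[^c]ar", text)
# The negated set still consumes one character, so a bare "ar" cannot match.
bare_ar = re.search(r"[^c]ar", "art")
# Outside brackets, ^ anchors the pattern to the start of the string.
anchored_the = re.search(r"^The", text)
anchored_car = re.search(r"^car", text)

print(negated_matches)       # ['par', 'gar']
print(bare_ar)               # None
print(anchored_the.group())  # The
print(anchored_car)          # None
```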
3. Shorthand Character Sets
Shorthand | Description |
. | Any character except new line |
\w | Matches word characters (letters, digits, and underscore): [a-zA-Z0-9_] |
\W | Matches non-word characters: [^\w] |
\d | Matches digits: [0-9] |
\D | Matches non-digits: [^\d] |
\s | Matches whitespace characters: [\t\n\f\r\p{Z}] |
\S | Matches non-whitespace characters: [^\s] |
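The shorthands can be tried out with the Python sketch below, using an invented sample string. Two caveats apply to Python specifically: its built-in re module does not accept the \p{Z} notation shown in the table, and for ordinary (str) patterns \w, \d and \s are Unicode-aware by default, so they match more than the ASCII sets listed above unless the re.ASCII flag is passed.

```python
import re

sample = "Room 42, floor_3\tEast"

digits = re.findall(r"\d", sample)        # single digits
words = re.findall(r"\w+", sample)        # runs of letters, digits, underscore
non_words = re.findall(r"\W", sample)     # everything \w does not match
whitespace = re.findall(r"\s", sample)    # spaces and the tab
non_space = re.findall(r"\S+", sample)    # runs of non-whitespace
# "." matches any character except a newline.
dots = re.findall(r".", "a\nb")

print(digits)      # ['4', '2', '3']
print(words)       # ['Room', '42', 'floor_3', 'East']
print(non_words)   # [' ', ',', ' ', '\t']
print(whitespace)  # [' ', ' ', '\t']
print(non_space)   # ['Room', '42,', 'floor_3', 'East']
print(dots)        # ['a', 'b']
```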