RegEx Cheatsheet

2 years ago

Ruian Ding

Study Reference: learn-regex/translations at master · ziishaned/learn-regex · GitHub
RegEx Online Test Tool: https://regex101.com/

What is RegEx?

A regular expression is a group of characters or symbols which is used to find a specific pattern in a text.

A regex is like a special pattern we look for in a bunch of letters. We use it to do things like changing words in a sentence, checking if a form is filled out correctly, or finding a part of a sentence that follows a certain pattern. Instead of saying “regular expression,” we often just say “regex” or “regexp” because it’s shorter.

1. Basic Matchers

A regular expression is just a pattern of characters that we use to perform a search in a text. For example, the regular expression cat means: the letter c, followed by the letter a, followed by the letter t.

Regular expressions are normally case-sensitive, so the regular expression Cat would not match the string cat.
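
As a minimal sketch of literal matching and case sensitivity, the snippet below uses Python's standard `re` module. The sample sentence and the variable names are made up for illustration.

```python
import re

# Illustrative sample text (not from the original article).
text = "The cat sat next to Cat Stevens."

# The literal pattern "cat" matches only the lowercase spelling.
lowercase_hits = re.findall(r"cat", text)

# Passing re.IGNORECASE makes the same pattern match regardless of case.
any_case_hits = re.findall(r"cat", text, flags=re.IGNORECASE)

print(lowercase_hits)  # ['cat']
print(any_case_hits)   # ['cat', 'Cat']
```

Case-insensitive matching is something you opt into with a flag, so the default behaviour stays as described above.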

2. Meta Characters

Meta characters are like the ingredients we use to make regular expressions. They don’t represent themselves, but they have special meanings and are used in a unique way.

2.1 Quantifiers

Meta Characters	Descriptions
?	Makes the preceding symbol optional.

E.g., The regular expression used? is used to match strings that contain either use or used. The question mark ? specifies that the preceding character or group (in this case, d) is optional, meaning it can occur zero or one time.
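
The optional quantifier can be checked quickly in Python. This is only a sketch: the word list is an assumption chosen to show both the matching and the non-matching cases.

```python
import re

pattern = re.compile(r"used?")

# fullmatch requires the whole string to match the pattern,
# so extra trailing characters cause a failure.
candidates = ["use", "used", "usedd", "us"]
results = {word: bool(pattern.fullmatch(word)) for word in candidates}

print(results)
# {'use': True, 'used': True, 'usedd': False, 'us': False}
```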

Meta Characters	Descriptions
*	Matches 0 or more repetitions of the preceding symbol.

E.g., The regular expression ca*t is used to match strings that start with c, followed by zero or more occurrences of the character a, and ending with the character t. In this regular expression a can be absent or repeated any number of times.

Meta Characters	Descriptions
+	Matches 1 or more repetitions of the preceding symbol.

E.g., The regular expression ca+t is used to match strings that start with c, followed by one or more occurrences of the character a, and ending with the character t. It will not match strings that have zero a character between c and t.
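
To contrast `ca*t` with `ca+t`, the following sketch runs both patterns over the same hypothetical word list. The only difference should be the word with no `a` at all.

```python
import re

# Hypothetical test words: zero, one, and three "a" characters, plus a non-match.
words = ["ct", "cat", "caaat", "cbt"]

# "*" allows zero or more "a" characters between c and t.
star_matches = [w for w in words if re.fullmatch(r"ca*t", w)]

# "+" requires at least one "a" between c and t.
plus_matches = [w for w in words if re.fullmatch(r"ca+t", w)]

print(star_matches)  # ['ct', 'cat', 'caaat']
print(plus_matches)  # ['cat', 'caaat']
```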

Meta Characters	Descriptions
{n,m}	Braces. Matches at least “n” but not more than “m” repetitions of the preceding symbol.

E.g., Regular expression ca{6,7}t will match strings like caaaaaat (6 a characters) and caaaaaaat (7 a characters). It will not match strings with fewer than 6 or more than 7 a characters between c and t.

The pattern can also be written as ca{n}t, which matches exactly “n” repetitions of the preceding symbol.

It can further be written as ca{n,}t, which matches “n” or more repetitions of the preceding symbol.
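
The three brace forms can be compared side by side. In this sketch the sample strings are generated (an assumption for illustration) with 5 to 8 `a` characters, and each list records how many `a` characters the matching strings contain.

```python
import re

# Build "caaaaat" ... "caaaaaaaat" with 5, 6, 7 and 8 "a" characters.
samples = ["c" + "a" * n + "t" for n in range(5, 9)]

def a_counts(pattern):
    """Return the number of 'a' characters in each sample the pattern fully matches."""
    return [len(s) - 2 for s in samples if re.fullmatch(pattern, s)]

bounded = a_counts(r"ca{6,7}t")   # between 6 and 7 repetitions
exact = a_counts(r"ca{6}t")       # exactly 6 repetitions
at_least = a_counts(r"ca{6,}t")   # 6 or more repetitions

print(bounded)   # [6, 7]
print(exact)     # [6]
print(at_least)  # [6, 7, 8]
```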

2.2 Group

Meta Characters	Descriptions
(xyz)	Character group. Matches the characters xyz in that exact order.

E.g., The regular expression (ca){2}t matches strings in which the group ca is repeated exactly two times consecutively and is then followed by t, as in cacat.
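
The effect of the parentheses is easiest to see by removing them. The sketch below is illustrative only: with the group, `{2}` repeats the whole `ca`, and without it, `{2}` repeats only the `a`.

```python
import re

# With the group, the quantifier applies to "ca" as a unit.
grouped = re.fullmatch(r"(ca){2}t", "cacat")

# Without the group, the quantifier applies only to "a".
ungrouped_on_cacat = re.fullmatch(r"ca{2}t", "cacat")
ungrouped_on_caat = re.fullmatch(r"ca{2}t", "caat")

print(grouped is not None)             # True
print(ungrouped_on_cacat is not None)  # False
print(ungrouped_on_caat is not None)   # True
```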

2.3 Alternation (OR Operator)

Meta Characters	Descriptions
|	Alternation. Matches either the characters before or the characters after the symbol.

E.g., The regular expression a (cat|dog) is used to match strings that contain the letter a followed by a space and then either the word cat or the word dog.
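
A short sketch of alternation in Python follows, using an invented sentence. Note that `re.findall` returns only the captured group when the pattern contains one, so `re.finditer` is used to recover the full matched text.

```python
import re

# Illustrative sentence (not from the original article).
text = "I have a cat, a dog, and a bird."

# Full text of each match: "a " followed by cat or dog.
full_matches = [m.group(0) for m in re.finditer(r"a (cat|dog)", text)]

# With a capturing group, findall returns the group contents only.
animals = re.findall(r"a (cat|dog)", text)

print(full_matches)  # ['a cat', 'a dog']
print(animals)       # ['cat', 'dog']
```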

2.4 Character Classes

Character classes are also called Character Sets. Square brackets [] are used to specify character sets. Use a hyphen inside a character set to specify a range of characters.

The order of the characters listed inside the square brackets doesn’t matter, although the two endpoints of a hyphenated range must be in ascending order.

E.g., [Tt]he means: an uppercase T or lowercase t, followed by the letter h, followed by the letter e.

The following [a-z] matches a single character in the range between a and z in lower case.
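
The sketch below tries these character sets in Python's `re` module on made-up sample strings. It also shows one engine-specific detail: Python rejects a range whose endpoints are reversed, such as `[z-a]`.

```python
import re

# Illustrative sample text.
text = "The car parked in the garage."

# [Tt] accepts either an uppercase T or a lowercase t.
the_hits = re.findall(r"[Tt]he", text)

# Listing the same characters in a different order gives the same set.
same_set = re.findall(r"[tT]he", text) == the_hits

# [a-z] matches a single lowercase ASCII letter.
lowercase_letters = re.findall(r"[a-z]", "aB3z")

# A reversed range is a pattern error in Python's re module.
try:
    re.compile(r"[z-a]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False

print(the_hits)           # ['The', 'the']
print(same_set)           # True
print(lowercase_letters)  # ['a', 'z']
print(reversed_range_ok)  # False
```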

2.4.1 Negated Character Classes

In general, the caret symbol ^ represents the start of the string, but when it is typed after the opening square bracket it negates the character class.

E.g., the expression [^c]ar means any character except c, followed by ar. In other words, it matches any three-character sequence that ends with ar but doesn’t start with the letter c, such as par or bar.
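
As an illustrative sketch with an invented sentence, the negated class can be seen skipping `car` while still matching inside other words. It also shows that `[^c]` must consume a real character, so a bare `ar` does not match.

```python
import re

# Illustrative sample text.
text = "The car parked in the garage."

# Any character other than "c", followed by "ar".
hits = re.findall(r"[^c]ar", text)

# The negated class still needs one character to match before "ar".
bare_ar = re.search(r"[^c]ar", "ar")

print(hits)             # ['par', 'gar']
print(bare_ar is None)  # True
```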

3. Shorthand Character Sets

Shorthand	Description
.	Any character except new line
\w	Matches alphanumeric characters and the underscore: `[a-zA-Z0-9_]`
\W	Matches any character that \w does not match: `[^\w]`
\d	Matches digits: `[0-9]`
\D	Matches non-digits: `[^\d]`
\s	Matches whitespace characters: `[\t\n\f\r\p{Z}]`
\S	Matches non-whitespace characters: `[^\s]`
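
The shorthand sets above can be tried out with a small sketch. The sample string is an assumption and contains only ASCII text. In Python, `\w`, `\d`, and `\s` are Unicode-aware by default for `str` patterns, so passing `re.ASCII` is one way to restrict them to the ASCII ranges listed in the table.

```python
import re

# Illustrative ASCII-only sample text ending in a newline.
text = "Order 66: ship_to=Room 12\n"

digits = re.findall(r"\d+", text)     # runs of digits
words = re.findall(r"\w+", text)      # runs of letters, digits, underscores
non_space = re.findall(r"\S+", text)  # runs of non-whitespace characters

# "." matches any character except a newline.
dot_hits = re.findall(r".", "a\nb")

print(digits)     # ['66', '12']
print(words)      # ['Order', '66', 'ship_to', 'Room', '12']
print(non_space)  # ['Order', '66:', 'ship_to=Room', '12']
print(dot_hits)   # ['a', 'b']
```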