1. Overview
Regular expressions are widely used in text processing to find and extract information.
The main idea: write a pattern that matches a specific sequence of characters.
2. Quick Start
- Letters
a matches a
ab matches ab
[abc] matches exactly 1 character: a, b, or c
[^abc] matches exactly 1 character that is not a, b, or c
[a-z] matches exactly 1 character in the range a to z
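The rules above can be checked with a short sketch in Python's re module. The helper name matches is hypothetical, introduced only for this illustration; it uses re.fullmatch so that the pattern has to cover the whole string.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"ab", "ab"))      # literal characters match themselves
print(matches(r"[abc]", "b"))    # one character from the set
print(matches(r"[abc]", "ab"))   # False: a set consumes exactly one character
print(matches(r"[^abc]", "d"))   # one character outside the set
print(matches(r"[a-z]", "q"))    # one character inside the range
print(matches(r"[a-z]", "Q"))    # False: the range is case-sensitive
```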
- Digits
123 matches 123
\d matches any digit
\D matches any non-digit
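A minimal sketch of the digit classes in Python. The sample string is made up for illustration. Note that for Python str patterns \d also accepts non-ASCII Unicode digits unless re.ASCII is passed.

```python
import re

text = "Room 42B"  # made-up sample input

has_123 = re.search(r"123", "abc123") is not None  # literal digits
digits = re.findall(r"\d", text)                   # every single digit
non_digits = re.findall(r"\D", text)               # every single non-digit

print(has_123)
print(digits)
print(non_digits)
```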
- Wild Card
. matches any single character (in most engines, not a newline by default)
\. matches a literal .
\w matches any alphanumeric character (letter or digit) or an underscore; equivalent to [A-Za-z0-9_]
\W matches any character that \w does not match
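The difference between . and \. and between \w and \W can be sketched as follows. The sample strings are invented for illustration. In Python 3, \w on str patterns also accepts non-ASCII letters; passing re.ASCII restricts it to [A-Za-z0-9_].

```python
import re

sample = "abc a-c a.c"  # made-up sample input

any_char = re.findall(r"a.c", sample)       # . stands for any one character
literal_dot = re.findall(r"a\.c", sample)   # \. stands for a real dot only

word_chars = re.findall(r"\w", "a_1!")      # letters, digits, underscore
non_word_chars = re.findall(r"\W", "a_1!")  # everything else

# Without re.DOTALL, . does not match a newline in Python.
dot_matches_newline = re.search(r".", "\n") is not None

print(any_char, literal_dot)
print(word_chars, non_word_chars)
print(dot_matches_newline)
```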
- Repetitions
a{3} matches aaa
a{1,3} matches a / aa / aaa
a* matches 0 or more repetitions of a
a+ matches 1 or more repetitions of a
a? matches 0 or 1 repetition of a (a is optional)
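A small sketch that tries each quantifier against strings of a's of different lengths. The helper name accepted_lengths is hypothetical; it reports which lengths from 0 to 4 the pattern accepts as a whole-string match.

```python
import re

def accepted_lengths(pattern: str) -> list[int]:
    """Hypothetical helper: lengths n in 0..4 where 'a' * n fully matches."""
    return [n for n in range(5) if re.fullmatch(pattern, "a" * n)]

print(accepted_lengths(r"a{3}"))    # exactly three
print(accepted_lengths(r"a{1,3}"))  # between one and three
print(accepted_lengths(r"a*"))      # zero or more
print(accepted_lengths(r"a+"))      # one or more
print(accepted_lengths(r"a?"))      # zero or one
```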
- Whitespace
There are many common forms of whitespace
- space
- tab
- new line
- carriage return
All of these are matched by \s, which makes \s very useful when cleaning up raw input text
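One common use of \s, sketched here on a made-up raw string, is to collapse every run of mixed whitespace into a single space or to split on it.

```python
import re

raw = "  name:\tAda \r\n  role:  engineer\n"  # made-up messy input

# Replace each run of spaces, tabs, and line breaks with one space.
normalized = re.sub(r"\s+", " ", raw).strip()

# Split on any run of whitespace.
tokens = re.split(r"\s+", "a \t b\nc\r\nd")

print(normalized)
print(tokens)
```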
- Start and End
^...$
^ anchors the pattern to the beginning of a line and $ anchors it to the end, so ^...$ must match the whole line
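A sketch of the anchors in Python, using made-up inputs. One detail worth assuming nothing about: in Python, ^ and $ refer to the start and end of the whole string unless re.MULTILINE is passed, in which case they apply to every line.

```python
import re

anchored_hit = re.search(r"^cat$", "cat") is not None       # whole string is "cat"
anchored_miss = re.search(r"^cat$", "concat") is not None   # extra text before "cat"
unanchored_hit = re.search(r"cat", "concat") is not None    # "cat" appears somewhere

lines = "one\ntwo words\nthree"
# With re.MULTILINE, ^ and $ match at the start and end of each line.
single_word_lines = re.findall(r"^\w+$", lines, re.MULTILINE)
# Without it, the anchors only see the start and end of the whole string.
without_multiline = re.findall(r"^\w+$", lines)

print(anchored_hit, anchored_miss, unanchored_hit)
print(single_word_lines, without_multiline)
```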
- Group
Use ( ) to capture part of a match for further processing
e.g. ^(IMG\d+)\.png$ matches a whole .png filename but captures only the name without the extension
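The example pattern can be tried in Python as below. The filenames are made up for illustration; group(0) is the whole match and group(1) is the text captured by the first pair of parentheses.

```python
import re

pattern = re.compile(r"^(IMG\d+)\.png$")

hit = pattern.match("IMG0042.png")   # made-up filename
whole_match = hit.group(0)           # the full matched text
captured_name = hit.group(1)         # only the part inside ( )

wrong_extension = pattern.match("IMG0042.jpg")  # returns None, no match
no_digits = pattern.match("IMG.png")            # returns None, \d+ needs a digit

print(whole_match, captured_name)
print(wrong_extension, no_digits)
```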
- Nested Group
Use nested ( ) to extract multiple layers of information
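As a sketch of nesting, the pattern below (an assumed extension of the IMG example, not part of the original notes) wraps the digits in an inner group. Groups are numbered by the position of their opening parenthesis, so the outer group is 1 and the inner group is 2.

```python
import re

# Outer group: the file name; inner group: just the number.
nested = re.compile(r"^(IMG(\d+))\.png$")

m = nested.match("IMG0042.png")  # made-up filename
file_name = m.group(1)           # outer group
image_number = m.group(2)        # inner group
all_groups = m.groups()          # every captured group, in order

print(file_name, image_number)
print(all_groups)
```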