PCRE (Perl Compatible Regular Expressions) is a C library that implements regular expressions. It was written in 1997, when Perl was the de facto choice for complex text-processing tasks, and its pattern syntax closely resembles Perl’s. Perl-style syntax is shared, with variations, by many regex flavors (.NET, Java, JavaScript, XRegExp, Perl, PCRE, Python, and Ruby) and is available in the programming languages C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET.
PCRE’s syntax is much more powerful and flexible than either of the POSIX regular expression flavors (basic and extended), and than that of many other regular-expression libraries.
This reference focuses on PCRE unless stated otherwise.
Anchors
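^ | Start of string, or start of line in multi-line pattern
\A | Start of string
$ | End of string, or end of line in multi-line pattern
\Z | End of string (also matches before a final newline; \z matches only at the very end)
\b | Word boundary
\B | Not word boundary
\< | Start of word (GNU and Vim syntax; not supported by PCRE, use \b)
\> | End of word (GNU and Vim syntax; not supported by PCRE, use \b)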
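The anchors above can be tried out with a short sketch. It uses Python's standard `re` module, which is Perl-style but is not PCRE itself, so treat it as an illustration of the shared anchors (`^`, `\A`, `\b`) rather than of PCRE's own API. The sample strings are made up for the example.

```python
import re

text = "first line\nsecond line"

# Without multi-line mode, ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)

# In multi-line mode, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \A always means start of string, even in multi-line mode.
starts_absolute = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# \b matches at word boundaries, so the "cat" inside "concatenate" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concatenate cat.")

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(starts_absolute)   # ['first']
print(whole_words)       # ['cat', 'cat']
```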
Character Classes
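\c | Control character (written \cX, for example \cA for Ctrl-A)
\s | White space
\S | Not white space
\d | Digit
\D | Not digit
\w | Word character
\W | Not word character
\x | Hexadecimal digit (in PCRE, \xhh is a hex character code; use [[:xdigit:]] for a hex digit)
\O | Octal digit (not a PCRE escape; use [0-7])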
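As a rough illustration of the shorthand classes, the sketch below uses Python's `re` module (Perl-style, not PCRE itself). The `re.ASCII` flag is passed so that `\d`, `\w`, and `\s` are limited to ASCII, mirroring the basic definitions in this reference; the sample string is invented.

```python
import re

sample = "Order 66: shipped_to Bob, 3 items"

# \d+ : runs of digits
digits = re.findall(r"\d+", sample, flags=re.ASCII)

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", sample, flags=re.ASCII)

# \S+ : runs of anything that is not white space
chunks = re.findall(r"\S+", sample, flags=re.ASCII)

print(digits)  # ['66', '3']
print(words)   # ['Order', '66', 'shipped_to', 'Bob', '3', 'items']
print(chunks)  # ['Order', '66:', 'shipped_to', 'Bob,', '3', 'items']
```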
POSIX
[:upper:] | Upper case letters
[:lower:] | Lower case letters
[:alpha:] | All letters
[:alnum:] | Digits and letters
[:digit:] | Digits
[:xdigit:] | Hexadecimal digits
[:punct:] | Punctuation
[:blank:] | Space and tab
[:space:] | All whitespace characters
[:cntrl:] | Control characters
[:graph:] | Printable characters, excluding space
[:print:] | Printable characters, including space
[:word:] | Digits, letters and underscore
Assertions
?= | Lookahead assertion
?! | Negative lookahead
?<= | Lookbehind assertion
?<! | Negative lookbehind
?> | Once-only subexpression (atomic group)
?() | Condition [if then]
?()| | Condition [if then else]
?# | Comment
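Lookaround assertions check the surrounding text without consuming it. A minimal sketch follows, again using Python's `re` module as a stand-in for a Perl-style engine (Python only accepts fixed-width lookbehind, which is all this example needs). The sample data is hypothetical.

```python
import re

prices = "apple $5, pear 7 EUR, plum $12"

# Positive lookbehind: digits preceded by "$"; the "$" is not part of the match.
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)

# Positive lookahead: digits followed by " EUR"; " EUR" is not part of the match.
euro_amounts = re.findall(r"\d+(?= EUR)", prices)

# Negative lookbehind: digits not preceded by "$" or by another digit.
no_dollar = re.findall(r"(?<![$\d])\d+", prices)

# Negative lookahead: a "q" that is not followed by "u".
q_without_u = re.search(r"q(?!u)", "Iraq quit").start()

print(dollar_amounts)  # ['5', '12']
print(euro_amounts)    # ['7']
print(no_dollar)       # ['7']
print(q_without_u)     # 3
```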
Groups and Ranges
. | Any character except new line (\n)
(a|b) | a or b
(…) | Group
(?:…) | Passive (non-capturing) group
[abc] | Character set (a or b or c)
[^abc] | Not (a or b or c)
[a-q] | Lower case letter from a to q
[A-Q] | Upper case letter from A to Q
[0-7] | Digit from 0 to 7
\n | Backreference to group/subpattern number n (for example \1)
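Group numbering and backreferences are easier to see in action. The following sketch uses Python's `re` module (Perl-style syntax, not PCRE itself); the input strings are made up.

```python
import re

# Capturing groups are numbered by their opening parenthesis, left to right.
m = re.match(r"^(abc(xyz))$", "abcxyz")
outer, inner = m.group(1), m.group(2)

# A passive group (?:...) groups without taking a number.
m2 = re.match(r"^(?:abc)(xyz)$", "abcxyz")
only = m2.group(1)

# Backreference \1 must match the same text that group 1 matched,
# which makes it handy for finding doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")

# A negated character set matches any single character not listed.
not_abc = re.findall(r"[^abc]", "abcd")

print(outer, inner)  # abcxyz xyz
print(only)          # xyz
print(doubled)       # ['is', 'test']
print(not_abc)       # ['d']
```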
Pattern Modifiers
g | Global match
i * | Case-insensitive
m * | Multiple lines (^ and $ match at line breaks)
s * | Treat string as single line (dot also matches new line)
x * | Allow comments and whitespace in pattern
e * | Evaluate replacement
U * | Ungreedy pattern
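How modifiers are supplied depends on the host language. As a hedged illustration, Python's `re` module exposes the i, m, s, and x behaviours as flags (`re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`, `re.VERBOSE`) or inline as `(?i)` and friends; it has no counterpart to `U` or `e`, and "global" matching is done by calling `findall` or `finditer` instead of a flag.

```python
import re

text = "Hello\nworld"

# i: letters match regardless of case.
case_insensitive = re.search(r"hello", text, re.IGNORECASE) is not None

# m: ^ matches at the start of every line, not just the start of the string.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# s: by default the dot stops at a newline; with DOTALL it matches it too.
dot_default = re.search(r"Hello.world", text) is not None
dot_all = re.search(r"Hello.world", text, re.DOTALL) is not None

# x: whitespace and # comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)
parts = date.match("2024-05").groups()

# The same modifier can be written inline at the start of the pattern.
inline = re.search(r"(?i)hello", text) is not None

print(case_insensitive, line_starts)  # True ['Hello', 'world']
print(dot_default, dot_all)           # False True
print(parts, inline)                  # ('2024', '05') True
```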
String Replacement
$n | nth non-passive group
$2 | “xyz” in /^(abc(xyz))$/
$1 | “xyz” in /^(?:abc)(xyz)$/
$` | Before matched string
$' | After matched string
$+ | Last matched group
$& | Entire matched string
Some regex implementations use \ instead of $.
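Python's `re.sub` is one of the implementations that uses `\` rather than `$` in replacement strings, so it serves as a small sketch of group references in a replacement. `\g<0>` plays the role of `$&` (the entire match). The inputs are invented for the example.

```python
import re

# \1 and \2 in the replacement refer to the first and second captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# \g<0> stands for the entire matched string.
wrapped = re.sub(r"\d+", r"[\g<0>]", "a1b22")

# A passive group does not take a number, so "xyz" is group 1 here.
relabeled = re.sub(r"^(?:abc)(xyz)$", r"<\1>", "abcxyz")

print(swapped)    # world hello
print(wrapped)    # a[1]b[22]
print(relabeled)  # <xyz>
```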
Hidden chars or shortcuts
\s = [ \t\n\r\f]
\d = [0-9]
\w = [a-zA-Z_0-9]
Quantifiers
* | 0 or more
+ | 1 or more
? | 0 or 1
{3} | Exactly 3
{3,} | 3 or more
{3,5} | 3, 4 or 5
Add a ? to a quantifier to make it ungreedy.
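The difference between greedy and ungreedy quantifiers shows up clearly on markup-like text. This is a minimal sketch in Python's `re` module (Perl-style, not PCRE itself) with made-up input.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so one match spans the whole string.
greedy = re.findall(r"<.+>", html)

# Ungreedy: .+? grabs as little as possible, so each tag matches separately.
lazy = re.findall(r"<.+?>", html)

# {3} matches exactly three digits; \b keeps longer numbers from matching.
exactly_three = re.findall(r"\b\d{3}\b", "12 123 1234")

# {3,5} matches three, four, or five digits.
three_to_five = re.findall(r"\b\d{3,5}\b", "12 123 1234 123456")

print(greedy)         # ['<b>bold</b> and <i>italic</i>']
print(lazy)           # ['<b>', '</b>', '<i>', '</i>']
print(exactly_three)  # ['123']
print(three_to_five)  # ['123', '1234']
```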
Escape Sequences
\ | Escape following character
\Q | Begin literal sequence
\E | End literal sequence
“Escaping” is a way of treating characters which have a special meaning in regular expressions literally, rather than as special characters.
Common Metacharacters
^ [ . $ { * ( \ + ) | ? < > ]
The escape character is usually \
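To see why escaping matters, compare an unescaped and an escaped pattern. The sketch uses Python's `re` module, which does not support `\Q…\E`; its `re.escape` function is used here as a comparable way to treat a whole string literally. The sample string is hypothetical.

```python
import re

literal = "1+1=2 (really?)"

# Unescaped, "+" is a quantifier, so this pattern means "one or more 1s,
# then 1=2" and does not match the literal text "1+1=2".
unescaped = re.search(r"1+1=2", literal)

# Escaping the metacharacter makes it match a literal plus sign.
escaped = re.search(r"1\+1=2", literal)

# re.escape backslash-escapes every metacharacter in a string for us.
auto_pattern = re.escape(literal)
auto = re.fullmatch(auto_pattern, literal)

print(unescaped is None)    # True
print(escaped.group(0))     # 1+1=2
print(auto is not None)     # True
```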
Special Characters
\n | New line
\r | Carriage return
\t | Tab
\v | Vertical tab
\f | Form feed
\xxx | Octal character xxx
\xhh | Hex character hh
Case Conversion
\l Make next character lowercase
\u Make next character uppercase
\L Make entire string (up to \E) lowercase
\U Make entire string (up to \E) uppercase
\u\L Capitalize first char, lowercase rest (sentence)
PCRE regex quick reference
[abx-z] | One character of: a, b, or the range x-z
[^abx-z] | One character except: a, b, or the range x-z
a|b | a or b
a? | Zero or one a’s (greedy)
a?? | Zero or one a’s (lazy)
a* | Zero or more a’s (greedy)
a*? | Zero or more a’s (lazy)
a+ | One or more a’s (greedy)
a+? | One or more a’s (lazy)
a{4} | Exactly 4 a’s
a{4,8} | Between (inclusive) 4 and 8 a’s
a{9,} | 9 or more a’s
(?>…) | An atomic group
(?=…) | A positive lookahead
(?!…) | A negative lookahead
(?<=…) | A positive lookbehind
(?<!…) | A negative lookbehind
(?:…) | A non-capturing group
(…) | A capturing group
(?P<n>…) | A capturing group named n
^ | Beginning of the string
$ | End of the string
\d | A digit (same as [0-9])
\D | A non-digit (same as [^0-9])
\w | A word character (same as [_a-zA-Z0-9])
\W | A non-word character (same as [^_a-zA-Z0-9])
\s | A whitespace character
\S | A non-whitespace character
\b | A word boundary
\B | A non-word boundary
\n | A newline
\t | A tab
\cY | The control character Ctrl-Y (Y is a letter, not a hex code)
\xYY | The character with the hex code YY
\x{YYYY} | The character with the hex code YYYY (PCRE does not accept \uYYYY by default)
. | Any character except a newline (any character at all in single-line mode)
\Y | The Y’th captured group
(?1) | Recurse into numbered group 1
(?&x) | Recurse into named group x
(?P=n) | The captured group named ‘n’
(?#…) | A comment
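The named-group syntax `(?P<n>…)` and `(?P=n)` listed above originated in Python, so Python's `re` module is a natural place to sketch it. Note that recursion with `(?1)` or `(?&x)` is a PCRE feature that the standard `re` module does not support, so it is not shown. The group names and input strings below are made up.

```python
import re

# (?P<name>...) captures text under a name as well as a number.
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05")
year, month = m.group("year"), m.group("month")

# (?P=quote) must match the same text that the group named "quote" matched,
# so an opening quote is only closed by the same kind of quote.
pattern = re.compile(r"""(?P<quote>['"])(?P<body>.*?)(?P=quote)""")
bodies = [hit.group("body") for hit in pattern.finditer("say 'hi' and \"bye\"")]

print(year, month)  # 2024 05
print(bodies)       # ['hi', 'bye']
```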