Regular Expressions

Revision as of 15:41, 25 November 2009 by Mip

References

Engines

Powerful engines:

An open-source regex engine, built into PHP for instance.

Less powerful engines:

With sed, use extended regular expressions (switch -r) so that the meta-characters (){} have their special meaning when not escaped with a backslash.

Character Classes

Class     Meaning                                          Comment
[ae]      Matches a or e
[a-z]     Matches any character in the range a to z
[^a-z]    Matches any character not in the range a to z
\d        Digit                                            Equivalent to [0-9]
\w        Word character                                   Equivalent to [A-Za-z0-9_]
\s        Whitespace character                             Roughly equivalent to [ \t\r\n]; many engines also include \f and \v
\D        Negated \d                                       i.e. [^\d]
\W        Negated \w                                       i.e. [^\w]
\S        Negated \s                                       i.e. [^\s]

A note on negated classes: [\D\S] is not the same as [^\d\s]. The latter matches only characters that are neither a digit nor whitespace. The former matches any character that is either not a digit or not whitespace; since no character is both a digit and whitespace, it matches any character.
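The difference can be checked with a small sketch. Python's re module is used here only for illustration (the page itself does not name a language), and the sample string is made up:

```python
import re

# Hypothetical sample: a letter, a digit, a space and an underscore.
sample = "a1 _"

# [\D\S] is the union of "not a digit" and "not whitespace":
# every character satisfies at least one of the two.
union_matches = re.findall(r"[\D\S]", sample)

# [^\d\s] excludes both digits and whitespace.
neither_matches = re.findall(r"[^\d\s]", sample)

print(union_matches)    # ['a', '1', ' ', '_']
print(neither_matches)  # ['a', '_']
```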

Zero-length matches

The constructs below are zero-length: they match an empty string, either because they match a position in the string (such as the beginning-of-line and end-of-line anchors) or because the text they examine is discarded after evaluation (as with assertions, which only report whether the regex matched or not).
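As a minimal sketch of what "zero-length" means in practice, the span of such a match starts and ends at the same index. Python's re module is assumed here for illustration:

```python
import re

text = "abc"

# Anchor: matches the position before "a", consuming nothing.
start = re.search(r"^", text)

# Assertion: matches the position before "b", consuming nothing.
ahead = re.search(r"(?=b)", text)

start_span = start.span()
ahead_span = ahead.span()

print(start_span, repr(start.group()))  # (0, 0) ''
print(ahead_span, repr(ahead.group()))  # (1, 1) ''
```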

Anchors:

Anchor    Meaning                         Comment
^         Beginning-of-line anchor
$         End-of-line anchor
\b        Word boundary anchor
\B        Negated word boundary anchor
\<        Start-of-word anchor            GNU extension
\>        End-of-word anchor              GNU extension
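The anchors above can be tried out with the following sketch, assuming Python's re module and a made-up two-line string. The GNU-only anchors \< and \> are left out because Python's re module does not provide them. Note that in Python, ^ and $ only match at every line when the re.MULTILINE flag is given.

```python
import re

# Hypothetical sample text spanning two lines.
text = "cat concat\ncatalog cat"

def starts(pattern, flags=0):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

line_starts = starts(r"^cat", re.MULTILINE)  # "cat" at the beginning of a line
line_ends = starts(r"cat$", re.MULTILINE)    # "cat" at the end of a line
whole_words = starts(r"\bcat\b")             # "cat" as a standalone word
inside_word = starts(r"\Bcat")               # "cat" not at the start of a word

print(line_starts)  # [0, 11]
print(line_ends)    # [7, 19]
print(whole_words)  # [0, 19]
print(inside_word)  # [7]
```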

Assertions:

Assertion     Meaning                          Comment
(?=regex)     Positive lookahead assertion     e.g. \b(?=\w{0,3}cat)\w{6}\b matches locate
(?!regex)     Negative lookahead assertion     e.g. \b(?!\w{0,3}cat)\w{6}\b matches relica but not locate
(?<=regex)    Positive lookbehind assertion
(?<!regex)    Negative lookbehind assertion
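The two lookahead examples from the table can be reproduced as follows. This is a sketch assuming Python's re module; the lookbehind patterns and the price string are illustrative additions, not taken from the page.

```python
import re

words = "locate relica"

# Positive lookahead: six-letter words containing "cat" within the first six letters.
with_cat = re.findall(r"\b(?=\w{0,3}cat)\w{6}\b", words)

# Negative lookahead: six-letter words that do not.
without_cat = re.findall(r"\b(?!\w{0,3}cat)\w{6}\b", words)

# Hypothetical lookbehind examples on a made-up string.
prices = "price: $42, qty: 7"
after_dollar = re.findall(r"(?<=\$)\d+", prices)        # numbers preceded by "$"
not_after_dollar = re.findall(r"\b(?<!\$)\d+", prices)  # numbers not preceded by "$"

print(with_cat)          # ['locate']
print(without_cat)       # ['relica']
print(after_dollar)      # ['42']
print(not_after_dollar)  # ['7']
```

The \b in the last pattern keeps the match from starting in the middle of a number: without it, (?<!\$)\d+ could match the "2" of "42", since that digit is not directly preceded by "$".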

Examples

Sed: the list below uses extended regular expression syntax (switch -r).

Regexp             Description
.                  Matches any character
gray|grey          Matches gray or grey
gr(a|e)y           Matches gray or grey
gr[ae]y            Matches gray or grey
file[^0-2]         Matches file3 or file4, but not file0, file1 or file2
colou?r            (zero or one) Matches color or colour
ab*c               (zero or more) Matches ac, abc, abbc, ...
ab+c               (one or more) Matches abc, abbc, abbbc, ...
a{3,5}             (at least 3 and at most 5 times) Matches aaa, aaaa, aaaaa
^on single line$   (start and end of line) Matches a line consisting exactly of "on single line"
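Several of the patterns above can be checked with a short sketch. It assumes Python's re module rather than sed; these particular constructs are written the same way in both. The helper name matches and the candidate strings are hypothetical.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

grey = matches(r"gr[ae]y", ["gray", "grey", "groy"])
colour = matches(r"colou?r", ["color", "colour", "Color"])  # case-sensitive
star = matches(r"ab*c", ["ac", "abc", "abbc", "adc"])
plus = matches(r"ab+c", ["ac", "abc", "abbc"])
counted = matches(r"a{3,5}", ["aa", "aaa", "aaaaa", "aaaaaa"])
files = matches(r"file[^0-2]", ["file0", "file2", "file3", "file4"])

print(grey)     # ['gray', 'grey']
print(colour)   # ['color', 'colour']
print(star)     # ['ac', 'abc', 'abbc']
print(plus)     # ['abc', 'abbc']
print(counted)  # ['aaa', 'aaaaa']
print(files)    # ['file3', 'file4']
```

re.fullmatch is used so that each candidate must match as a whole, which is comparable to wrapping the pattern in ^ and $.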