Regular Expressions

From miki
Revision as of 10:23, 13 January 2014 by Mip

References

Engines

Powerful engines:

Open-source regex engine, built into PHP for instance.

Less powerful engines:

Use extended regular expressions (switch -r) so that the meta-characters ( ) { } have their special meaning without being escaped with a backslash.

Character Classes

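Class   Meaning                                        Comment
[ae]    Matches a or e
[a-z]   Matches any character in the range a to z
[^a-z]  Matches any character not in the range a to z
\d      Digit, equivalent to [0-9]                     In ASCII mode; Unicode-aware engines also match other digits
\w      Word character, equivalent to [A-Za-z0-9_]    In ASCII mode; Unicode-aware engines also match other letters
\s      Whitespace character, usually [ \t\r\n\f\v]    The exact set varies between engines
\D      Negated \d, i.e. [^\d]
\W      Negated \w, i.e. [^\w]
\S      Negated \s, i.e. [^\s]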
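A quick way to see these classes in action is Python's re module, used here purely as an illustration (the page itself is not tied to one engine). The sample string is made up. Python 3 string patterns are Unicode-aware, so re.ASCII is passed to get the ASCII meanings listed in the table.

```python
import re

text = "id_42 is\tfree"  # made-up sample: letters, digits, underscore, space, tab

# re.ASCII restricts \d, \w and \s to their ASCII meanings
digits = re.findall(r"\d", text, flags=re.ASCII)     # each digit on its own
words = re.findall(r"\w+", text, flags=re.ASCII)     # runs of [A-Za-z0-9_]
spaces = re.findall(r"\s", text, flags=re.ASCII)     # the space and the tab
non_word = re.findall(r"\W", text, flags=re.ASCII)   # negated \w

print(digits)    # ['4', '2']
print(words)     # ['id_42', 'is', 'free']
print(spaces)    # [' ', '\t']
print(non_word)  # [' ', '\t']
```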

About negated classes, note that [\D\S] is not the same as [^\d\s]. The latter will not match any character that is either a digit or whitespace. The former matches any character that is either not a digit or not whitespace; since no character is both a digit and whitespace, it matches every character.
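The difference can be checked directly. This is a small sketch using Python's re module (an assumption; any engine with \d and \s behaves the same here), testing one digit, one space and one letter against both classes.

```python
import re

samples = ["7", " ", "x"]  # a digit, a whitespace character, a letter

# For each character: (matches [\D\S], matches [^\d\s])
results = {
    ch: (
        re.fullmatch(r"[\D\S]", ch) is not None,   # "not a digit OR not whitespace"
        re.fullmatch(r"[^\d\s]", ch) is not None,  # "neither a digit NOR whitespace"
    )
    for ch in samples
}

for ch, (either, neither) in results.items():
    print(repr(ch), either, neither)
# '7' True False
# ' ' True False
# 'x' True True
```

Only the letter is matched by [^\d\s], while [\D\S] accepts all three.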

Zero-length matches

The regexes here are zero-length: they match an empty string, either because they match a particular position in the string (such as the beginning-of-line or end-of-line anchors), or because the text they examine is dropped after evaluation (like assertions, which only yield a boolean result: match or no match).

Anchors:

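Anchor  Meaning                        Comment
^       Beginning-of-line anchor       In Perl-style engines, start of string only, unless multiline mode is on
$       End-of-line anchor             In Perl-style engines, likewise tied to the end of the string unless multiline mode is on
\b      Word boundary anchor
\B      Negated word boundary anchor
\<      Start-of-word anchor           GNU extension
\>      End-of-word anchor             GNU extension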
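The sketch below, again using Python's re module as an assumed example engine, shows the anchors on a made-up two-line string. Python does not support the GNU \< and \> anchors, so only ^, \b and \B are shown; the last line confirms that an anchor match is an empty string.

```python
import re

text = "cat\nconcatenate cat"  # made-up sample spanning two lines

# Without re.MULTILINE, ^ only matches at the very start of the string
default_start = re.findall(r"^cat", text)
# With re.MULTILINE, ^ also matches right after each newline
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
# \b on both sides: "cat" as a whole word only
whole_word_count = len(re.findall(r"\bcat\b", text))
# \B on both sides: "cat" embedded inside a longer word
inside_word = re.findall(r"\Bcat\B", text)
# Anchors are zero-length: the first \b match is the empty span at position 0
boundary_span = re.search(r"\b", text).span()

print(default_start)     # ['cat']
print(line_starts)       # ['cat', 'concatenate']
print(whole_word_count)  # 2
print(inside_word)       # ['cat']
print(boundary_span)     # (0, 0)
```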

Assertions:

Assertion   Meaning                          Comment
(?=regex)   Lookahead positive assertion     e.g. \b(?=\w{0,3}cat)\w{6}\b matches locate
(?!regex)   Lookahead negative assertion     e.g. \b(?!\w{0,3}cat)\w{6}\b matches relica but not locate
(?<=regex)  Lookbehind positive assertion    Many engines require regex to match a fixed length
(?<!regex)  Lookbehind negative assertion    Same restriction as above
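The table's lookahead examples can be reproduced with Python's re module (assumed here as the test engine). The word list and the price string are made up; the lookbehind patterns are hypothetical illustrations using a fixed-length "USD " prefix, since Python requires fixed-length lookbehind.

```python
import re

words = "locate relica catnip"  # made-up sample words

# Positive lookahead: 6-letter words with "cat" starting within the first 4 letters
pos = re.findall(r"\b(?=\w{0,3}cat)\w{6}\b", words)
# Negative lookahead: 6-letter words without such a "cat"
neg = re.findall(r"\b(?!\w{0,3}cat)\w{6}\b", words)

prices = "USD 30, EUR 45, USD 12"  # made-up sample
# Positive lookbehind: numbers preceded by "USD "
after_usd = re.findall(r"(?<=USD )\d+", prices)
# Negative lookbehind: whole numbers NOT preceded by "USD "
not_after_usd = re.findall(r"(?<!USD )\b\d+", prices)

print(pos)            # ['locate', 'catnip']
print(neg)            # ['relica']
print(after_usd)      # ['30', '12']
print(not_after_usd)  # ['45']
```

The \b in the last pattern keeps the regex from matching the trailing digit of a number whose start was rejected by the lookbehind.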

A negative assertion can be used to invert a regex match: ^(?!.*<REGEX_HERE>) matches any line that does not contain a match for <REGEX_HERE>. The match itself is empty; append .* to also consume the line.
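As a minimal sketch of this inversion trick, assuming Python's re module and a made-up list of lines, with "cat" standing in for <REGEX_HERE>: the first form tests each line separately, the second works on one multi-line string, where re.MULTILINE makes ^ and $ apply per line and . does not cross newlines.

```python
import re

lines = ["black cat", "white dog", "concatenate", "bird"]  # made-up sample

# Per line: the empty match at the start succeeds only if "cat" occurs nowhere after it
not_cat = [line for line in lines if re.match(r"^(?!.*cat)", line)]

# On a single string: append .*$ so each accepted line is actually captured
text = "\n".join(lines)
not_cat_lines = re.findall(r"^(?!.*cat).*$", text, flags=re.MULTILINE)

print(not_cat)        # ['white dog', 'bird']
print(not_cat_lines)  # ['white dog', 'bird']
```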

Examples

Sed: the list below is actually for extended regular expressions (switch -r).

Regexp            Description
.                 Match any character
gray|grey         Match gray or grey
gr(a|e)y          Match gray or grey
gr[ae]y           Match gray or grey
file[^0-2]        Match file3 or file4, but not file0, file1, file2
colou?r           (zero or one): match color or colour
ab*c              (zero or more): match ac, abc, abbc, ...
ab+c              (one or more): match abc, abbc, abbbc, ...
a{3,5}            (at least 3 and at most 5 times): match aaa, aaaa, aaaaa
^on single line$  (start and end of line): match on single line only when it is the whole line
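These examples use only extended-regex features that Python's re module shares, so they can be checked there as a sketch (Python is an assumption; sed itself is not involved). re.fullmatch is used instead of a search so that each pattern must cover the whole test string, which shows exactly which strings it describes; the "should not match" strings are made-up counterexamples.

```python
import re

# (pattern, strings that should match in full, strings that should not)
cases = [
    (r"gray|grey", ["gray", "grey"], ["groy"]),
    (r"gr(a|e)y", ["gray", "grey"], ["gry"]),
    (r"gr[ae]y", ["gray", "grey"], ["graey"]),
    (r"file[^0-2]", ["file3", "file4"], ["file0", "file1", "file2"]),
    (r"colou?r", ["color", "colour"], ["colouur"]),
    (r"ab*c", ["ac", "abc", "abbc"], ["adc"]),
    (r"ab+c", ["abc", "abbbc"], ["ac"]),
    (r"a{3,5}", ["aaa", "aaaa", "aaaaa"], ["aa", "aaaaaa"]),
]

all_ok = True
for pattern, good, bad in cases:
    for s in good:
        all_ok &= re.fullmatch(pattern, s) is not None
    for s in bad:
        all_ok &= re.fullmatch(pattern, s) is None

# ^...$ with re.MULTILINE: only a line that is exactly "on single line" matches
single_line = re.findall(r"^on single line$",
                         "on single line\nnot on single line",
                         flags=re.MULTILINE)

print(all_ok)       # True
print(single_line)  # ['on single line']
```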

Regex Golf

My solutions so far (see here for other scores [1]):

Plain strings (207)   foo
Anchors (208)         k$
Ranges (202)          ^[a-f]+$
Backrefs (201)        (...).*\1
Abba (190)            ^(?!.*(.)(.)\2\1).*$
Abba (193)            ^(?!.*(.)(.)\2\1)
A man, a plan (176)   ^(.)(.).*\2\1$
Prime (232)           ^(xx|xxx|x{5}|x{7}|x{11}|x{13}|x{17}|x{19}|x{23}|x{29}|x{31})$|x{33}
Four (198)            (.).\1.\1.\1
Order (156)           ^a?b?c?c?d?e?e?f?g?h?i?l?l?m?n?o?o?p?r?s?s?t?t?y?w?z?$
Triples (570)         (6|[56]0|31|12|24|[48]7|58|0[0249]|7[258]|003|015|303|9005)$
Glob                  NONE
Balance (283)         ^(<(<(<(<(<(<(<>)*>)*>)*>)*>)*>)*>)*$
Balance (286)         ^<<>><|^(<(<(<(<(<>)*>)*>)*>)*>)*$
Powers (60)           ^(((((((((xx?)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?$
Powers (72)           ^(?!((xx)+x|(x{24})+|x{28}|x{160})$)x*
Powers (76)           ^(?!((xx)+x|x{28}|x{48}|(x{5})+)$)
Long count (216)      ^0+ 0+1 0010 0011 0100 0101 0110 01+ 10+ 1001 1010 101
Long count v2 (216)   ^0+ 0+1 0010 0011 0100 0101 0110 01+ 10+ 1001 1010 101
Alphabetical (289)    [rs]er$|^([er]|ass).*s$|a t|e e|n r|rt r|ne t|ar ta
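To try some of these solutions locally, a sketch with Python's re module follows; it supports the backreferences and lookaheads these particular solutions rely on. The sample strings are made up for illustration and are not the official puzzle word lists, so this only sanity-checks the patterns and does not compute a golf score.

```python
import re

# name: (solution from the list above, made-up string to accept, made-up string to reject)
checks = {
    "Backrefs": (r"(...).*\1", "abcxyzabc", "abcdef"),
    "Abba": (r"^(?!.*(.)(.)\2\1)", "abcdef", "xabbay"),
    "A man, a plan": (r"^(.)(.).*\2\1$", "racecar", "abcdef"),
    "Four": (r"(.).\1.\1.\1", "aXaYaZa", "abcabc"),
    # Each optional backreference doubles the length, so only 2**k x's match
    "Powers": (r"^(((((((((xx?)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?$", "x" * 8, "x" * 6),
}

results = {}
for name, (pattern, accept, reject) in checks.items():
    results[name] = (
        re.search(pattern, accept) is not None,  # should be found
        re.search(pattern, reject) is None,      # should not be found
    )

print(results)  # every entry should be (True, True)
```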

Total 3426