regex -- evaluate a regular expression search

Synopsis

Usage:

regex(re, str)

regex(re, start, str)

regex(re, start, range, str)
Inputs:
- re, a string, a regular expression describing a pattern
- start, an integer, nonnegative, the position in str at which to begin the search; when omitted, the search starts at the beginning of the string.
- range, an integer, restricts matches to those beginning at a position between start and start + range; when 0, the pattern is matched only at the starting position; when negative, only positions to the left of the starting position are examined for matches; when omitted, the search extends to the end of the string.
- str, a string, the subject string to be searched
Optional inputs:
- POSIX => a Boolean value, default value false; if true, re is interpreted using the POSIX Extended flavor, otherwise the Perl flavor
Outputs:
- a list, a list of pairs of integers; each pair denotes the beginning position and the length of a substring. Only the leftmost matching substring of str and the capturing groups within it are returned. If no match is found, the output is null.

Description

The value returned is a list of pairs of integers, each suitable for use as the first argument of substring. The first pair describes the entire match, and the remaining pairs correspond to the parenthesized subexpressions successfully matched. The first member of each pair is the offset within str of the substring matched, and the second is its length.

See regular expressions for a brief introduction to the topic.

i1 : s = "The cat is black.";

i2 : m = regex("(\\w+) (\\w+) (\\w+)",s)

o2 = {(0, 10), (0, 3), (4, 3), (8, 2)}

o2 : List

i3 : substring(m#0, s)

o3 = The cat is

i4 : substring(m#1, s)

o4 = The

i5 : substring(m#2, s)

o5 = cat

i6 : substring(m#3, s)

o6 = is
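
The four substring calls above can be collapsed into a single expression. The following is a minimal sketch, not part of the original session, that applies substring to every pair in the result; the variable name p is illustrative only.

```macaulay2
s = "The cat is black.";
m = regex("(\\w+) (\\w+) (\\w+)", s);
-- map every (offset, length) pair to the text it denotes:
-- the whole match first, then each capturing group in order
apply(m, p -> substring(p, s))
```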

i7 : s = "aa     aaaa";

i8 : m = regex("a+", 0, s)

o8 = {(0, 2)}

o8 : List

i9 : substring(m#0, s)

o9 = aa

i10 : m = regex("a+", 2, s)

o10 = {(7, 4)}

o10 : List

i11 : substring(m#0, s)

o11 = aaaa

i12 : m = regex("a+", 2, 3, s)
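
No output is printed for the last input because the search found no match and the result is null. The sketch below, which is an illustration rather than part of the original session, shows one way to guard against null before indexing into the result, and also exercises the documented range 0 case.

```macaulay2
s = "aa     aaaa";
-- positions 2 through 5 hold only spaces, so no match can begin there
m = regex("a+", 2, 3, s);
-- m#0 is only meaningful when a match exists, so test for null first
if m === null then "no match" else substring(m#0, s)
-- with range 0 the pattern is tried only at the starting position itself
regex("a+", 7, 0, s)
```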

i13 : s = "line 1\nline 2\r\nline 3";

i14 : m = regex("^.*$", 8, -8, s)

o14 = {(7, 6)}

o14 : List

i15 : substring(m#0, s)

o15 = line 2

i16 : m = regex("^", 10, -10, s)

o16 = {(7, 0)}

o16 : List

i17 : substring(0, m#0#0, s)

o17 = line 1

i18 : substring(m#0#0, s)

o18 = line 2
      line 3

i19 : m = regex("^.*$", 4, -10, s)

o19 = {(0, 6)}

o19 : List

i20 : substring(m#0, s)

o20 = line 1

i21 : m = regex("a.*$", 4, -10, s)

By default, regular expressions are interpreted using the Perl flavor, which supports features such as lookaheads and lookbehinds for fine-tuning matches. This is the syntax used by Perl and JavaScript.

i22 : regex("A(?!C)", "AC AB")

o22 = {(3, 1)}

o22 : List

i23 : regex("A(?=B)", "AC AB")

o23 = {(3, 1)}

o23 : List
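
The two examples above use a negative and a positive lookahead. A lookbehind can be sketched in the same way; this example is an illustration added here and assumes the default Perl flavor accepts the usual fixed-width (?<=...) syntax.

```macaulay2
-- (?<=A) requires an A immediately before the match without consuming it;
-- the B at offset 1 follows C, so only the B at offset 4 qualifies
regex("(?<=A)B", "CB AB")
```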

Alternatively, one can choose the POSIX Extended flavor of regex using POSIX => true. This syntax is similar to the one used by the Unix utilities egrep and awk and enforces the leftmost, longest rule for finding matches. If there's a tie, the rule is applied to the first subexpression.

i24 : s = "<b>bold</b> and <b>strong</b>";

i25 : m = regex("<b>(.*)</b>", s, POSIX => true);

i26 : substring(m#1, s)

o26 = bold</b> and <b>strong

In the Perl flavor, one can specify whether repetitions should be possessive or non-greedy.

i27 : m = regex("<b>(.*?)</b>", s);

i28 : substring(m#1, s)

o28 = bold
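
Non-greedy quantifiers are not part of the standard POSIX Extended syntax. A common workaround, sketched here as an illustration rather than as a documented feature of regex, is to use a negated character class so that the group cannot run past the next tag.

```macaulay2
s = "<b>bold</b> and <b>strong</b>";
-- [^<]* cannot match a "<", so the group stops at the first closing tag
m = regex("<b>([^<]*)</b>", s, POSIX => true);
substring(m#1, s)
```

This pattern relies only on bracket expressions and the star quantifier, so it should behave the same way in the Perl flavor.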

Ways to use regex :

regex(String,String)
regex(String,ZZ,String)
regex(String,ZZ,ZZ,String)

For the programmer

The object regex is a method function with options.
