48 re Module

48.1 Overview

The re module provides functions that process strings with regular expressions. To use it, import the module with the import() function, as in import(re).

The module offers each of its features in three different forms:

  • Module function
  • Method of re.pattern class
  • Method of string class

For example, matching a string against a regular expression can be written in any of the following ways:

Using a module function:

m = re.match('gur[ai]', str)

Using a method of re.pattern class:

m = re.pattern('gur[ai]').match(str)

Using a method of string class:

m = str.match('gur[ai]')

The table below lists the regular-expression features and the functions that provide them.

Feature        Module Function   Method of re.pattern   Method of string
Match          re.match()        re.pattern#match()     string#match()
Substitution   re.sub()          re.pattern#sub()       string#sub()
Split          re.split()        re.pattern#split()     string#splitreg()
Scan           re.scan()         re.pattern#scan()      string#scan()

48.2 Regular Expression

Matching patterns are written in a syntax based on POSIX Extended Regular Expressions.

In this syntax, a backslash escapes characters such as "(" and ")" so that they are not recognized as metacharacters. Since the backslash is also the escape character in Gura string literals, you have to write two backslashes to put a single backslash into a regular expression. For example, the expression "sin\(x\)", which matches the string "sin(x)", is written as follows:

m = str.match('sin\\(x\\)')

A raw string, marked with the prefix "r", treats a backslash as an ordinary character and avoids this complication:

m = str.match(r'sin\(x\)')

48.3 re.match Class

An instance of the re.match class is returned by re.match(), re.pattern#match() and string#match() and provides information about the match.

48.3.1 Property

Property   Type     R/W   Explanation
source     string   R     Source string to which the matching was applied.
string     string   R     String of the matched part.
begin      number   R     Beginning position of the matched part.
end        number   R     Ending position of the matched part.
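As a minimal sketch, assuming the Gura built-ins import(), println() and printf() (not described in this chapter), the properties can be read as follows. The subject string starts with the match so the example does not depend on whether matching is anchored; begin and end are printed but not asserted, because this chapter does not state whether end is inclusive.

```gura
import(re)

str = '12:34:56 UTC'
m = str.match(r'\d\d:\d\d:\d\d')
println(m.source)  // the string to which matching was applied
println(m.string)  // the matched part; expected: 12:34:56
printf('begin=%d, end=%d\n', m.begin, m.end)
```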

48.3.2 Index Access

A re.match instance can be indexed with a number or a string.

A number specifies a group index starting from zero. Group 0 is special and represents the whole matched region; groups with indices greater than zero correspond to the capturing groups in the pattern.

Below is an example:

str = '12:34:56'
m = str.match(r'(\d\d):(\d\d):(\d\d)')
m[0]  // returns the whole region of matching: 12:34:56
m[1]  // returns the 1st group: 12
m[2]  // returns the 2nd group: 34
m[3]  // returns the 3rd group: 56

A string specifies a named capturing group, written as "(?<name>group)" in the regular expression.

Below is an example:

str = '12:34:56'
m = str.match(r'(?<hour>\d\d):(?<min>\d\d):(?<sec>\d\d)')
m['hour']  // returns the group named 'hour': 12
m['min']   // returns the group named 'min': 34
m['sec']   // returns the group named 'sec': 56

48.3.3 Method

re.match#group(index):map

Returns the re.group instance at the specified index.

The argument index is a number or a string.

A number specifies a group index starting from zero. Group 0 is special and represents the whole matched region; groups with indices greater than zero correspond to the capturing groups in the pattern. Below is an example:

str = '12:34:56'
m = str.match(r'(\d\d):(\d\d):(\d\d)')
m.group(0).string // returns the whole region of matching: 12:34:56
m.group(1).string // returns the 1st group: 12
m.group(2).string // returns the 2nd group: 34
m.group(3).string // returns the 3rd group: 56

A string specifies a named capturing group, written as "(?<name>group)" in the regular expression.

Below is an example:

str = '12:34:56'
m = str.match(r'(?<hour>\d\d):(?<min>\d\d):(?<sec>\d\d)')
m.group('hour').string // returns the group named 'hour': 12
m.group('min').string  // returns the group named 'min': 34
m.group('sec').string  // returns the group named 'sec': 56

re.match#groups() {block?}

Creates an iterator that returns re.group instances.

By default, this returns an iterator. Specifying one of the following attributes customizes the returned value:

  • :iter .. An iterator. This is the default behavior.
  • :xiter .. An iterator that eliminates nil from its elements.
  • :list .. A list.
  • :xlist .. A list that eliminates nil from its elements.
  • :set .. A list that eliminates duplicated values from its elements.
  • :xset .. A list that eliminates duplicated values and nil from its elements.

See the Mapping Process chapter of the Gura Language Manual for details.

If a block is specified, it is evaluated repeatedly with the block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result. If one of the attributes listed above is also specified, an iterator or a list of the evaluated values is returned instead.
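The sketch below walks through the groups of a match using the block form and reads the re.group properties described in the next section. It assumes the built-in printf(); whether the whole-match group 0 is included in the iteration is not stated here, so no output is shown.

```gura
import(re)

str = '12:34:56'
m = str.match(r'(\d\d):(\d\d):(\d\d)')
// g is a re.group instance, idx the loop index starting from zero
m.groups() {|g, idx:number|
    printf('%d: %s (begin=%d, end=%d)\n', idx, g.string, g.begin, g.end)
}
```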

48.4 re.group Class

A re.group instance provides information about one capturing group stored in a re.match instance.

48.4.1 Property

Property   Type     R/W   Explanation
string     string   R     String of the group.
begin      number   R     Beginning position of the group.
end        number   R     Ending position of the group.

48.5 re.pattern Class

The re.pattern class represents a regular-expression pattern.

48.5.1 Cast Operation

A function that expects a re.pattern instance as an argument also accepts a value of the following type:

  • string .. Recognized as a regular expression from which re.pattern instance is created.

With this cast, a function f(pattern:re.pattern) that expects a re.pattern instance can be called in either of the following ways:

  • f(re.pattern('gur[ai]')) .. The most explicit way.
  • f('gur[ai]') .. Implicit casting: from string to re.pattern.
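The following sketch defines a hypothetical helper find() whose argument is declared as re.pattern, then calls it both ways. It assumes Gura's function-definition syntax and the built-in println(); the subject string begins with the match so the result does not depend on anchoring.

```gura
import(re)

// hypothetical helper; its pattern argument is declared as re.pattern
find(pattern:re.pattern, str:string) = {
    pattern.match(str)
}

m1 = find(re.pattern('gur[ai]'), 'gura')  // explicit re.pattern instance
m2 = find('gur[ai]', 'gura')              // string implicitly cast to re.pattern
println(m1.string)  // expected: gura
println(m2.string)  // expected: gura
```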

48.5.2 Constructor

In many cases, a re.pattern instance is created implicitly by the cast operation when a string is passed to an argument of type re.pattern. To customize the pattern's behavior, for example to make it ignore letter case, create the instance explicitly with the constructor described below.

re.pattern(pattern:string):map:[icase,multiline] {block?}

Creates a re.pattern instance from the given pattern string.

The following attributes customize the traits of the pattern:

  • :icase .. Ignores character case.
  • :multiline .. Makes "." match a line break as well.

If a block is specified, it is evaluated with the block parameter |pat:re.pattern|, where pat is the created instance, and the block's result becomes the function's return value.
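A minimal sketch of the explicit constructor combined with the :icase attribute and a block, assuming the built-in println():

```gura
import(re)

// without :icase, the lowercase pattern would not match 'GURA'
s = re.pattern('gura'):icase {|pat:re.pattern|
    // the block's result becomes the constructor's return value
    pat.match('GURA language').string
}
println(s)  // expected: GURA
```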

48.5.3 Method

re.pattern#match(str:string, pos:number => 0, endpos?:number):map {block?}

Applies pattern matching to the given string and returns a re.match instance if the match succeeds, or nil otherwise.

The argument pos specifies the position at which matching starts. If omitted, matching starts from the beginning of the string.

The argument endpos specifies the position at which matching ends. If omitted, matching continues to the end of the string.

If a block is specified, it is evaluated with the block parameter |m:re.match|, where m is the resulting re.match instance, and the block's result becomes the function's return value.
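The sketch below reuses one pattern with different starting positions, assuming the built-in println() and that positions are zero-based character offsets (consistent with pos defaulting to the beginning of the string). Each pos points directly at a run of digits so the result does not depend on whether matching is anchored.

```gura
import(re)

pat = re.pattern(r'\d+')
str = 'x=42, y=7'
m = pat.match(str, 2)  // start at the '4'
println(m.string)      // expected: 42
m = pat.match(str, 8)  // start at the '7'
println(m.string)      // expected: 7
```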

re.pattern#sub(replace, str:string, count?:number):map {block?}

Replaces the parts of the string that match the pattern with the specified replacement.

The argument replace takes a string or a function.

If a string is specified, it is used as the replacement text, in which the macros \0, \1, \2, ... refer to the matched groups.

If a function is specified, it is called with an argument m:re.match and is expected to return the string to substitute.

The argument count specifies the maximum number of substitutions. If omitted, no limit is applied.

If a block is specified, it is evaluated with the block parameter |str:string|, where str is the resulting string, and the block's result becomes the function's return value.
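A sketch of a function replacer, assuming Gura's function-definition syntax, string concatenation with + and the built-in println(); bracket() is a hypothetical helper:

```gura
import(re)

// hypothetical replacer: wraps each matched number in brackets
bracket(m:re.match) = {
    '[' + m.string + ']'
}

pat = re.pattern(r'\d+')
println(pat.sub(bracket, 'a1 b22 c333'))  // expected: a[1] b[22] c[333]
```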

re.pattern#split(str:string, count?:number):map {block?}

Creates an iterator that splits the source string with the specified pattern.

The argument count specifies the maximum number of splits. If omitted, no limit is applied.

By default, this returns an iterator. Specifying one of the following attributes customizes the returned value:

  • :iter .. An iterator. This is the default behavior.
  • :xiter .. An iterator that eliminates nil from its elements.
  • :list .. A list.
  • :xlist .. A list that eliminates nil from its elements.
  • :set .. A list that eliminates duplicated values from its elements.
  • :xset .. A list that eliminates duplicated values and nil from its elements.

See the Mapping Process chapter of the Gura Language Manual for details.

If a block is specified, it is evaluated repeatedly with the block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result. If one of the attributes listed above is also specified, an iterator or a list of the evaluated values is returned instead.
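The following sketch splits comma-separated fields with the block form, assuming the built-in printf():

```gura
import(re)

pat = re.pattern(r' *, *')
// value is each field, idx its index starting from zero
pat.split('a , b,c ,d') {|value, idx:number|
    printf('%d: %s\n', idx, value)
}
// expected output:
// 0: a
// 1: b
// 2: c
// 3: d
```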

re.pattern#scan(str:string, pos:number => 0, endpos?:number):map {block?}

Creates an iterator that returns strings that match the specified pattern.

The argument pos specifies the position at which matching starts. If omitted, matching starts from the beginning of the string.

The argument endpos specifies the position at which matching ends. If omitted, matching continues to the end of the string.

By default, this returns an iterator. Specifying one of the following attributes customizes the returned value:

  • :iter .. An iterator. This is the default behavior.
  • :xiter .. An iterator that eliminates nil from its elements.
  • :list .. A list.
  • :xlist .. A list that eliminates nil from its elements.
  • :set .. A list that eliminates duplicated values from its elements.
  • :xset .. A list that eliminates duplicated values and nil from its elements.

See the Mapping Process chapter of the Gura Language Manual for details.

If a block is specified, it is evaluated repeatedly with the block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result. If one of the attributes listed above is also specified, an iterator or a list of the evaluated values is returned instead.
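A minimal sketch that creates a pattern once and scans two strings with it, assuming the built-in println(). Per the description above, each value should be a matched string; the printed form is not asserted here.

```gura
import(re)

// build the pattern once and reuse it for several strings
pat = re.pattern(r'[A-Z][a-z]*')
pat.scan('Alpha beta Gamma') {|value, idx:number|
    println(value)
}
pat.scan('delta Epsilon') {|value, idx:number|
    println(value)
}
```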

48.6 Extension to string Class

This module extends the string class with methods described here.

string#match(pattern:re.pattern, pos:number => 0, endpos?:number):map {block?}

Applies pattern matching to the given string and returns a re.match instance if the match succeeds, or nil otherwise.

The argument pos specifies the position at which matching starts. If omitted, matching starts from the beginning of the string.

The argument endpos specifies the position at which matching ends. If omitted, matching continues to the end of the string.

If a block is specified, it is evaluated with the block parameter |m:re.match|, where m is the resulting re.match instance, and the block's result becomes the function's return value.
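A sketch of the block form, which extracts a captured group in a single expression; it assumes the built-in println():

```gura
import(re)

str = '2024-01-15'
// the block receives the re.match instance and its value is returned
year = str.match(r'(\d+)-(\d+)-(\d+)') {|m:re.match|
    m.group(1).string
}
println(year)  // expected: 2024
```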

string#sub(pattern:re.pattern, replace, count?:number):map {block?}

Replaces the parts of the string that match the pattern with the specified replacement.

The argument replace takes a string or a function.

If a string is specified, it is used as the replacement text, in which the macros \0, \1, \2, ... refer to the matched groups.

If a function is specified, it is called with an argument m:re.match and is expected to return the string to substitute.

The argument count specifies the maximum number of substitutions. If omitted, no limit is applied.

If a block is specified, it is evaluated with the block parameter |str:string|, where str is the resulting string, and the block's result becomes the function's return value.
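The following sketch uses the \1, \2 and \3 macros inside a raw string, then the count argument, assuming the built-in println():

```gura
import(re)

str = '2024-01-15'
// raw strings keep the backslashes of the macros intact
println(str.sub(r'(\d+)-(\d+)-(\d+)', r'\3/\2/\1'))  // expected: 15/01/2024

s = 'a-b-c'
// count = 1 limits the substitution to the first match
println(s.sub('-', '+', 1))  // expected: a+b-c
```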

string#splitreg(pattern:re.pattern, count?:number):map {block?}

Creates an iterator that splits the source string with the specified pattern.

The argument count specifies the maximum number of splits. If omitted, no limit is applied.

By default, this returns an iterator. Specifying one of the following attributes customizes the returned value:

  • :iter .. An iterator. This is the default behavior.
  • :xiter .. An iterator that eliminates nil from its elements.
  • :list .. A list.
  • :xlist .. A list that eliminates nil from its elements.
  • :set .. A list that eliminates duplicated values from its elements.
  • :xset .. A list that eliminates duplicated values and nil from its elements.

See the Mapping Process chapter of the Gura Language Manual for details.

If a block is specified, it is evaluated repeatedly with the block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result. If one of the attributes listed above is also specified, an iterator or a list of the evaluated values is returned instead.
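A sketch of the :list attribute, assuming list indexing with [] and the built-in println():

```gura
import(re)

str = 'one  two   three'
// :list collects the pieces into a list instead of returning an iterator
words = str.splitreg(r' +'):list
println(words[0])  // expected: one
println(words[2])  // expected: three
```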

string#scan(pattern:re.pattern, pos:number => 0, endpos?:number):map {block?}

Creates an iterator that returns strings that match the specified pattern.

The argument pos specifies the position at which matching starts. If omitted, matching starts from the beginning of the string.

The argument endpos specifies the position at which matching ends. If omitted, matching continues to the end of the string.

By default, this returns an iterator. Specifying one of the following attributes customizes the returned value:

  • :iter .. An iterator. This is the default behavior.
  • :xiter .. An iterator that eliminates nil from its elements.
  • :list .. A list.
  • :xlist .. A list that eliminates nil from its elements.
  • :set .. A list that eliminates duplicated values from its elements.
  • :xset .. A list that eliminates duplicated values and nil from its elements.

See the Mapping Process chapter of the Gura Language Manual for details.

If a block is specified, it is evaluated repeatedly with the block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result. If one of the attributes listed above is also specified, an iterator or a list of the evaluated values is returned instead.

48.7 Extension to iterable Classes

This module extends the iterable classes, list and iterator, with methods described here.

iterable#grep(pattern:re.pattern):map {block?}

48.8 Module Function

re.match(pattern:re.pattern, str:string, pos:number => 0, endpos?:number):map {block?}

Applies pattern matching to the given string and returns a re.match instance if the match succeeds, or nil otherwise.

The argument pos specifies the position at which matching starts. If omitted, matching starts from the beginning of the string.

The argument endpos specifies the position at which matching ends. If omitted, matching continues to the end of the string.

If a block is specified, it is evaluated with the block parameter |m:re.match|, where m is the resulting re.match instance, and the block's result becomes the function's return value.
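A sketch of the module-function form with named groups and a nil check, assuming the built-ins printf() and println(). Note that the pattern comes before the subject string.

```gura
import(re)

m = re.match(r'(?<key>[a-z]+)=(?<value>[a-z]+)', 'lang=gura')
if (m) {
    printf('%s -> %s\n', m.group('key').string, m.group('value').string)
} else {
    println('no match')  // re.match() returns nil when matching fails
}
// expected output: lang -> gura
```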

re.sub(pattern:re.pattern, replace, str:string, count?:number):map {block?}

Replaces the parts of the string that match the pattern with the specified replacement.

The argument replace takes a string or a function.

If a string is specified, it is used as the replacement text, in which the macros \0, \1, \2, ... refer to the matched groups.

If a function is specified, it is called with an argument m:re.match and is expected to return the string to substitute.

The argument count specifies the maximum number of substitutions. If omitted, no limit is applied.

If a block is specified, it is evaluated with the block parameter |str:string|, where str is the resulting string, and the block's result becomes the function's return value.
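A sketch showing the argument order of re.sub() (pattern, replacement, subject, count) and the effect of count, assuming the built-in println():

```gura
import(re)

str = 'foo boo'
println(re.sub('o', '0', str, 1))  // first match only; expected: f0o boo
println(re.sub('o', '0', str))     // every match; expected: f00 b00
```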

re.split(pattern:re.pattern, str:string, count?:number):map {block?}

Creates an iterator that splits the source string with the specified pattern.

The argument count specifies the maximum number of splits. If omitted, no limit is applied.

By default, this returns an iterator. Specifying one of the following attributes customizes the returned value:

  • :iter .. An iterator. This is the default behavior.
  • :xiter .. An iterator that eliminates nil from its elements.
  • :list .. A list.
  • :xlist .. A list that eliminates nil from its elements.
  • :set .. A list that eliminates duplicated values from its elements.
  • :xset .. A list that eliminates duplicated values and nil from its elements.

See the Mapping Process chapter of the Gura Language Manual for details.

If a block is specified, it is evaluated repeatedly with the block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result. If one of the attributes listed above is also specified, an iterator or a list of the evaluated values is returned instead.

re.scan(pattern:re.pattern, str:string, pos:number => 0, endpos?:number):map {block?}

Creates an iterator that returns strings that match the specified pattern.

The argument pos specifies the position at which matching starts. If omitted, matching starts from the beginning of the string.

The argument endpos specifies the position at which matching ends. If omitted, matching continues to the end of the string.

By default, this returns an iterator. Specifying one of the following attributes customizes the returned value:

  • :iter .. An iterator. This is the default behavior.
  • :xiter .. An iterator that eliminates nil from its elements.
  • :list .. A list.
  • :xlist .. A list that eliminates nil from its elements.
  • :set .. A list that eliminates duplicated values from its elements.
  • :xset .. A list that eliminates duplicated values and nil from its elements.

See the Mapping Process chapter of the Gura Language Manual for details.

If a block is specified, it is evaluated repeatedly with the block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result. If one of the attributes listed above is also specified, an iterator or a list of the evaluated values is returned instead.

48.9 Thanks

This module uses the Oniguruma library, which is distributed at the following site:

http://www.geocities.jp/kosako3/oniguruma/index.html