48 re Module
48.1 Overview
The re module provides functions for operating on strings with regular expressions. To use it, import the re module with the import function.
This module provides the same features in three different forms:
- Module function
- Method of re.pattern class
- Method of string class
For example, matching a string against a regular expression can be written in any of the following ways:
Using a module function:
m = re.match('gur[ai]', str)
Using a method of re.pattern class:
m = re.pattern('gur[ai]').match(str)
Using a method of string class:
m = str.match('gur[ai]')
The table below lists the regular-expression features and the functions that provide them.
Feature | Module Function | Method of re.pattern | Method of string |
---|---|---|---|
Match | re.match() | re.pattern#match() | string#match() |
Substitution | re.sub() | re.pattern#sub() | string#sub() |
Split | re.split() | re.pattern#split() | string#splitreg() |
Scan | re.scan() | re.pattern#scan() | string#scan() |
48.2 Regular Expression
You can describe a matching pattern using a syntax based on POSIX Extended Regular Expression.
The syntax uses a backslash to prevent characters such as "(" and ")" from being recognized as metacharacters. Since a backslash is also the escape character in Gura strings, you have to write two backslashes to represent a single backslash in a regular expression. For example, the expression "sin\(x\)", which matches the string "sin(x)", is written as below:
m = str.match('sin\\(x\\)')
Using a raw string, written with the prefix "r", in which a backslash is parsed as a regular character, avoids this complication:
m = str.match(r'sin\(x\)')
48.3 re.match Class
An instance of the re.match class is returned by re.match(), re.pattern#match() and string#match() to provide matching information.
48.3.1 Property
Property | Type | R/W | Explanation |
---|---|---|---|
source | string | R | Source string against which the pattern was matched. |
string | string | R | String of the matched part. |
begin | number | R | Beginning position of the matched part. |
end | number | R | Ending position of the matched part. |
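As a small illustration of the properties above, the following sketch matches a sample string and prints each property in turn. The sample string is hypothetical, and the use of println to display the values is an assumption rather than something this chapter specifies.

```gura
import(re)

// Hypothetical sample string; the pattern matches at its beginning.
str = 'gura is a language'
m = str.match('gur[ai]')

println(m.source)  // the source string
println(m.string)  // the matched part
println(m.begin)   // beginning position of the matched part
println(m.end)     // ending position of the matched part
```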
48.3.2 Index Access
A re.match instance can be indexed with a number or string value.
A number value specifies the group index, which starts from zero. The group indexed by zero is special and represents the whole region of the match. Groups indexed by numbers greater than zero correspond to the capturing groups in the pattern.
Below is an example:
str = '12:34:56'
m = str.match(r'(\d\d):(\d\d):(\d\d)')
m[0] // returns the whole region of matching: 12:34:56
m[1] // returns the 1st group: 12
m[2] // returns the 2nd group: 34
m[3] // returns the 3rd group: 56
A string value refers to a named capturing group, which is written as "(?<name>group)" in a regular expression.
Below is an example:
str = '12:34:56'
m = str.match(r'(?<hour>\d\d):(?<min>\d\d):(?<sec>\d\d)')
m['hour'] // returns the group named 'hour': 12
m['min'] // returns the group named 'min': 34
m['sec'] // returns the group named 'sec': 56
48.3.3 Method
re.match#group(index):map
Returns the re.group instance located by the specified index.
The argument index is a value of number or string.
A number value specifies the group index, which starts from zero. The group indexed by zero is special and represents the whole region of the match. Groups indexed by numbers greater than zero correspond to the capturing groups in the pattern. Below is an example:
str = '12:34:56'
m = str.match(r'(\d\d):(\d\d):(\d\d)')
m.group(0).string // returns the whole region of matching: 12:34:56
m.group(1).string // returns the 1st group: 12
m.group(2).string // returns the 2nd group: 34
m.group(3).string // returns the 3rd group: 56
A string value refers to a named capturing group, which is written in a regular expression as "(?<name>group)".
Below is an example:
str = '12:34:56'
m = str.match(r'(?<hour>\d\d):(?<min>\d\d):(?<sec>\d\d)')
m.group('hour').string // returns the group named 'hour': 12
m.group('min').string // returns the group named 'min': 34
m.group('sec').string // returns the group named 'sec': 56
re.match#groups() {block?}
Creates an iterator that returns re.group instances.
By default, this returns an iterator as its result value. Specifying one of the following attributes customizes the returned value:
- :iter .. An iterator. This is the default behavior.
- :xiter .. An iterator that eliminates nil from its elements.
- :list .. A list.
- :xlist .. A list that eliminates nil from its elements.
- :set .. A list that eliminates duplicated values from its elements.
- :xset .. A list that eliminates duplicated values and nil from its elements.
See the chapter on Mapping Process in the Gura Language Manual for details.
If a block is specified, it is evaluated repeatedly with block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result value. If one of the attributes listed above is specified, an iterator or a list of the evaluated values is returned.
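The two ways of consuming the groups described above might look like the following sketch. It reuses the time string from the earlier examples; println is assumed to be available for output, and whether the whole-match group is included in the iteration is not stated in this chapter, so no output is shown.

```gura
import(re)

str = '12:34:56'
m = str.match(r'(\d\d):(\d\d):(\d\d)')

// Request a list of re.group instances instead of an iterator.
groups = m.groups():list

// Visit each re.group instance with a block.
m.groups() {|g, idx|
    println(g.string)
}
```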
48.4 re.group Class
A re.group instance provides information about a capturing group stored in a re.match instance.
48.4.1 Property
Property | Type | R/W | Explanation |
---|---|---|---|
string | string | R | String of the group. |
begin | number | R | Beginning position of the group. |
end | number | R | Ending position of the group. |
48.5 re.pattern Class
The re.pattern class is used to describe a pattern of regular expression.
48.5.1 Cast Operation
A function that expects a re.pattern instance in its argument can also take a value of the following type:
- string .. Recognized as a regular expression from which a re.pattern instance is created.
Using the above casting feature, you can call a function f(pattern:re.pattern) that expects a re.pattern instance in its argument as below:
- f(re.pattern('gur[ai]')) .. The most explicit way.
- f('gur[ai]') .. Implicit casting: from string to re.pattern.
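To make the casting rule concrete, the sketch below defines a user function whose argument is declared as re.pattern and calls it both ways. The function name first_match is hypothetical and introduced only for illustration.

```gura
import(re)

// Hypothetical helper: declaring the argument as re.pattern
// lets callers pass either a re.pattern instance or a string.
first_match(pat:re.pattern, str:string) = pat.match(str)

m1 = first_match(re.pattern('gur[ai]'), 'gura')  // explicit instance
m2 = first_match('gur[ai]', 'gura')              // string cast implicitly
```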
48.5.2 Constructor
In many cases, a re.pattern instance is implicitly created by the cast operation when a string is passed to a function argument that expects the re.pattern type. If you want to customize the pattern's behaviour, such as making it ignore letter case, you can explicitly create the instance with the constructor described below.
re.pattern(pattern:string):map:[icase,multiline] {block?}
Creates a re.pattern instance from the given pattern string.
The following attributes customize some traits of the pattern:
- :icase .. Ignores character cases.
- :multiline .. Matches "." with a line break.
If block is specified, it is evaluated with a block parameter |pat:re.pattern|, where pat is the created instance. In this case, the block's result becomes the function's returned value.
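A minimal sketch of the constructor with an attribute follows. It assumes that attributes are appended after the call, as the signature above suggests, and uses a hypothetical upper-case sample string to show the effect of :icase.

```gura
import(re)

// Create a case-insensitive pattern explicitly.
pat = re.pattern('gur[ai]'):icase
m = pat.match('GURA')

// Block form: the block receives the created pattern,
// and the block's result becomes the returned value.
m2 = re.pattern('gur[ai]'):icase {|pat| pat.match('GURA') }
```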
48.5.3 Method
re.pattern#match(str:string, pos:number => 0, endpos?:number):map {block?}
Applies pattern matching to the given string and returns a re.match instance if the matching succeeds. If not, it returns nil.
The argument pos specifies the starting position for the matching process. If omitted, matching starts from the beginning of the string.
The argument endpos specifies the ending position for the matching process. If omitted, matching is processed until the end of the string.
If block is specified, it is evaluated with a block parameter |m:re.match|, where m is the created instance. In this case, the block's result becomes the function's returned value.
re.pattern#sub(replace, str:string, count?:number):map {block?}
Substitutes strings that match the pattern with the specified replacer.
The argument replace takes a string or function.
If a string is specified, it is used as the substituting string, in which you can use macros \0, \1, \2 .. to refer to matched groups.
If a function is specified, it is called with an argument m:re.match and is expected to return a string for substitution.
The argument count specifies the maximum number of substitutions. If omitted, no limit is applied.
If block is specified, it is evaluated with a block parameter |str:string|, where str is the resulting string. In this case, the block's result becomes the function's returned value.
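The two kinds of replacer described above can be sketched as follows. The helper name swap is hypothetical, and println is assumed to be available for output; both calls are intended to exchange the two captured fields.

```gura
import(re)

pat = re.pattern(r'(\d\d):(\d\d)')

// String replacer: \1 and \2 refer to the captured groups.
// A raw string keeps the backslashes as they are.
println(pat.sub(r'\2:\1', '12:34'))

// Function replacer (hypothetical helper): receives a re.match
// instance and returns the string used for substitution.
swap(m:re.match) = m.group(2).string + ':' + m.group(1).string
println(pat.sub(swap, '12:34'))
```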
re.pattern#split(str:string, count?:number):map {block?}
Creates an iterator that splits the source string with the specified pattern.
The argument count specifies the maximum number of splits. If omitted, no limit is applied.
By default, this returns an iterator as its result value. Specifying one of the following attributes customizes the returned value:
- :iter .. An iterator. This is the default behavior.
- :xiter .. An iterator that eliminates nil from its elements.
- :list .. A list.
- :xlist .. A list that eliminates nil from its elements.
- :set .. A list that eliminates duplicated values from its elements.
- :xset .. A list that eliminates duplicated values and nil from its elements.
See the chapter on Mapping Process in the Gura Language Manual for details.
If a block is specified, it is evaluated repeatedly with block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result value. If one of the attributes listed above is specified, an iterator or a list of the evaluated values is returned.
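As an illustration, the sketch below splits a hypothetical comma-separated string, once without a limit and once with count given. The :list attribute is used so that the result is a list rather than an iterator; println is assumed for output.

```gura
import(re)

// Separator: a comma with optional surrounding whitespace.
pat = re.pattern(r'\s*,\s*')

// No limit on the number of splits; request a list.
println(pat.split('a, b ,c'):list)

// Limit the number of splits with the count argument.
println(pat.split('a, b ,c', 1):list)
```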
re.pattern#scan(str:string, pos:number => 0, endpos?:number):map {block?}
Creates an iterator that returns strings that match the specified pattern.
The argument pos specifies the starting position for the matching process. If omitted, matching starts from the beginning of the string.
The argument endpos specifies the ending position for the matching process. If omitted, matching is processed until the end of the string.
By default, this returns an iterator as its result value. Specifying one of the following attributes customizes the returned value:
- :iter .. An iterator. This is the default behavior.
- :xiter .. An iterator that eliminates nil from its elements.
- :list .. A list.
- :xlist .. A list that eliminates nil from its elements.
- :set .. A list that eliminates duplicated values from its elements.
- :xset .. A list that eliminates duplicated values and nil from its elements.
See the chapter on Mapping Process in the Gura Language Manual for details.
If a block is specified, it is evaluated repeatedly with block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result value. If one of the attributes listed above is specified, an iterator or a list of the evaluated values is returned.
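A scan with a block might be written as in the sketch below, which visits every run of digits in a hypothetical sample string. println is assumed for output.

```gura
import(re)

pat = re.pattern(r'\d+')

// The block is evaluated once for each match found in the string.
pat.scan('10 apples, 20 oranges') {|value, idx|
    println(value)
}
```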
48.6 Extension to string Class
This module extends the string class with methods described here.
string#match(pattern:re.pattern, pos:number => 0, endpos?:number):map {block?}
Applies pattern matching to the given string and returns a re.match instance if the matching succeeds. If not, it returns nil.
The argument pos specifies the starting position for the matching process. If omitted, matching starts from the beginning of the string.
The argument endpos specifies the ending position for the matching process. If omitted, matching is processed until the end of the string.
If block is specified, it is evaluated with a block parameter |m:re.match|, where m is the created instance. In this case, the block's result becomes the function's returned value.
string#sub(pattern:re.pattern, replace, count?:number):map {block?}
Substitutes strings that match pattern with the specified replacer.
The argument replace takes a string or function.
If a string is specified, it is used as the substituting string, in which you can use macros \0, \1, \2 .. to refer to matched groups.
If a function is specified, it is called with an argument m:re.match and is expected to return a string for substitution.
The argument count specifies the maximum number of substitutions. If omitted, no limit is applied.
If block is specified, it is evaluated with a block parameter |str:string|, where str is the resulting string. In this case, the block's result becomes the function's returned value.
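The effect of the count argument can be sketched with the string method as follows. The sample string and the replacement text 'X' are hypothetical, and println is assumed for output.

```gura
import(re)

str = 'gura guri gura'

// Without count, every match is substituted.
println(str.sub('gur[ai]', 'X'))

// With count, at most that many substitutions are made.
println(str.sub('gur[ai]', 'X', 1))
```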
string#splitreg(pattern:re.pattern, count?:number):map {block?}
Creates an iterator that splits the source string with the specified pattern.
The argument count specifies the maximum number of splits. If omitted, no limit is applied.
By default, this returns an iterator as its result value. Specifying one of the following attributes customizes the returned value:
- :iter .. An iterator. This is the default behavior.
- :xiter .. An iterator that eliminates nil from its elements.
- :list .. A list.
- :xlist .. A list that eliminates nil from its elements.
- :set .. A list that eliminates duplicated values from its elements.
- :xset .. A list that eliminates duplicated values and nil from its elements.
See the chapter on Mapping Process in the Gura Language Manual for details.
If a block is specified, it is evaluated repeatedly with block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result value. If one of the attributes listed above is specified, an iterator or a list of the evaluated values is returned.
string#scan(pattern:re.pattern, pos:number => 0, endpos?:number):map {block?}
Creates an iterator that returns strings that match the specified pattern.
The argument pos specifies the starting position for the matching process. If omitted, matching starts from the beginning of the string.
The argument endpos specifies the ending position for the matching process. If omitted, matching is processed until the end of the string.
By default, this returns an iterator as its result value. Specifying one of the following attributes customizes the returned value:
- :iter .. An iterator. This is the default behavior.
- :xiter .. An iterator that eliminates nil from its elements.
- :list .. A list.
- :xlist .. A list that eliminates nil from its elements.
- :set .. A list that eliminates duplicated values from its elements.
- :xset .. A list that eliminates duplicated values and nil from its elements.
See the chapter on Mapping Process in the Gura Language Manual for details.
If a block is specified, it is evaluated repeatedly with block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result value. If one of the attributes listed above is specified, an iterator or a list of the evaluated values is returned.
48.7 Extension to iterable Classes
This module extends the iterable classes, list and iterator, with methods described here.
iterable#grep(pattern:re.pattern):map {block?}
48.8 Module Function
re.match(pattern:re.pattern, str:string, pos:number => 0, endpos?:number):map {block?}
Applies pattern matching to the given string and returns a re.match instance if the matching succeeds. If not, it returns nil.
The argument pos specifies the starting position for the matching process. If omitted, matching starts from the beginning of the string.
The argument endpos specifies the ending position for the matching process. If omitted, matching is processed until the end of the string.
If block is specified, it is evaluated with a block parameter |m:re.match|, where m is the created instance. In this case, the block's result becomes the function's returned value.
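The block form described above can be used to extract a value directly from the match, as in this sketch. The variable name hour and the sample string are hypothetical, and println is assumed for output.

```gura
import(re)

// The block receives the re.match instance; the block's result
// (here, the first captured group) becomes the returned value.
hour = re.match(r'(\d\d):(\d\d)', '12:34') {|m| m[1] }
println(hour)
```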
re.sub(pattern:re.pattern, replace, str:string, count?:number):map {block?}
Substitutes strings that match pattern with the specified replacer.
The argument replace takes a string or function.
If a string is specified, it is used as the substituting string, in which you can use macros \0, \1, \2 .. to refer to matched groups.
If a function is specified, it is called with an argument m:re.match and is expected to return a string for substitution.
The argument count specifies the maximum number of substitutions. If omitted, no limit is applied.
If block is specified, it is evaluated with a block parameter |str:string|, where str is the resulting string. In this case, the block's result becomes the function's returned value.
re.split(pattern:re.pattern, str:string, count?:number):map {block?}
Creates an iterator that splits the source string with the specified pattern.
The argument count specifies the maximum number of splits. If omitted, no limit is applied.
By default, this returns an iterator as its result value. Specifying one of the following attributes customizes the returned value:
- :iter .. An iterator. This is the default behavior.
- :xiter .. An iterator that eliminates nil from its elements.
- :list .. A list.
- :xlist .. A list that eliminates nil from its elements.
- :set .. A list that eliminates duplicated values from its elements.
- :xset .. A list that eliminates duplicated values and nil from its elements.
See the chapter on Mapping Process in the Gura Language Manual for details.
If a block is specified, it is evaluated repeatedly with block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result value. If one of the attributes listed above is specified, an iterator or a list of the evaluated values is returned.
re.scan(pattern:re.pattern, str:string, pos:number => 0, endpos?:number):map {block?}
Creates an iterator that returns strings that match the specified pattern.
The argument pos specifies the starting position for the matching process. If omitted, matching starts from the beginning of the string.
The argument endpos specifies the ending position for the matching process. If omitted, matching is processed until the end of the string.
By default, this returns an iterator as its result value. Specifying one of the following attributes customizes the returned value:
- :iter .. An iterator. This is the default behavior.
- :xiter .. An iterator that eliminates nil from its elements.
- :list .. A list.
- :xlist .. A list that eliminates nil from its elements.
- :set .. A list that eliminates duplicated values from its elements.
- :xset .. A list that eliminates duplicated values and nil from its elements.
See the chapter on Mapping Process in the Gura Language Manual for details.
If a block is specified, it is evaluated repeatedly with block parameters |value, idx:number|, where value is the iterated value and idx is the loop index starting from zero. In this case, the last evaluated value of the block becomes the result value. If one of the attributes listed above is specified, an iterator or a list of the evaluated values is returned.
48.9 Thanks
This module uses the Oniguruma library, which is distributed at the following site: