Regular expression string matching.
Search for pat in str and return the positions and substrings of any matches, or empty values if there are none.
The matched pattern pat can include any of the standard regex operators, including:
.
Match any character
* + ? {}
Repetition operators, representing
*
Match zero or more times
+
Match one or more times
?
Match zero or one times
{n}
Match exactly n times
{n,}
Match n or more times
{m,n}
Match between m and n times
[…] [^…]
List operators. The pattern will match any character listed between "[" and "]". If the first character is "^" then the pattern is inverted and any character except those listed between brackets will match.
Escape sequences defined below can also be used inside list operators. For example, a template for a floating point number might be [-+.\d]+.
() (?:)
Grouping operator. The first form, parentheses only, also creates a token.
|
Alternation operator. Match one of a choice of regular expressions. The alternatives must be delimited by the grouping operator () above.
^ $
Anchoring operators. Require the pattern to occur at the start (^) or end ($) of the string.
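The operators above can be tried out with a few short calls. This is a minimal sketch; the sample strings and variable names are made up for illustration, and the values in the comments are what the descriptions above imply.

```octave
% List operator: "c" or "b" followed by "at".
[s, e] = regexp ('cat bat rat', '[cb]at');        % s = [1 5], e = [3 7]

% Repetition: two or more consecutive "a" characters.
rep = regexp ('aaa ab b', 'a{2,}', 'match');      % {'aaa'}

% Grouping with alternation; the parentheses also create a token.
[alt, tok] = regexp ('gray grey', 'gr(a|e)y', 'match', 'tokens');
% alt = {'gray', 'grey'}, tok = {{'a'}, {'e'}}

% Anchor: the pattern must occur at the start of the string.
at_start = regexp ('hello', '^h');                % 1
no_match = regexp ('oh', '^h');                   % empty

% The floating point template from the list operator description.
num = regexp ('x = -3.14', '[-+.\d]+', 'match');  % {'-3.14'}
```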
In addition, the following escaped characters have special meaning.
\d
Match any digit
\D
Match any non-digit
\s
Match any whitespace character
\S
Match any non-whitespace character
\w
Match any word character
\W
Match any non-word character
\<
Match the beginning of a word
\>
Match the end of a word
\B
Match within a word
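As a rough illustration of the escaped characters listed above, the following sketch applies a few of them to made-up sample strings. The variable names are illustrative only.

```octave
% \d+ : runs of digits.
digits = regexp ('abc 123 de 45', '\d+', 'match');   % {'123', '45'}

% \w+ : runs of word characters.
words = regexp ('two words', '\w+', 'match');        % {'two', 'words'}

% \s+ : runs of whitespace; the default output is the start index.
gaps = regexp ('a b  c', '\s+');                     % [2 4]

% \< : beginning of a word, here followed by one word character.
initials = regexp ('foo bar', '\<\w', 'match');      % {'f', 'b'}
```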
Implementation Note: For compatibility with MATLAB, escape sequences in pat (e.g., "\n" => newline) are expanded even when pat has been defined with single quotes. To disable expansion use a second backslash before the escape sequence (e.g., "\\n") or use the regexptranslate function.
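The effect of this note can be seen by comparing a string that contains a real newline with one that contains a literal backslash. A small sketch, with illustrative variable names:

```octave
str_nl = "a\nb";   % double quotes: 3 characters, the middle one is a newline
str_bs = 'a\nb';   % single quotes: 4 characters, including a literal backslash

idx_nl = regexp (str_nl, '\n');    % 2: "\n" in pat is treated as a newline
idx_bs = regexp (str_bs, '\\n');   % 2: "\\n" matches a backslash followed by "n"
none   = regexp (str_bs, '\n');    % empty: str_bs contains no newline
```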
The outputs of regexp default to the order given below:
s
The start indices of each matching substring
e
The end indices of each matching substring
te
The extents of each matched token surrounded by (…) in pat
m
A cell array of the text of each match
t
A cell array of the text of each token matched
nm
A structure containing the text of each matched named token, with the name being used as the fieldname. A named token is denoted by (?<name>…).
sp
A cell array of the text not returned by match, i.e., what remains if you split the string based on pat.
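To make the default output order concrete, here is a sketch that requests all seven outputs at once. The sample string, the pattern, and the output variable names are chosen for illustration only.

```octave
str = 'a1 b22';
[s, e, te, m, t, nm, sp] = regexp (str, '([a-z])(\d+)');

% s  : start index of each match          -> [1 4]
% e  : end index of each match            -> [2 6]
% te : token extents, one matrix per match -> {[1 1; 2 2], [4 4; 5 6]}
% m  : text of each match                 -> {'a1', 'b22'}
% t  : tokens of each match               -> {{'a', '1'}, {'b', '22'}}
% nm : struct without fields, since the pattern has no named tokens
% sp : text around the matches; sp{2} is the single space between them
```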
Particular output arguments, or the order of the output arguments, can be selected by additional opt arguments. These are strings, and the correspondence between the output arguments and the optional arguments is:
'start'        | s
'end'          | e
'tokenExtents' | te
'match'        | m
'tokens'       | t
'names'        | nm
'split'        | sp
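The option strings can be combined to pick outputs and fix their order. The following sketch uses made-up strings and variable names; the named-token pattern is only an example.

```octave
% Request tokens first, then the matched text.
[tok, mat] = regexp ('foo123', '([a-z]+)(\d+)', 'tokens', 'match');
% tok = {{'foo', '123'}}, mat = {'foo123'}

% Request only the structure of named tokens.
nm = regexp ('key=value', '(?<name>\w+)=(?<val>\w+)', 'names');
% nm.name = 'key', nm.val = 'value'
```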
Additional arguments are summarized below.
'once'
Return only the first occurrence of the pattern.
'matchcase'
Make the matching case sensitive. (default)
Alternatively, use (?-i) in the pattern.
'ignorecase'
Ignore case when matching the pattern to the string.
Alternatively, use (?i) in the pattern.
'stringanchors'
Match the anchor characters at the beginning and end of the string. (default)
Alternatively, use (?-m) in the pattern.
'lineanchors'
Match the anchor characters at the beginning and end of the line.
Alternatively, use (?m) in the pattern.
'dotall'
The pattern . matches all characters including the newline character. (default)
Alternatively, use (?s) in the pattern.
'dotexceptnewline'
The pattern . matches all characters except the newline character.
Alternatively, use (?-s) in the pattern.
'literalspacing'
All characters in the pattern, including whitespace, are significant and are used in pattern matching. (default)
Alternatively, use (?-x) in the pattern.
'freespacing'
The pattern may include arbitrary whitespace and also comments beginning with the character '#'.
Alternatively, use (?x) in the pattern.
'noemptymatch'
Zero-length matches are not returned. (default)
'emptymatch'
Return zero-length matches.
regexp ('a', 'b*', 'emptymatch') returns [1 2] because there are zero or more 'b' characters at positions 1 and end-of-string.
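A few of these options are sketched below, using the option names 'once', 'ignorecase', 'lineanchors', and 'dotexceptnewline' accepted by Octave's regexp. The sample strings and variable names are illustrative.

```octave
% 'once': only the first occurrence is returned.
first_idx = regexp ('abcabc', 'b', 'once');             % 2
first_txt = regexp ('abcabc', 'b.', 'match', 'once');   % 'bc' (a string, not a cell array)

% Case sensitivity: matching is case sensitive by default.
ci = regexp ('Hello', 'h', 'ignorecase');               % 1
cs = regexp ('Hello', 'h');                             % empty

% Anchors: by default ^ refers to the start of the whole string.
two_lines = "one\ntwo";                                 % double quotes give a real newline
la = regexp (two_lines, '^t', 'lineanchors');           % 5
sa = regexp (two_lines, '^t');                          % empty

% Dot: by default . also matches the newline character.
da = regexp ("a\nb", 'a.b');                            % 1
dn = regexp ("a\nb", 'a.b', 'dotexceptnewline');        % empty
```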
See also: regexpi, strfind, regexprep.
Package: octave