Chapter 18. Regular Expressions
Regular expressions are patterns that describe strings. They can be used as arguments to three XQuery built-in functions: to determine whether a string value matches a particular pattern (matches), to replace the parts of a string that match a pattern (replace), and to split a string into tokens based on a delimiter pattern (tokenize). This chapter explains the regular expression syntax used by XQuery.
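As a minimal sketch of the three functions named above, the following query calls each one with a simple pattern. The input strings are made up for illustration, and the comments show the results these calls are expected to return.

```xquery
(: matches: true if any part of the string matches the pattern :)
matches("query 2023", "[0-9]+"),    (: true :)

(: replace: every match of the pattern is replaced :)
replace("2023-01-15", "-", "/"),    (: "2023/01/15" :)

(: tokenize: the pattern describes the delimiter, not the tokens :)
tokenize("a, b, c", ",\s*")         (: ("a", "b", "c") :)
```

Note that matches does not require the pattern to cover the whole string; a matching substring is enough.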
The Structure of a Regular Expression
The regular expression syntax of XQuery is based on that of XML Schema, with some additions. Regular expressions, also known as regexes, can be composed of a number of different parts: atoms, quantifiers, and branches.
Atoms
An atom is the most basic unit of a regular expression. It might describe a single character, such as d, or an escape sequence that represents one or more characters, like \s or \p{Lu}. It could also be a character class expression that represents a range or choice of several characters, such as [a-z]. These kinds of atoms are described later in this chapter.
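To make the kinds of atoms concrete, here is a short sketch that uses each one as a complete pattern. The test strings are illustrative assumptions, not examples from elsewhere in the chapter.

```xquery
(: a single normal character as an atom :)
matches("d", "d"),             (: true :)

(: a single-character escape: \s matches one whitespace character :)
matches("a b", "\s"),          (: true, because of the space :)

(: a category escape: \p{Lu} matches one uppercase letter :)
matches("Query", "\p{Lu}"),    (: true, because of "Q" :)

(: a character class expression: one lowercase letter from a to z :)
matches("42", "[a-z]")         (: false, no lowercase letter present :)
```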
Quantifiers
Atoms may indicate required, optional, or repeating strings. The number of times a matching string may appear is indicated by a quantifier, which appears directly after an atom. For example, to indicate that the letter d must appear one or more times, you can use the expression d+, where the + means "one or more." The different quantifiers are listed in Table 18-1.
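The d+ expression described above can be tried out with a sketch like the following. The input strings are hypothetical and were chosen only to show the quantifier at work.

```xquery
(: d+ needs at least one d somewhere in the string :)
matches("add", "d+"),             (: true :)
matches("abc", "d+"),             (: false, no d at all :)

(: the quantified atom matches the whole run "ddd", not each d separately :)
replace("adddress", "d+", "d")    (: "adress" :)
```

Because + applies only to the atom directly before it, a pattern such as ad+ would require exactly one a followed by one or more d characters.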
Table 18-1. Kinds of quantifiers
| Quantifier | Number of occurrences |
|---|---|
| (none) | 1 |