XQuery

by Priscilla Walmsley

March 2007

Intermediate to advanced

512 pages

21h 15m

English

O'Reilly Media, Inc.

Contents of This Book
1.1. What Is XQuery?1.1.1. Capabilities of XQuery1.1.2. Uses for XQuery1.1.3. Processing Scenarios

1.5.1. Adding Elements1.5.2. Adding Attributes
2.1. The Design of the XQuery Language
2.2.1. XQuery and XPath2.2.2. XQuery Versus XSLT2.2.3. XQuery Versus SQL2.2.4. XQuery and XML Schema
2.3.1. XML Input Documents2.3.2. The Query2.3.3. The Context2.3.4. The Query Processor2.3.5. The Results of the Query
2.4.1. Nodes2.4.1.1. Node kinds2.4.1.2. The node hierarchy2.4.1.3. The node family2.4.1.4. Roots, documents, and elements2.4.1.5. Node identity and name2.4.1.6. String and typed values of nodes2.4.2. Atomic Values2.4.3. Sequences
3.1. Categories of Expressions
3.9.1. General Comparisons3.9.1.1. General comparisons on multi-item sequences3.9.1.2. General comparisons and types3.9.2. Value Comparisons3.9.3. Node Comparisons
3.10.1. Conditional Expressions and Effective Boolean Values3.10.2. Nesting Conditional Expressions
3.11.1. Evaluation Order of Logical Expressions3.11.2. Negating a Boolean Value
4.1. Path Expressions4.1.1. Path Expressions and Context4.1.1.1. Steps and changing context4.1.2. Steps4.1.3. Axes4.1.4. Node Tests4.1.4.1. Node name tests4.1.4.2. Node name tests and namespaces4.1.4.3. Node name tests and wildcards4.1.4.4. Node kind tests4.1.5. Abbreviated Syntax4.1.6. Other Expressions As Steps
4.2.1. Comparisons in Predicates4.2.2. Using Positions in Predicates4.2.2.1. Understanding positional predicates4.2.2.2. The position and last functions4.2.2.3. Positional predicates and reverse axes4.2.3. Using Multiple Predicates4.2.4. More Complex Predicates
4.4.1. Accessing a Single Document4.4.2. Accessing a Collection4.4.3. Setting the Context Node Outside the Query4.4.4. Using Variables
4.5.1. Working with the Context Node4.5.2. Accessing the Root
5.1. Including Elements and Attributes from the Input Document
5.2.1. Containing Literal Characters5.2.2. Containing Other Element Constructors5.2.3. Containing Enclosed Expressions5.2.3.1. Enclosed expressions that evaluate to elements5.2.3.2. Enclosed expressions that evaluate to attributes5.2.3.3. Enclosed expressions that evaluate to atomic values5.2.3.4. Enclosed expressions with multiple subexpressions5.2.4. Specifying Attributes Directly5.2.5. Declaring Namespaces in Direct Constructors5.2.6. Use Case: Modifying an Element from the Input Document5.2.7. Direct Element Constructors and Whitespace5.2.7.1. Boundary whitespace5.2.7.2. The boundary-space declaration5.2.7.3. Forcing boundary whitespace preservation
5.3.1. Computed Element Constructors5.3.1.1. Names of computed element constructors5.3.1.2. Content of computed element constructors5.3.2. Computed Attribute Constructors5.3.3. Use Case: Turning Content to Markup
6.1. Selecting with Path Expressions
6.2.1. The for Clause6.2.1.1. Range expressions6.2.1.2. Multiple for clauses6.2.2. The let Clause6.2.3. The where Clause6.2.4. The return Clause6.2.5. The Scope of Variables
6.3.1. Binding Multiple Variables
6.5.1. Three-Way Joins6.5.2. Outer Joins6.5.3. Joins and Types
7.1. Sorting in XQuery7.1.1. The order by Clause7.1.1.1. Using multiple ordering specifications7.1.1.2. Sorting and types7.1.1.3. Order modifiers7.1.1.4. Empty order7.1.1.5. Stable ordering7.1.1.6. More complex order specifications7.1.2. Document Order7.1.2.1. Document order defined7.1.2.2. Sorting in document order7.1.2.3. Inadvertent resorting in document order7.1.3. Order Comparisons7.1.4. Reversing the Order7.1.5. Indicating That Order Is Not Significant7.1.5.1. The unordered function7.1.5.2. The unordered expression7.1.5.3. The ordering mode declaration
7.3.1. Ignoring "Missing" Values7.3.2. Counting "Missing" Values7.3.3. Aggregating on Multiple Values7.3.4. Constraining and Sorting on Aggregated Values
8.1. Built-in Versus User-Defined Functions
8.2.1. Function Names8.2.2. Function Signatures8.2.3. Argument Lists8.2.3.1. Argument lists and the empty sequence8.2.3.2. Argument lists and sequences8.2.4. Sequence Types
8.3.1. Why Define Your Own Functions?8.3.2. Function Declarations8.3.3. The Function Body8.3.4. The Function Name8.3.5. The Parameter List8.3.5.1. Accepting arguments that are nodes versus atomic values8.3.5.2. Accepting arguments that are the empty sequence8.3.6. Functions and Context8.3.7. Recursive Functions
9.1. Copying Input Elements with Modifications9.1.1. Adding Attributes to an Element9.1.2. Removing Attributes from an Element9.1.3. Removing Attributes from All Descendants9.1.4. Removing Child Elements9.1.5. Changing Names
9.2.1. Adding Sequence Numbers to Results9.2.2. Testing for the Last Item
9.3.1. Sequence Constructors9.3.2. The union Expression9.3.3. The intersect Expression9.3.4. The except Expression
9.4.1. Creating Lookup Tables9.4.2. Reducing Complexity
10.1. XML Namespaces10.1.1. Namespace URIs10.1.2. Declaring Namespaces10.1.3. Default Namespace Declarations10.1.4. Namespaces and Attributes10.1.5. Namespace Declarations and Scope
10.3.1. Predeclared Namespaces10.3.2. Prolog Namespace Declarations10.3.2.1. Default namespace declarations in the prolog10.3.2.2. The default function namespace declaration10.3.2.3. Other prolog namespace declarations10.3.3. Namespace Declarations in Element Constructors10.3.4. The Impact and Scope of Namespace Declarations10.3.4.1. Scope of namespace declarations10.3.4.2. Names affected by namespace declarations10.3.4.3. Namespace declarations and input elements
10.4.1. In-Scope Versus Statically Known Namespaces10.4.2. Controlling the Copying of Namespace Declarations
11.1. The XQuery Type System11.1.1. Advantages of a Strong Type System11.1.2. Do You Need to Care About Types?
11.3.1. Nodes and Types11.3.2. Atomic Values and Types
11.4.1. The Static Analysis Phase11.4.2. The Dynamic Evaluation Phase
11.5.1. Subtype Substitution11.5.2. Type Promotion11.5.3. Casting of Untyped Values11.5.4. Atomization11.5.5. Effective Boolean Value11.5.6. Function Conversion Rules
11.6.1. Occurrence Indicators11.6.2. Generic Sequence Types11.6.3. Atomic Type Names As Sequence Types11.6.4. Element and Attribute Tests11.6.5. Sequence Type Matching11.6.6. The "instance of" Expression
11.7.1. Constructors11.7.2. The Cast Expression11.7.3. The Castable Expression11.7.4. Casting Rules11.7.4.1. Casting among the primitive types11.7.4.2. Casting to xs:string or xs:untypedAtomic11.7.4.3. Casting from xs:string or xs:untypedAtomic11.7.4.4. Casting among derived types
12.1. Structure of a Query: Prolog and Body12.1.1. Prolog Declarations12.1.2. The Version Declaration
12.2.1. Library Modules12.2.2. Importing a Library Module12.2.2.1. Multiple module imports12.2.2.2. The behavior of a module import
12.3.1. Variable Declaration Syntax12.3.2. The Scope of Variables12.3.3. Variable Names12.3.4. Initializing Expressions12.3.5. External Variables
13.1. What Is a Schema?
13.3.1. Element and Attribute Declarations13.3.2. Types13.3.2.1. Simple and complex types13.3.2.2. User-defined types13.3.2.3. List types13.3.3. Namespaces and XML Schema
13.4.1. Where Do In-Scope Schema Definitions Come from?13.4.2. Schema Imports13.4.2.1. Importing a schema with no target namespace13.4.2.2. Importing multiple schemas with the same target namespace13.4.2.3. Schema imports and library modules
13.5.1. The Validate Expression13.5.2. Validation Mode13.5.3. Assigning Type Annotations to Nodes13.5.4. Nodes and Typed Values13.5.5. Types and Newly Constructed Elements and Attributes
14.1. What Is Static Typing?14.1.1. Obvious Static Type Errors14.1.2. Static Typing and Schemas14.1.3. Raising "False" Errors14.1.4. Static Typing Expressions and Constructs
14.4.1. Type Declarations in FLWORs14.4.2. Type Declarations in Quantified Expressions14.4.3. Type Declarations in Global Variable Declarations
15.1. Query Design Goals
15.2.1. Improving the Layout15.2.2. Choosing Names15.2.3. Using Comments for Documentation
15.4.1. Handling Data Variations15.4.2. Handling Missing Values15.4.2.1. Absent values15.4.2.2. Empty and nil values15.4.2.3. Default "missing" values
15.5.1. Avoiding Dynamic Errors15.5.2. The error and trace Functions
15.6.1. Avoid Reevaluating the Same or Similar Expressions15.6.2. Avoid Unnecessary Sorting15.6.3. Avoid Expensive Path Expressions15.6.4. Use Predicates Instead of where Clauses
16.1. The Numeric Types16.1.1. The xs:decimal Type16.1.2. The xs:integer Type16.1.3. The xs:float and xs:double Types
16.2.1. The number Function16.2.2. Numeric Type Promotion
16.4.1. Arithmetic Operations on Multiple Values16.4.2. Arithmetic Operations and Types16.4.3. Precedence of Arithmetic Operators16.4.4. Addition, Subtraction, and Multiplication16.4.5. Division16.4.6. Modulus (Remainder)
17.1. The xs:string Type
17.2.1. String Literals17.2.2. The xs:string Constructor and the string Function
17.3.1. Comparing Entire Strings17.3.2. Determining Whether a String Contains Another String17.3.3. Matching a String to a Pattern
17.6.1. Concatenating Strings17.6.2. Splitting Strings Apart17.6.3. Converting Between Code Points and Strings
17.7.1. Converting Between Uppercase and Lowercase17.7.2. Replacing Individual Characters in Strings17.7.3. Replacing Substrings That Match a Pattern
17.8.1. Normalizing Whitespace
17.9.1. Collations17.9.2. Unicode Normalization17.9.3. Determining the Language of an Element
18.1. The Structure of a Regular Expression18.1.1. Atoms18.1.2. Quantifiers18.1.3. Parenthesized Sub-Expressions and Branches
18.4.1. Multi-Character Escapes18.4.2. Category Escapes18.4.3. Block Escapes
18.5.1. Single Characters and Ranges18.5.2. Subtraction from a Range18.5.3. Negative Character Class Expressions18.5.4. Escaping Rules for Character Class Expressions
18.7.1. Anchors and Multi-Line Mode
19.1. The Date and Time Types19.1.1. Constructing and Casting Dates and Times19.1.2. Time Zones19.1.2.1. Explicit versus implicit time zones19.1.2.2. Adjusting time zones19.1.2.3. Finding the time zone of a value19.1.3. Comparing Dates and Times
19.2.1. The yearMonthDuration and dayTimeDuration Types19.2.2. Comparing Durations
19.4.1. Subtracting Dates and Times19.4.2. Adding and Subtracting Durations from Dates and Times19.4.3. Adding and Subtracting Two Durations19.4.4. Multiplying and Dividing Durations by Numbers19.4.5. Dividing Durations by Durations
20.1. Working with Qualified Names20.1.1. Retrieving Node Names20.1.2. Constructing Qualified Names20.1.3. Other Name-Related Functions
20.2.1. Base and Relative URIs20.2.1.1. Using the xml:base attribute20.2.1.2. Finding the base URI of a node20.2.1.3. Resolving URIs20.2.1.4. The base URI of the static context20.2.2. Documents and URIs20.2.2.1. Finding the URI of a document20.2.2.2. Opening a document from a dynamic value20.2.3. Escaping URIs
20.3.1. Joining IDs and IDREFs20.3.2. Constructing IDs
21.1. XML Comments21.1.1. XML Comments and the Data Model21.1.2. Querying Comments21.1.3. Comments and Sequence Types21.1.4. Constructing Comments
21.2.1. Processing Instructions and the Data Model21.2.2. Querying Processing Instructions21.2.3. Processing Instructions and Sequence Types21.2.4. Constructing Processing Instructions
21.3.1. Document Nodes and the Data Model21.3.2. Document Nodes and Sequence Types21.3.3. Constructing Document Nodes
21.4.1. Text Nodes and the Data Model21.4.2. Querying Text Nodes21.4.3. Text Nodes and Sequence Types21.4.4. Why Work with Text Nodes?21.4.5. Constructing Text Nodes
22.1. Serialization
23.1. Conformance
23.4.1. The Option Declaration23.4.2. Extension Expressions
24.1. Relational Versus XML Data Models
24.2.1. A Simple Query24.2.2. Conditions and Operators24.2.2.1. Comparisons24.2.2.2. Arithmetic and string operators24.2.2.3. Boolean operators24.2.3. Functions24.2.4. Selecting Distinct Values24.2.5. Working with Multiple Tables and Subqueries24.2.5.1. Subselects24.2.5.2. Combining queries using set operators24.2.6. Grouping
24.3.1. Combining Structured and Semistructured Data24.3.2. Flexible Data Structures
25.1. XQuery and XPath
25.2.1. Shared Components25.2.2. Equivalent Components25.2.3. Differences25.2.3.1. Paradigm differences: push versus pull25.2.3.2. Optimization for particular use cases25.2.3.3. Convenient features of XSLT
25.3.1. Data Model25.3.2. New Expressions25.3.3. Path Expressions25.3.4. Function Conversion Rules25.3.5. Arithmetic and Comparison Expressions25.3.6. Built-in Functions
B.1. xs:anyAtomicType
B.3.1. Casting and Comparing xs:anyURI Values
B.5.1. Constructing xs:boolean ValuesB.5.2. Casting xs:boolean Values
B.10.1. Casting xs:decimal Values
B.11.1. Casting xs:double Values
B.15.1. Casting xs:float Values
B.21.1. Casting and Comparing xs:hexBinary Values
B.26.1. Casting xs:integer Values
C.1. FOAR0001

Content preview from XQuery

Chapter 18. Regular Expressions

Regular expressions are patterns that describe strings. They can be used as arguments to three XQuery built-in functions: to determine whether a string value matches a particular pattern (matches), to replace parts of a string that match a pattern (replace), and to tokenize strings based on a delimiter pattern (tokenize). This chapter explains the regular expression syntax used by XQuery.
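As a quick illustration of the three functions, the following sketch calls each one with a very simple pattern. The input strings are made up for this example, and the results shown in the comments assume a conforming XQuery 1.0 processor.

```xquery
(: matches: true if any part of the string matches the pattern :)
matches("query", "q"),          (: returns true :)

(: replace: substitutes every part of the string that matches the pattern :)
replace("query", "r", "as"),    (: returns "queasy" :)

(: tokenize: splits the string wherever the delimiter pattern matches :)
tokenize("a b c", "\s")         (: returns ("a", "b", "c") :)
```

Note that a backslash has no special meaning inside an XQuery string literal, so the pattern \s can be written as "\s" without doubling the backslash.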

The Structure of a Regular Expression

The regular expression syntax of XQuery is based on that of XML Schema, with some additions. Regular expressions, also known as regexes, can be composed of a number of different parts: atoms, quantifiers, and branches.

Atoms

An atom is the most basic unit of a regular expression. It might describe a single character, such as d, or an escape sequence that represents one or more characters, like \s or \p{Lu}. It could also be a character class expression that represents a range or choice of several characters, such as [a-z]. These kinds of atoms are described later in this chapter.
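The kinds of atoms mentioned above can be tried out with the matches function. This is only a sketch with invented input strings; each pattern consists of a single atom.

```xquery
(: a single character as an atom :)
matches("d", "d"),             (: returns true :)

(: \s matches one whitespace character :)
matches("a b", "\s"),          (: returns true :)

(: \p{Lu} matches one uppercase letter :)
matches("Query", "\p{Lu}"),    (: returns true :)
matches("query", "\p{Lu}"),    (: returns false :)

(: [a-z] matches one character in the range a through z :)
matches("x", "[a-z]"),         (: returns true :)
matches("X", "[a-z]")          (: returns false :)
```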

Quantifiers

Atoms may indicate required, optional, or repeating strings. The number of times a matching string may appear is indicated by a quantifier, which appears directly after an atom. For example, to indicate that the letter d must appear one or more times, you can use the expression d+, where the + means "one or more." The different quantifiers are listed in Table 18-1.
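To see the effect of the + quantifier, it can help to compare the pattern d with the pattern d+ in a call to replace. The input string "adddb" is an assumption made for this example.

```xquery
(: without a quantifier, each d is matched and replaced separately :)
replace("adddb", "d", "X"),     (: returns "aXXXb" :)

(: with +, the whole run of one or more d characters is a single match :)
replace("adddb", "d+", "X"),    (: returns "aXb" :)

(: d+ requires at least one d, so a string with no d does not match :)
matches("ab", "d+")             (: returns false :)
```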

Table 18-1. Kinds of quantifiers