
In the XML sample below, I would like to use XPath to select all the books that belong to class foo but not to class bar.

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book class="foo">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book class="foo bar">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book class="foo bar">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
    Good question, +1. See my answer for two different XPath 2.0 solutions of which the first might be the most efficient of them all especially with a non-optimizing XPath 2.0 engine. Commented Apr 17, 2011 at 1:11

3 Answers


By padding the @class value with leading and trailing spaces, you can test for the presence of " foo " and " bar " without worrying about whether the token comes first, in the middle, or last, and without false positives on @class values such as "food" or "barren":

/bookstore/book[contains(concat(' ',@class,' '),' foo ')
        and not(contains(concat(' ',@class,' '),' bar '))]
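The same padding test can be mirrored outside XPath. This is a minimal Python sketch, not the answer's own code: the helper name has_class is hypothetical, the sample is a shortened stand-in for the question's document, and the "food" entry is an added assumption to show the false-positive case. The predicate is applied in Python because the standard library's ElementTree implements only a subset of XPath that does not include contains().

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's sample; the "food" book is an
# assumption added here to exercise the false-positive case.
SAMPLE = """
<bookstore>
  <book class="foo"><title>A</title></book>
  <book class="foo bar"><title>B</title></book>
  <book class="food"><title>C</title></book>
</bookstore>
"""


def has_class(value, name):
    """Mirror contains(concat(' ', @class, ' '), ' name ')."""
    return f" {name} " in f" {value} "


root = ET.fromstring(SAMPLE)
selected = [
    book.findtext("title")
    for book in root.findall("book")
    if has_class(book.get("class", ""), "foo")
    and not has_class(book.get("class", ""), "bar")
]
print(selected)
```

Only the book whose class list contains the whole token foo and no bar token should be selected; "food" is rejected because the padded string never contains " foo ".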

2 Comments

What if @class contains a tab or even a newline character instead of a space? This is where the normalize-space function (XPath 1.0) comes in handy: it strips leading and trailing whitespace from a string and replaces sequences of whitespace characters with a single space, e.g. concat(' ',normalize-space(@class),' ')
@Steven Pribilinskiy - That should not be necessary. Due to how attribute values are normalized by the XML parser, tabs and carriage returns will have already been normalized into a space. w3.org/TR/xml/#AVNormalize
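The normalization described in that reply can be checked with a small Python sketch. It assumes the standard library's ElementTree parser (backed by expat) applies the attribute-value normalization from the linked section of the XML specification; the element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# One attribute value holds a literal tab, the other a literal newline.
doc = '<r><book class="foo\tbar"/><book class="foo\nbar"/></r>'

# Read the values back after parsing; the parser, not this code,
# is responsible for any whitespace normalization.
values = [book.get("class") for book in ET.fromstring(doc)]
print(values)
```

Under that assumption both values come back as "foo bar". The specification's rule covers literal whitespace only: whitespace written as a character reference such as &#9; is appended unchanged, so that is a case where normalize-space() can still make a difference.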

Although I like Mads's solution, here is another approach for XPath 2.0:

/bookstore/book[
                 tokenize(@class," ")="foo" 
                 and not(tokenize(@class," ")="bar")
               ]

Please note that both of the following expressions are true, because a general comparison with = succeeds if any item in the sequence matches:

("foo","bar")="foo" -> true
("foo","bar")="bar" -> true

1 Comment

+1 for the XPath 2.0 solution. So many things are easier with 2.0.

XPath 2.0:

/*/*[for $s in concat(' ',@class,' ') 
            return 
               matches($s, ' foo ') 
             and 
              not(matches($s, ' bar '))
      ]

Here no tokenization is done and $s is calculated only once.

Or even:

/*/book[@class
          [every $t in tokenize(.,' ') satisfies $t ne 'bar']
          [some  $t in tokenize(.,' ') satisfies $t eq 'foo']
       ]
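The quantified form maps closely onto Python's all() and any(). The sketch below assumes a shortened stand-in for the question's sample, a hypothetical helper name wanted, and Python's str.split(" ") as an approximation of tokenize(., ' ').

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's sample document.
SAMPLE = """
<bookstore>
  <book class="foo"><title>A</title></book>
  <book class="foo bar"><title>B</title></book>
  <book class="bar"><title>C</title></book>
</bookstore>
"""


def wanted(book):
    """every $t ... satisfies $t ne 'bar'  ->  all(...)
    some  $t ... satisfies $t eq 'foo'  ->  any(...)"""
    tokens = book.get("class", "").split(" ")
    return all(t != "bar" for t in tokens) and any(t == "foo" for t in tokens)


root = ET.fromstring(SAMPLE)
titles = [b.findtext("title") for b in root.findall("book") if wanted(b)]
print(titles)
```

A book with no class attribute yields no "foo" token here, so any() is false and the book is excluded, which matches the XPath expression selecting nothing when @class is absent.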
