91

I want to select elements by a single class, .date.

For some reason, I cannot get this to work. If anyone knows what is wrong with my code, your help would be much appreciated.

@$doc = new DOMDocument();
@$doc->loadHTML($html);
$xml = simplexml_import_dom($doc); // just to make xpath more simple
$images = $xml->xpath('//[@class="date"]');                             
foreach ($images as $img)
{
    echo  $img." ";
}
  • And what about the piece of HTML? (Preferably show us the SimpleXML output from asXML(), as it is nearer to XPath.) Commented Jan 10, 2012 at 19:03
  • If there are multiple classes, you need to use contains(@class, 'date') Commented Jan 10, 2012 at 19:04
  • possible duplicate of PHP - Parse All Links That Contain A Speciffic Word In "href" Tag Commented Jan 10, 2012 at 19:09
  • possible duplicate of XPath: How to match attributes that contain a certain string Commented Jun 13, 2012 at 17:34
  • @Gordon's answer is dangerous: if the class attribute is "datetime", it would also match. user716736's answer is more complete. Commented Oct 12, 2012 at 13:25

6 Answers

257

I want to write the canonical answer to this question because the answer above has a problem.

Our problem

The CSS selector:

.foo

will select any element that has the class foo.

How do you do this in XPath?

Although XPath is more powerful than CSS, XPath doesn't have a native equivalent of a CSS class selector. However, there is a solution.

The right way to do it

The equivalent selector in XPath is:

//*[contains(concat(" ", normalize-space(@class), " "), " foo ")]

The function normalize-space strips leading and trailing whitespace and replaces each run of whitespace characters (spaces, tabs, line breaks) with a single space.

More generally, this is also equivalent to the CSS selector:

*[class~="foo"]

which will match any element whose class attribute value is a list of whitespace-separated values, one of which is exactly equal to foo.
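
As a minimal sketch of how the expression above might be used from PHP (the language of the question), assuming the DOM extension is available. The helper name selectByClass and the sample markup are invented for illustration and are not part of any library:

```php
<?php
// Hypothetical helper: wraps the class-matching XPath expression shown above.
// Assumes $class is a plain class name without quotes or whitespace.
function selectByClass(DOMXPath $xpath, string $class): DOMNodeList
{
    $expr = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    return $xpath->query($expr);
}

// Sample markup invented for this example.
$html = '<div><span class="date">a</span>'
      . '<span class="  date  big">b</span>'
      . '<span class="datetime">c</span>'
      . '<span class="up date">d</span></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$matched = [];
foreach (selectByClass($xpath, 'date') as $node) {
    $matched[] = $node->textContent;
}
echo implode(' ', $matched), "\n"; // a b d
```

The element with class datetime is skipped because the padded string " datetime " does not contain " date ".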

A couple of obvious, but wrong ways to do it

The XPath selector:

//*[@class="foo"]

doesn't work, because it won't match an element that has more than one class, for example:

<div class="foo bar">

It also won't match if there is any extra whitespace around the class name:

<div class="  foo ">

The 'improved' XPath selector

//*[contains(@class, "foo")]

doesn't work either, because it wrongly matches elements with the class foobar, for example:

<div class="foobar">
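
The difference between the three expressions can be checked side by side. The following is only a sketch in PHP under assumed sample markup (the id values are invented labels, one per case discussed above):

```php
<?php
// Sample markup invented for this example; each id labels one case.
$html = '<div>'
      . '<p id="exact" class="foo"></p>'
      . '<p id="multi" class="foo bar"></p>'
      . '<p id="padded" class="  foo "></p>'
      . '<p id="other" class="foobar"></p>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$expressions = [
    'equals'   => '//*[@class="foo"]',
    'contains' => '//*[contains(@class, "foo")]',
    'token'    => '//*[contains(concat(" ", normalize-space(@class), " "), " foo ")]',
];

// Collect the ids matched by each expression.
$results = [];
foreach ($expressions as $label => $expr) {
    $ids = [];
    foreach ($xpath->query($expr) as $node) {
        $ids[] = $node->getAttribute('id');
    }
    $results[$label] = $ids;
    echo $label, ': ', implode(', ', $ids), "\n";
}
```

With this markup, the equality test misses the multi-class element, contains() also picks up foobar, and only the last expression should return exactly the three elements that carry the class foo.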

Credit goes to this fella, who published the earliest solution to this problem that I found on the web: http://dubinko.info/blog/2007/10/01/simple-parsing-of-space-seprated-attributes-in-xpathxslt/

7 Comments

What's the need for normalize-space?
"the answer above" probably refers to MrGlass's.
Is this possible <div class="foo\tbar">? I mean, class names separated by a tab.
But <div class="group-conditions"/> and <div class="condition"/> are both matched by $x('//div[contains(concat(" ", normalize-space(@class), " "), "condition")]')
@testerjoe2 did you try //*[contains(concat(" ", normalize-space(@class), " "), " foo ")] ?
13

//[@class="date"] is not a valid XPath expression, because there is no node test after the //.

Try //*[@class="date"], or if you know it is an image, //img[@class="date"]
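
Applied to the code from the question, the fix might look like the sketch below. The sample markup is invented, and the error-suppressing @ operators are left out:

```php
<?php
// Sample markup invented for this example.
$html = '<div><img class="date" alt="first"><img class="other" alt="second">'
      . '<span class="date">2012-01-10</span></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

// '*' is the node test that was missing after '//': it matches any element.
$any = $xml->xpath('//*[@class="date"]');

// Restrict the match to <img> elements only.
$images = $xml->xpath('//img[@class="date"]');

echo count($any), ' ', count($images), "\n"; // 2 1
```

Note that this exact-match form only works when the class attribute is exactly "date", with no other classes and no extra whitespace.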

7

XPath 3.1 introduces a function contains-token and thus finally solves this ‘officially’. It is designed to support classes.

Example:

//*[contains-token(@class, "foo")]

This function makes sure that whitespace (not only the space character, U+0020) is handled correctly, works when a class name is repeated, and generally covers the edge cases.


Note: As of today (2016-12-13) XPath 3.1 has status of Candidate Recommendation.
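
PHP's DOMXPath is built on libxml2, which implements XPath 1.0, so contains-token is not available there. As a rough sketch only, the same token semantics can be approximated with a userland function exposed through DOMXPath::registerPhpFunctions. The function name contains_token and the sample markup are assumptions made for this example, not a built-in:

```php
<?php
// Hypothetical userland stand-in for XPath 3.1's contains-token().
// Splits on the XML whitespace characters: space, tab, CR and LF.
function contains_token(string $value, string $token): bool
{
    $token = trim($token, " \t\r\n");
    if ($token === '') {
        return false;
    }
    $tokens = preg_split('/[ \t\r\n]+/', $value, -1, PREG_SPLIT_NO_EMPTY);
    return in_array($token, $tokens, true);
}

// Sample markup invented for this example.
$html = '<div><b class="foo">1</b><b class="foobar">2</b>'
      . '<b class="bar foo foo">3</b><b>4</b></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Expose the PHP function to XPath under the php: prefix.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('contains_token');

$hits = [];
$query = '//*[php:functionString("contains_token", @class, "foo")]';
foreach ($xpath->query($query) as $node) {
    $hits[] = $node->textContent;
}
echo implode(' ', $hits), "\n"; // 1 3
```

The element with the repeated class name still matches once, and elements without a class attribute are passed an empty string and rejected.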

1 Comment

It does not work in today's latest Chrome. Until it works, how do we get around the limitation that //*[contains(@class, "foo")] will also select any class that contains foo, such as foobar, fooz, etc.?
3

In XPath 2.0 you can use:

//*[count(index-of(tokenize(@class, '\s+' ), 'foo')) = 1]

as stated by Christian Weiske in: https://cweiske.de/tagebuch/XPath%3A%20Select%20element%20by%20class.htm

1 Comment

Unfortunately, this doesn't seem to be implemented by Chrome as of 6/12/2017. Based on en.wikipedia.org/wiki/… it seems to be lacking pretty much across the board.
1

HTML element and attribute names are case-insensitive, and class is a space-separated list of class names. Here is an expression for an img tag with the class named date:

//*['IMG' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')]/@*['CLASS' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') and contains(concat(' ', normalize-space(.), ' '), concat(' ', 'date', ' '))]

See as well: CSS Selector to XPath conversion
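
A small sketch of how this expression behaves, written in PHP. The sample document is invented and is parsed as XML here so that the mixed-case names are preserved for the demonstration:

```php
<?php
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';

// Same expression as above, assembled from parts for readability.
$expr = "//*['IMG' = translate(name(.), '$lower', '$upper')]"
      . "/@*['CLASS' = translate(name(.), '$lower', '$upper')"
      . " and contains(concat(' ', normalize-space(.), ' '), ' date ')]";

// Sample document invented for this example.
$doc = new DOMDocument();
$doc->loadXML(
    '<root><IMG CLASS="big date"/>'
    . '<img class="update"/>'
    . '<Img Class="date"/></root>'
);
$xpath = new DOMXPath($doc);

$found = [];
foreach ($xpath->query($expr) as $attr) {
    // The expression selects the class attribute node itself,
    // so step back to the owning element to see which tag matched.
    $found[] = $attr->ownerElement->nodeName . ':' . $attr->value;
}
echo implode(', ', $found), "\n"; // IMG:big date, Img:date
```

Because the expression ends in an attribute step, the result is the attribute node rather than the element, which is why the loop reads ownerElement.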

1

BEWARE OF MINUS SIGNS IN THE TEMPLATE! If you are querying for "my-ownclass" in a DOM like this:

<ul class="my-ownclass"><li>...</li></ul>
<ul class="someother"><li>...</li></ul>
<ul><li>...</li></ul>

$finder = new DomXPath($dom);
$nodes = $finder->query(".//ul[contains(@class, 'my-ownclass')]"); // This will NOT behave as expected! This will strangely match all the <ul> elements in DOM.
$nodes = $finder->query(".//ul[contains(@class, 'ownclass')]"); // This will match the element.
