I have the following code:

        <?php
        $doc = new DOMDocument;
        $doc->loadHTML('<html>
                       <head> 
                        <title>bar , this is an example</title> 
                       </head> 
                       <body> 
                       <h1>latest news</h1>
                       foo <strong>bar</strong> 
                      <i>foobar</i>
                       </body>
                       </html>');


        $xpath = new DOMXPath($doc);
        foreach($xpath->query('//*[contains(child::text(),"bar")]') as $e) {
              echo $e->tagName, "\n";
        }

It prints:

       title
       strong
       i

This code finds every HTML element whose text contains "bar", but it also matches words that merely contain "bar", such as "foobar". I want to change the query so that it matches only the whole word "bar", without any prefix or suffix.

I think this can be solved by changing the query to match only occurrences of "bar" that have no letter immediately before or after them, or that are surrounded by spaces.

This code comes from an answer to an earlier question here by VolkerK.

Thanks

2 Answers

You can use the following XPath query:

$xpath->query("//*[text()='bar']");

or

$xpath->query("//*[.='bar']");

Note that using "//" will slow things down, and the bigger your XML file is, the slower it gets.
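
To make the behaviour of these two queries concrete, here is a minimal sketch that runs them against the sample document from the question, together with a variant anchored at /html/body instead of a leading "//". The helper closure, the variable names and the anchored path are illustrative assumptions, not part of the original answer.

```php
<?php
// Sample document from the question, collapsed onto fewer lines.
$doc = new DOMDocument;
$doc->loadHTML(
    '<html><head><title>bar , this is an example</title></head>'
    . '<body><h1>latest news</h1>foo <strong>bar</strong> <i>foobar</i></body></html>'
);
$xpath = new DOMXPath($doc);

// Hypothetical helper: run a query and collect the matching tag names.
$tagNames = function (string $query) use ($xpath): array {
    $names = [];
    foreach ($xpath->query($query) as $e) {
        $names[] = $e->tagName;
    }
    return $names;
};

// Exact match on a text child, and exact match on the element's string value.
$byText  = $tagNames("//*[text()='bar']");
$byValue = $tagNames("//*[.='bar']");

// Same test, anchored under /html/body so only body's children are examined.
$scoped  = $tagNames("/html/body/*[.='bar']");

print_r($byText);
print_r($byValue);
print_r($scoped);
```

On this document all three queries select only the strong element. The title does not qualify, because its text is longer than just "bar".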

2 Comments

Thanks, but this does not work: it prints only "strong", whereas it should print both "strong" and "title", because the word "bar" appears in the title as well.
I thought you wanted to match exactly "bar". Now I see that you want to match "bar" or "this bar now", but not "this foobar now".

If you are looking for just "bar" with XPath 1.0, then you'll have to use a combination of functions, because there are no regular expressions in XPath 1.0.

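$xpath->query("//*[
                . = 'bar' or
                starts-with(., 'bar ') or
                contains(., ' bar ') or
                (' bar' = substring(., string-length(.) - string-length(' bar') + 1))
              ]");

Basically, this locates elements whose string value is exactly 'bar', starts with 'bar ' (notice the trailing space), contains ' bar ' (notice the spaces before and after), or ends with ' bar' (notice the leading space). Without those spaces, the starts-with test would also accept "barfoo" and the ends-with test would also accept "foobar". ends-with is an XPath 2.0 function, so I substituted code that emulates it, taken from a previous Stack Overflow answer.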
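
As a rough illustration of the substring-based ends-with emulation, the sketch below evaluates it on a few string literals with DOMXPath::evaluate. The endsWith helper is hypothetical and assumes that neither argument contains a single quote. It also shows that testing for a bare 'bar' suffix accepts "foobar", while a suffix with a leading space does not.

```php
<?php
// Any small document works; the expressions below only use string literals.
$doc = new DOMDocument;
$doc->loadXML('<root/>');
$xpath = new DOMXPath($doc);

// Hypothetical helper: XPath 1.0 emulation of ends-with($text, $suffix).
// Assumes neither argument contains a single quote.
function endsWith(DOMXPath $xpath, string $text, string $suffix): bool
{
    $expr = "'$suffix' = substring('$text', "
          . "string-length('$text') - string-length('$suffix') + 1)";
    // A boolean XPath expression is returned as a PHP bool.
    return $xpath->evaluate($expr);
}

$wholeWord   = endsWith($xpath, 'foo bar', 'bar');   // true
$partOfWord  = endsWith($xpath, 'foobar', 'bar');    // true: bare suffix also accepts "foobar"
$withSpace   = endsWith($xpath, 'foo bar', ' bar');  // true
$spaceReject = endsWith($xpath, 'foobar', ' bar');   // false: leading space excludes "foobar"

var_dump($wholeWord, $partOfWord, $withSpace, $spaceReject);
```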

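If contains ' bar ' is not enough, because the text may be "one bar, over" or "This bar. That bar.", with punctuation directly after 'bar', you could try this contains test instead:

contains(translate(., '.,[]', '    '), ' bar ') or

That translates each of the characters '.,[]' to a single space, so "one bar, over" becomes "one bar  over" and matches ' bar ' as expected. Note that the replacement string must hold one space per character being replaced (four here): translate removes any character that has no counterpart at the same position in its third argument.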
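
The punctuation handling can be sketched end to end as follows. This is an illustration under stated assumptions rather than the answer's exact query: the sample paragraphs are made up, the replacement string is padded to four spaces so that every listed character becomes a space (translate drops characters that have no counterpart in its third argument), and the text is wrapped in spaces with concat so that a single contains test also covers 'bar' at the very start or end.

```php
<?php
// Made-up sample text, chosen to exercise punctuation and partial words.
$doc = new DOMDocument;
$doc->loadHTML(
    '<html><body>'
    . '<p>one bar, over</p>'
    . '<p>This bar. That bar.</p>'
    . '<p>see [bar] here</p>'
    . '<p>a foobar here</p>'
    . '<p>crowbar,bart</p>'
    . '</body></html>'
);
$xpath = new DOMXPath($doc);

// With a one-character replacement only '.' maps to a space; ',' is removed.
$short  = $xpath->evaluate("translate('bar,over', '.,[]', ' ')");
// With four spaces every listed character maps to a space.
$padded = $xpath->evaluate("translate('bar,over', '.,[]', '    ')");

// Assumption: pad both ends with concat() so one contains() test is enough.
$query = "//p[contains(concat(' ', translate(., '.,[]', '    '), ' '), ' bar ')]";

$matches = [];
foreach ($xpath->query($query) as $p) {
    $matches[] = $p->textContent;
}

var_dump($short, $padded);
print_r($matches);
```

Only the first three paragraphs are selected; "a foobar here" and "crowbar,bart" are not, because 'bar' never appears there as a separate word.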
