The following XML structure represents a website with many articles. Every article contains, among many other things, the date of its creation and possibly arbitrarily many dates of modification. I want to get the date of the last access (either the creation or the most recent modification) of every article using XPath 1.0.

<website>
    <article>
        <date><strong>22.11.2017</strong></date>
        <edits>
            <edit><strong>17.12.2017</strong></edit>
        </edits>
    </article>
    <article>
        <date><strong>17.4.2016</strong></date>
        <edits></edits>
    </article>
    <article>
        <date><strong>3.5.2011</strong></date>
        <edits>
            <edit><strong>4.5.2011</strong></edit>
            <edit><strong>12.8.2012</strong></edit>
        </edits>
    </article>
    <article>
        <date><strong>12.2.2009</strong></date>
        <edits></edits>
    </article>
    <article>
        <date><strong>23.11.1987</strong></date>
        <edits>
            <edit><strong>3.4.2001</strong></edit>
            <edit><strong>11.5.2006</strong></edit>
            <edit><strong>13.9.2012</strong></edit>
        </edits>
    </article>
</website>

In other words, the expected output is:

<strong>17.12.2017</strong>
<strong>17.4.2016</strong>
<strong>12.8.2012</strong>
<strong>12.2.2009</strong>
<strong>13.9.2012</strong>

So far I've only created this path:

//article/*[self::date or self::edits/edit][last()]

It looks for the date node and any nonempty edits node in every article and selects the last of them. But I don't know how to access the last strong within each such selection, and naively appending //strong[last()] to the end of the path doesn't work.

I found a solution in XPath 2.0. Either of these paths should work, if I'm not mistaken:

//article/(*[self::date or self::edits/edit][last()]//strong)[last()]
//article/(*//strong)[last()]

Such use of parentheses within a path is invalid in XPath 1.0, though.

2 Answers

This XPath 1.0 expression

/website/article/descendant::strong[parent::date|parent::edit][last()]

selects the nodes:

<strong>17.12.2017</strong>

<strong>17.4.2016</strong>

<strong>12.8.2012</strong>

<strong>12.2.2009</strong>

<strong>13.9.2012</strong>

Tested in http://www.xpathtester.com/xpath/56d8f7bc4b9c8c064fdad16f22469026

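Do note: positional predicates act on the context node list of their own step. Here the descendant:: axis produces a single list of strong nodes per article, so last() picks the last one in each article.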

Sign up to request clarification or add additional context in comments.

2 Comments

Very clever! Thank you. By the way, I always expected the abbreviated syntax to give the same results as the full one, yet now I realized that the path /website/article//strong[parent::date|parent::edit][last()] selects completely different nodes.
@Jeyekomon You are welcome. The abbreviated syntax will be expanded to /child::website/child::article/descendant-or-self::node()/child::strong[position()=last()] so, here position() refers to the child axe.
1

Here is the simple xpath to get your output.

//article/descendant-or-self::strong[last()]

2 Comments

Curious to know, if this solves the issue. Though I checked from end, just want to check with you as you might have the entire structure.
Also correct answer. You're using the same trick as Alejandro with the unabbreviated syntax which did not come to my mind. I was stuck to only abbreviated one which was clearly faulty and probably unsolvable that way. Alejandro's solution is a bit safer though because he included the test for parents. The strong node can be relatively frequent within the article.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.