I want to find the path to a given XML element (node). I have tried xmllint and xml_grep; both return the element(s) I'm searching for, but as far as I can tell, neither returns the path to that element.

Example XML:

<myroot>
   <my2ndlvl>
      <my3rdlvl foo="bar"/>
   </my2ndlvl>
   <my2ndlvl>
      <my3rdlvl fum="baz"/>
   </my2ndlvl>
</myroot>

I've tried a number of variants of
xmllint --xpath '//my3rdlvl[@fum="baz"]'

and

xml_grep --cond '//my3rdlvl[@fum="baz"]'

but both just return the node <my3rdlvl fum="baz"/> (xml_grep wraps the node in its own <xml_grep ...> node, but that's no help). What I want to get back is something like

myroot/my2ndlvl/my3rdlvl[@fum="baz"]

or an XML representation of that (without any nodes not on the xpath).

How can I find this path? Is there a way to make xmllint or xml_grep do it?

3 Answers

Here is a pure XSLT 1.0 transformation that produces the paths to all leaf elements and attributes in an XML document (see also: Generate/get Xpath from XML in Java):

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:variable name="vApos">'</xsl:variable>
    
    <xsl:template match="*[@* or not(*)] ">
      <xsl:if test="not(*)">
         <xsl:apply-templates select="ancestor-or-self::*" mode="path"/>
         <xsl:value-of select="concat('=',$vApos,.,$vApos)"/>
         <xsl:text>&#xA;</xsl:text>
        </xsl:if>
        <xsl:apply-templates select="@*|*"/>
    </xsl:template>
    
    <xsl:template match="*" mode="path">
        <xsl:value-of select="concat('/',name())"/>
        <xsl:variable name="vnumPrecSiblings" select=
         "count(preceding-sibling::*[name()=name(current())])"/>
        <xsl:if test="$vnumPrecSiblings">
            <xsl:value-of select="concat('[', $vnumPrecSiblings +1, ']')"/>
        </xsl:if>
    </xsl:template>
    
    <xsl:template match="@*">
        <xsl:apply-templates select="../ancestor-or-self::*" mode="path"/>
        <xsl:value-of select="concat('[@',name(), '=',$vApos,.,$vApos,']')"/>
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>
</xsl:stylesheet>

When applied on the provided XML document:

<myroot>
   <my2ndlvl>
      <my3rdlvl foo="bar"/>
   </my2ndlvl>
   <my2ndlvl>
      <my3rdlvl fum="baz"/>
   </my2ndlvl>
</myroot>

the wanted output is produced: one XPath expression for every leaf element in the document, plus one for each attribute:

/myroot/my2ndlvl/my3rdlvl=''
/myroot/my2ndlvl/my3rdlvl[@foo='bar']
/myroot/my2ndlvl[2]/my3rdlvl=''
/myroot/my2ndlvl[2]/my3rdlvl[@fum='baz']

Do note: The XPath expression to the wanted element contains a comparison for its attribute, exactly as wanted.

One can easily modify this transformation to accept as parameter a single node (element) and produce the XPath expression to only this element.

Thus, after editing the above transformation we arrive at this:

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:param name="pNode" select="/myroot/my2ndlvl/my3rdlvl[@fum='baz']"/>
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:variable name="vApos">'</xsl:variable>
    
    <xsl:template match="*">
      <xsl:if test="descendant-or-self::*[count(. | $pNode) = 1]">
        <xsl:value-of select="concat('/',name())"/>
        <xsl:variable name="vnumPrecSiblings" select=
         "count(preceding-sibling::*[name()=name(current())])"/>
        <xsl:variable name="vnumFollowSiblings" select=
         "count(following-sibling::*[name()=name(current())])"/>
         
        <xsl:if test="$vnumPrecSiblings + $vnumFollowSiblings > 0">
          <xsl:value-of select="concat('[', $vnumPrecSiblings +1, ']')"/>
        </xsl:if>

        <xsl:apply-templates select="* | @*"/>      
      </xsl:if>
    </xsl:template>
    
    <xsl:template match="@*">
        <xsl:value-of select="concat('[@',name(), '=',$vApos,.,$vApos,']')"/>  
    </xsl:template>   
</xsl:stylesheet>

And this produces exactly the wanted, correct result: a full and unambiguous XPath expression that selects exactly the node passed as parameter to the transformation, and explicitly tests/disambiguates using the attribute value of this element. The result is:

/myroot/my2ndlvl[2]/my3rdlvl[@fum='baz']

Not only this, but the generated XPath expression tests for all attributes of the element, if it has more than one attribute.

Modify the original XML document so that now the wanted element has two attributes: fum and test:

<myroot>
   <my2ndlvl>
      <my3rdlvl foo="bar"/>
   </my2ndlvl>
   <my2ndlvl>
      <my3rdlvl fum="baz" test="xyz"/>
   </my2ndlvl>
</myroot>

Applying the above transformation on this document now produces:

/myroot/my2ndlvl[2]/my3rdlvl[@fum='baz'][@test='xyz']

Isn't this wonderful ... !
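For readers without an XSLT processor at hand, the same idea can be sketched in Python with only the standard library. This is an illustration under assumptions, not a port of the stylesheet: the function name xpath_to is made up for this example, element names are assumed to have no namespaces, and attribute values are assumed to contain no apostrophes. Like the transformation above, it adds a position only when same-named siblings exist and appends a predicate for every attribute of each element on the path.

```python
import xml.etree.ElementTree as ET

XML = """<myroot>
   <my2ndlvl>
      <my3rdlvl foo="bar"/>
   </my2ndlvl>
   <my2ndlvl>
      <my3rdlvl fum="baz" test="xyz"/>
   </my2ndlvl>
</myroot>"""


def xpath_to(root, target):
    """Build an absolute path to `target` (hypothetical helper for this sketch)."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    steps = []
    node = target
    while node is not None:
        step = node.tag
        parent = parent_of.get(node)
        if parent is not None:
            same_name = [sib for sib in parent if sib.tag == node.tag]
            # Add a 1-based position only when same-named siblings exist.
            if len(same_name) > 1:
                step += "[%d]" % (same_name.index(node) + 1)
        # One predicate per attribute; assumes values contain no apostrophes.
        for name, value in node.attrib.items():
            step += "[@%s='%s']" % (name, value)
        steps.append(step)
        node = parent
    return "/" + "/".join(reversed(steps))


root = ET.fromstring(XML)
target = root.find(".//my3rdlvl[@fum='baz']")
print(xpath_to(root, target))
# prints /myroot/my2ndlvl[2]/my3rdlvl[@fum='baz'][@test='xyz']
```

The parent map is rebuilt on every call, which is fine for small documents; for many lookups in a large document it would make sense to build it once and pass it in.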

1 Comment

Yes, that is nice! It also taught me things I didn't know (or remember...) about XPaths.

There are two tools [1] that can produce XPath expressions from XML or HTML documents:

xml2xpath.sh

# quiet (-q), absolute paths (-a), starting at expression (-s)
xml2xpath.sh -q -a -s '//my3rdlvl[@fum="baz"]' ~/tmp/tmp2.xml 
/myroot/my2ndlvl[2]/my3rdlvl
/myroot/my2ndlvl[2]/my3rdlvl/@fum

Testing the expressions found:

f='/home/lmc/tmp/tmp2.xml'
for x in $(xml2xpath.sh -q -a -s '//my3rdlvl[@fum="baz"]' "$f" | grep -v '^$');do
    xmllint --xpath "$x" "$f"
done

Result:

<my3rdlvl fum="baz"/>
 fum="baz"

pyxml2xpath

# pyxml2xpath <file path> [mode] [initial xpath expression] [with element count: yes|true] [max elements: int] [no banner: yes|true]
pyxml2xpath ~/tmp/tmp2.xml xpath '//my3rdlvl[@fum="baz"]' false 10 true
/myroot/my2ndlvl[2]/my3rdlvl

[1] Disclaimer: I'm the author of both tools.

2 Comments

Thanks, that's what I needed! (I think there's a typo in the -h info: it says the possibilities for the 'mode' arg include 'path', but it should be 'xpath' (the error msg correctly says 'xpath'). Also, it would be nice if it would accept input from stdin, so it could be used in a pipe. Granted, piping XML can result in weird stuff, but still.)
Thanks for the typo report! I will open an issue to read from stdin if possible.

With Saxon Gizmo [my company's product]

java -cp <path to Saxon> net.sf.saxon.Gizmo -s source.xml 
/>list //my3rdlvl[@fum="baz"]
/>quit

output:

/myroot/my2ndlvl[2]/my3rdlvl[1]

Getting back a path like myroot/my2ndlvl/my3rdlvl[@fum="baz"] is rather challenging, because of course there are any number of XPaths that select this element and there's no way of knowing which one you want. For example you might prefer //*[@fum] or even (//*)[5].
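To make the ambiguity concrete, here is a small sketch in Python (standard library only) that reaches the same element in three different ways. It is only an illustration: ElementTree supports a limited XPath subset and cannot evaluate (//*)[5], so that expression is emulated by taking the fifth element in document order.

```python
import xml.etree.ElementTree as ET

XML = """<myroot>
   <my2ndlvl>
      <my3rdlvl foo="bar"/>
   </my2ndlvl>
   <my2ndlvl>
      <my3rdlvl fum="baz"/>
   </my2ndlvl>
</myroot>"""

root = ET.fromstring(XML)

# A full path with a position and an attribute predicate (relative to the root).
by_full_path = root.find("./my2ndlvl[2]/my3rdlvl[@fum='baz']")

# Any element that carries a fum attribute, like //*[@fum].
by_attribute = root.find(".//*[@fum]")

# Emulates (//*)[5]: the fifth element in document order, counting the root.
by_position = list(root.iter())[4]

# All three lookups return the very same element object.
print(by_full_path is by_attribute is by_position)  # prints True
```

Each of these is a valid answer to "which path selects this element", which is why a tool has to pick one convention.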

6 Comments

Re: "Getting back a path like myroot/my2ndlvl/my3rdlvl[@fum="baz"] is rather challenging" Not at all - see my answer :)
I meant that it's challenging to infer the precise requirements when all you are given is a single example. Obviously if you have a clear specification then it's not difficult.
While specifying just the position of an element ensures that the XPath expression is unique, the OP @mike-maxwell is right that having attribute predicates makes the XPath expression much more readable and understandable. Maybe an improvement for Saxon Gizmo? :) Also, it isn't necessary to provide the position as in .../my3rdlvl[1] because this is the only child. Again, this could be improved in the XPath expression produced now by Saxon Gizmo, which you provided.
"makes the XPath expression much more readable and understandable". Well, there's the danger that the user types list XPATH and the system responds with the exact XPATH that the user entered. How do we know what the user understands? It might be that the reason the user is here in the first place is that they didn't understand why this XPath produced the result it did. There is no right answer to this one.
"How do we know what the user understands?" We certainly know that for any human being this: /a/b[2]/c[33] is much less understandable than: /a/b[2]/c[33][@age = 30][@name='George'] . Not forcing them to count 33 children is good progress, isn't it?
Yeah, I didn't think about the other XPaths that could select the target node. I did want the one that showed all the nodes in between, although that still doesn't determine whether you see attrs on those nodes and/or on the target node. I guess I wanted the most informative XPath, which for me would be all the intermediate nodes and all their attrs, and the node and attrs of the target.
