I would like to use a single XPath expression to get a list of lists (or sequence of sequences) that groups the extracted values by parent element, in order.

Here are my attempts so far, using a minimal example:

import elementpath, lxml.etree
xml = '''<a>
<b c="1">
  <d e="3"/>
  <d e="4"/>
</b>
<b c="2">
  <d e="5"/>
  <d e="6"/>
</b>
</a>'''
tree = lxml.etree.fromstring(str.encode(xml))
xpath1 = '/a/b/d/@e'
xpath2 = 'for $b in (/a/b) return concat("[", $b/string-join(d/@e, ", "), "]")'
print('1:', elementpath.select(tree, xpath1))
print('2:', elementpath.select(tree, xpath2))
print('3:', [['3', '4'], ['5', '6']])

Which outputs:

1: ['3', '4', '5', '6']
2: ['[3, 4]', '[5, 6]']
3: [['3', '4'], ['5', '6']]

xpath1 returns a flattened list/sequence, with no grouping by parent element.

xpath2 is the closest I have come so far, but it gives each sub-array as a string rather than as an array.

Option 3 is what I am after.

Is anyone able to advise on a better way of doing this with just an XPath expression?

Thanks, Mark

  • Is it a requirement that the result comes directly from the xpath expression? Because [x.xpath('d/@e') for x in tree.xpath('/a/b')] gives you what you want. Commented Jul 19, 2023 at 2:31
  • Thanks for the suggestion. I am using XPath expressions to configure an XML parser, and would like to avoid Python code in the parser configuration data. Commented Jul 19, 2023 at 4:27
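
For reference, the comprehension suggested in the first comment can be sketched with the standard library alone. This is only an illustration of grouping in Python code rather than in the XPath itself, so it does not meet the "configuration only" requirement stated above; it uses xml.etree.ElementTree instead of lxml, which is an assumption made to keep the example dependency-free.

```python
import xml.etree.ElementTree as ET

xml = '''<a>
<b c="1">
  <d e="3"/>
  <d e="4"/>
</b>
<b c="2">
  <d e="5"/>
  <d e="6"/>
</b>
</a>'''

root = ET.fromstring(xml)  # root is the <a> element

# One inner list per <b>, holding the e attribute of each child <d>,
# in document order.
grouped = [[d.get('e') for d in b.findall('d')] for b in root.findall('b')]
print(grouped)
```

The outer loop selects the parents and the inner loop selects the values relative to each parent, which is the same two-level structure the nested array constructors express in XPath 3.1.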

1 Answer

ElementPath supports XPath 3.1 with XPath/XDM arrays, so I think what you want, in terms of XPath, is

/a!array { b ! array { d/@e/string() } }

which should give [["3","4"],["5","6"]].

That is the output with SaxonC HE (12.3) of

from saxonche import PySaxonProcessor

xml = '''<a>
<b c="1">
  <d e="3"/>
  <d e="4"/>
</b>
<b c="2">
  <d e="5"/>
  <d e="6"/>
</b>
</a>'''

with PySaxonProcessor(license=False) as saxon:
    xdm_doc = saxon.parse_xml(xml_text=xml)
    xpath_processor = saxon.new_xpath_processor()
    xpath_processor.set_context(xdm_item=xdm_doc)
    xdm_value = xpath_processor.evaluate_single('/a!array { b ! array { d/@e/string() } }')
    print(xdm_value)

At that stage, however, you don't have a Python list of lists but a PyXdmItem that is an XDM array of arrays. To get a nested Python list, I think you can do

    list_of_lists = [inner_array.head.as_list() for inner_array in xdm_value.as_list()]
    print(list_of_lists)

I will need to check whether ElementPath allows that too and perhaps a bit more elegantly; the simplest I have found is

import elementpath, lxml.etree
from elementpath.xpath3 import XPath3Parser

xml = '''<a>
<b c="1">
  <d e="3"/>
  <d e="4"/>
</b>
<b c="2">
  <d e="5"/>
  <d e="6"/>
</b>
</a>'''

tree = lxml.etree.fromstring(str.encode(xml))

array_of_arrays = elementpath.select(tree, '/a!array { b ! array { d/@e/string() } }', parser=XPath3Parser)

print(array_of_arrays)

list_of_lists = [array.items() for array in array_of_arrays[0].items()]

print(list_of_lists)

giving [['3', '4'], ['5', '6']] for the final print(list_of_lists).

Alternatively, using a sequence of arrays in XPath gives you a list of arrays in Python, which is easier to convert into a list of lists:

sequence_of_arrays = elementpath.select(tree, '/a/b ! array { d/@e/string() }', parser=XPath3Parser)

print(sequence_of_arrays)

list_of_lists = [array.items() for array in sequence_of_arrays]

print(list_of_lists)
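
If the nesting depth is not fixed, the conversion could be made recursive. The following is a minimal sketch, assuming only that each array object exposes an items() method as in the snippets above; to_nested_list and FakeArray are hypothetical names introduced for illustration, and FakeArray is a stand-in so the example runs without elementpath installed.

```python
def to_nested_list(value):
    """Recursively turn array-like objects into plain Python lists."""
    if isinstance(value, (list, tuple)):
        return [to_nested_list(v) for v in value]
    items = getattr(value, 'items', None)
    if callable(items):
        # Assumption: items() returns the array members in order.
        return [to_nested_list(v) for v in items()]
    return value  # atomic value such as a string


class FakeArray:
    """Hypothetical stand-in for an XPath array; not part of elementpath."""

    def __init__(self, *members):
        self._members = list(members)

    def items(self):
        return self._members


nested = FakeArray(FakeArray('3', '4'), FakeArray('5', '6'))
result = to_nested_list(nested)
print(result)
```

With real results, the same function would be applied to array_of_arrays[0] or to sequence_of_arrays, provided the array class behaves as assumed here.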