0

Given XML as string :

xml_as_string = """<root xmlns:SOAP-ENV="http://w/e">
    <Context operation="something.wsdl">
        <SOAP_Version>123.321</SOAP_Version>
        <Namespace xmlns:SOAP-ENV="http://dummy.com"/>
    </Context>
    <Header/>
    <Body>
        <ns2:Complex xmlns:ns2="http://www.test.this/idk">
            <ns2:simple>
                <ns2:child1>IKM</ns2:child1>
                <ns2:child2>1234</ns2:child2>
                <ns2:child3>S</ns2:child3>
            </ns2:simple>
            <ns2:simple>
                <ns2:child1>QEw</ns2:child1>
                <ns2:child2>10028111</ns2:child2>
                <ns2:parentchild1>IKM</ns2:parentchild1>
                <ns2:parentchild2>1234</ns2:parentchild2>
                <ns2:child3>S</ns2:child3>
            </ns2:simple>
            <ns2:simple>
                <ns2:child1>IOW</ns2:child1>
                <ns2:child2>100043896</ns2:child2>
                <ns2:parentchild1>QEw</ns2:parentchild1>
                <ns2:parentchild2>10028111</ns2:parentchild2>
            </ns2:simple>
            <ns2:extra>
                <ns2:childextra>0</ns2:childextra>
            </ns2:extra>
        </ns2:Complex>
    </Body>
</root>
"""

Creating xml tree using xml.etree.ElementTree:

`import xml.etree.ElementTree as ET`

root = ET.fromstring(xml_as_string)

For a specific path, I'm trying to print all the values found

path = './Body/Complex/simple/child1'

path_vals = root.findall(path)
print([e.text for e in path_vals])

The result is an empty list :

[]

Is there any way to accomplish this in python ?

1
  • u need to include the namespace Commented Jun 4, 2020 at 11:11

1 Answer 1

1

You probably want all the text associated with child1 : You need to use the namespace to get the data : "http://www.test.this/idk" is the namespace

namespace = '{http://www.test.this/idk}'

[ent.text for ent in root.findall(F".//{namespace}child1")]

['IKM', 'QEw', 'IOW']

If you want to do away with the namespaces, you could give parsel a go :

from parsel import Selector
selector = Selector(xml_as_string, 'xml')

#remove namespace
selector.remove_namespaces()

#get your required text
selector.xpath(".//child1/text()").getall()

['IKM', 'QEw', 'IOW']
Sign up to request clarification or add additional context in comments.

4 Comments

u could pass it as a variable, then use fstrings.
Hmm, then idealy i'd prefer something that either ignores namespaces or figures them out by itself, hmm, looking into lxml rn, maybe, just maybe
edited code, using parsel
With ElementTree in Python 3.8, you can use a wildcard for the namespace. Example: stackoverflow.com/a/62117710/407651.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.