I have this xml:

<?xml version="1.0" encoding="utf-8" ?> 
<ArrayOfEMObject2 xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.blue-order.com/ma/essencemanagerws/EssenceManager">
    <EMObject2>
        <emguid>727ef486-31b3-48c3-b38e-39561995ef80</emguid> 
        <orgname>2435e6b6-e19a-4ca5-a708-47f7d9387bb9.wav</orgname> 
        <streamclass>AUDIO</streamclass> 
        <streamtype>WAV</streamtype> 
        <prefusage>BROWSE</prefusage> 
    </EMObject2>
    <EMObject2>
        <emguid>e866abef-7571-45a7-84be-85f2ffc35b31</emguid> 
        <orgname>201701191006474010024190133005056B91BF30000003352B00000D0F094671.mp3</orgname> 
        <streamclass>AUDIO</streamclass> 
        <streamtype>MP3</streamtype> 
        <prefusage>AUX</prefusage> 
    </EMObject2>
    <EMObject2>
        <emguid>f02ab3db-93c8-4cbf-82b8-5fb06704a4ea</emguid> 
        <orgname>201701191006474010024190133005056B91BF30000003352B00000D0F094671.mp3</orgname> 
        <streamclass>AUDIO</streamclass> 
        <streamtype>MP3</streamtype> 
        <prefusage>AUX</prefusage> 
    </EMObject2>

If the streamtype is MP3, I need the corresponding emguid and orgname.

I already have this:

from xml.etree import ElementTree
# ...
namespace = '{http://www.blue-order.com/ma/essencemanagerws/EssenceManager}'
for child in root.findall('.//{}streamtype'.format(namespace)):
    if child.text == 'MP3':

How should I proceed here?

2
  • Which XML parser are you using? Commented Dec 17, 2020 at 9:04
  • I'm using ElementTree Commented Dec 17, 2020 at 9:10

3 Answers

1

You can find and check the streamtype tag and then retrieve the other information like this:

from xml.etree import ElementTree
# ...
namespace = '{http://www.blue-order.com/ma/essencemanagerws/EssenceManager}'
for child in root.findall('.//{}EMObject2'.format(namespace)):
    if child.find('{}streamtype'.format(namespace)).text == 'MP3':
        print(child.find('{}emguid'.format(namespace)).text)
        print(child.find('{}orgname'.format(namespace)).text)
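
For a version that runs on its own, here is a minimal sketch of the same idea. It assumes the XML is available as a string and that the closing root tag, which is missing from the sample in the question, has been restored. The `em` prefix, the `NS` mapping, the shortened sample, and the `mp3_objects` helper are illustrative names, not part of the original code. `findtext` is used so that a record without a `streamtype` child is skipped instead of raising `AttributeError`.

```python
from xml.etree import ElementTree

# Shortened sample (assumption): closing root tag added so the document parses.
XML = """<ArrayOfEMObject2 xmlns="http://www.blue-order.com/ma/essencemanagerws/EssenceManager">
    <EMObject2>
        <emguid>727ef486-31b3-48c3-b38e-39561995ef80</emguid>
        <orgname>2435e6b6-e19a-4ca5-a708-47f7d9387bb9.wav</orgname>
        <streamtype>WAV</streamtype>
    </EMObject2>
    <EMObject2>
        <emguid>e866abef-7571-45a7-84be-85f2ffc35b31</emguid>
        <orgname>201701191006474010024190133005056B91BF30000003352B00000D0F094671.mp3</orgname>
        <streamtype>MP3</streamtype>
    </EMObject2>
</ArrayOfEMObject2>
"""

# Map an arbitrary prefix to the document's default namespace.
NS = {"em": "http://www.blue-order.com/ma/essencemanagerws/EssenceManager"}


def mp3_objects(root):
    """Yield (emguid, orgname) for every EMObject2 whose streamtype is MP3."""
    for obj in root.findall(".//em:EMObject2", NS):
        # findtext returns None when the child is absent, so nothing is raised.
        if obj.findtext("em:streamtype", namespaces=NS) == "MP3":
            yield (
                obj.findtext("em:emguid", namespaces=NS),
                obj.findtext("em:orgname", namespaces=NS),
            )


root = ElementTree.fromstring(XML)
results = list(mp3_objects(root))
for emguid, orgname in results:
    print(emguid, orgname)
```

Passing a prefix mapping avoids repeating the `'{}tag'.format(namespace)` pattern for every lookup.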


1

Try this.

from simplified_scrapy import SimplifiedDoc

xml = '''
<?xml version="1.0" encoding="utf-8" ?> 
<ArrayOfEMObject2 xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.blue-order.com/ma/essencemanagerws/EssenceManager">
    <EMObject2>
        <emguid>727ef486-31b3-48c3-b38e-39561995ef80</emguid> 
        <orgname>2435e6b6-e19a-4ca5-a708-47f7d9387bb9.wav</orgname> 
        <streamclass>AUDIO</streamclass> 
        <streamtype>WAV</streamtype> 
        <prefusage>BROWSE</prefusage> 
    </EMObject2>
    <EMObject2>
        <emguid>e866abef-7571-45a7-84be-85f2ffc35b31</emguid> 
        <orgname>201701191006474010024190133005056B91BF30000003352B00000D0F094671.mp3</orgname> 
        <streamclass>AUDIO</streamclass> 
        <streamtype>MP3</streamtype> 
        <prefusage>AUX</prefusage> 
    </EMObject2>
    <EMObject2>
        <emguid>f02ab3db-93c8-4cbf-82b8-5fb06704a4ea</emguid> 
        <orgname>201701191006474010024190133005056B91BF30000003352B00000D0F094671.mp3</orgname> 
        <streamclass>AUDIO</streamclass> 
        <streamtype>MP3</streamtype> 
        <prefusage>AUX</prefusage> 
    </EMObject2>
'''
doc = SimplifiedDoc(xml)
lst = doc.selects('streamtype').contains('MP3').parent
print ([(l.emguid.text,l.orgname.text) for l in lst])

# Or
lst = doc.selects('EMObject2')
for l in lst:
    if l.streamtype.text=='MP3':
        print (l.emguid.text,l.orgname.text)

Result:

[('e866abef-7571-45a7-84be-85f2ffc35b31', '201701191006474010024190133005056B91BF30000003352B00000D0F094671.mp3'), ('f02ab3db-93c8-4cbf-82b8-5fb06704a4ea', '201701191006474010024190133005056B91BF30000003352B00000D0F094671.mp3')]
e866abef-7571-45a7-84be-85f2ffc35b31 201701191006474010024190133005056B91BF30000003352B00000D0F094671.mp3
f02ab3db-93c8-4cbf-82b8-5fb06704a4ea 201701191006474010024190133005056B91BF30000003352B00000D0F094671.mp3


0

Here's an attempt which instead seeks the EMObject2 instances and checks their children.

namespace = '{http://www.blue-order.com/ma/essencemanagerws/EssenceManager}'
tags = {'{}{}'.format(namespace, tag): tag
        for tag in ('orgname', 'streamtype', 'emguid')}
for node in root.findall('.//{}EMObject2'.format(namespace)):
    match = dict()
    for child in node:
        if child.tag in tags:
            match[tags[child.tag]] = child.text
    try:
        if match['streamtype'] == 'MP3':
            print(match['orgname'], match['emguid'])
    except KeyError:
        pass

(I had to repair your XML by adding the missing closing </ArrayOfEMObject2> tag to get this to run.)

1 Comment

There is probably a more idiomatic way to solve this but at least it gets the job done.
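
One candidate for a more idiomatic version, offered as a sketch and not as the definitive answer: ElementTree's limited XPath support includes a `[tag='text']` predicate, so the MP3 filter can be pushed into the path expression itself. The `NS` mapping, the `em` prefix, and the shortened sample document (with its closing root tag restored) are assumptions for illustration. The predicate compares the element's complete text, so a value with surrounding whitespace would not match.

```python
from xml.etree import ElementTree

# Prefix chosen for illustration; it maps to the document's default namespace.
NS = {"em": "http://www.blue-order.com/ma/essencemanagerws/EssenceManager"}

# Shortened sample (assumption): two records, closing root tag included.
XML = """<ArrayOfEMObject2 xmlns="http://www.blue-order.com/ma/essencemanagerws/EssenceManager">
    <EMObject2>
        <emguid>727ef486-31b3-48c3-b38e-39561995ef80</emguid>
        <orgname>2435e6b6-e19a-4ca5-a708-47f7d9387bb9.wav</orgname>
        <streamtype>WAV</streamtype>
    </EMObject2>
    <EMObject2>
        <emguid>f02ab3db-93c8-4cbf-82b8-5fb06704a4ea</emguid>
        <orgname>201701191006474010024190133005056B91BF30000003352B00000D0F094671.mp3</orgname>
        <streamtype>MP3</streamtype>
    </EMObject2>
</ArrayOfEMObject2>
"""

root = ElementTree.fromstring(XML)

# Select only the EMObject2 elements that have a streamtype child equal to MP3.
matches = [
    (
        node.findtext("em:emguid", namespaces=NS),
        node.findtext("em:orgname", namespaces=NS),
    )
    for node in root.findall(".//em:EMObject2[em:streamtype='MP3']", NS)
]
print(matches)
```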
