I'm trying to return all UK artists in alphabetical order, but I don't fully understand XML parsing. From debugging and the documentation, I gather that the findall() method returns a list that cannot be navigated further. Is this correct? If so, how do I iterate over the subelements of a parent node, in this case <cd>, to find all of the elements whose country == 'UK'? Thanks in advance!

def get_uk_artists(xmlstr):
      xml = ET.fromstring(xmlstr)
      artist_list = []
      for each in xml.findall('cd'):
          if each.findall('./cd/country').text == 'UK':
              artist_list.append(each.findall('artist').text)
      return artist_list.sort()

The XML is:

 xml_doc ='''<?xml version="1.0" encoding="ISO-8859-1"?>
    <catalog>
        <cd>
            <title>Empire Burlesque</title>
            <artist sex="male">Bob Dylan</artist>
            <country>USA</country>
            <company>Columbia</company>
            <price>10.90</price>
            <year>1985</year>
        </cd>
        <cd>
            <title>Hide your heart</title>
            <artist sex="female">Bonnie Tyler</artist>
            <country>UK</country>
            <company>CBS Records</company>
            <price>9.90</price>
            <year>1988</year>
        </cd>
        <cd>
            <title>Greatest Hits</title>
            <artist sex="female">Dolly Parton</artist>
            <country>USA</country>
            <company>RCA</company>
            <price>9.90</price>
            <year>1982</year>
        </cd>
        <cd>
            <title>Still got the blues</title>
            <artist sex="male">Gary Moore</artist>
            <country>UK</country>
            <company>Virgin records</company>
            <price>10.20</price>
            <year>1990</year>
        </cd>
    </catalog>
    '''

1 Answer

You could use:

import xml.etree.ElementTree as ET

xml = ET.fromstring(xml_doc)
artists = []
for cd in xml.findall('cd'):
    if cd.find('country').text == 'UK':
        artists.append(cd.find('artist').text)

artists.sort()
print(artists)

Output

['Bonnie Tyler', 'Gary Moore']

This loops over each cd in the document. If the cd has a country child element with text equal to 'UK', the cd artist name gets appended to the list of artists. artists.sort() then sorts the list in-place.
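
As a possible alternative, ElementTree's limited XPath support lets the country filter move into the path itself. The following is a minimal sketch, assuming a trimmed three-entry version of the catalog; the names sample_doc, root and uk_artists are chosen here for illustration and are not from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical version of the catalog from the question
sample_doc = """<catalog>
    <cd><artist>Gary Moore</artist><country>UK</country></cd>
    <cd><artist>Bob Dylan</artist><country>USA</country></cd>
    <cd><artist>Bonnie Tyler</artist><country>UK</country></cd>
</catalog>"""

root = ET.fromstring(sample_doc)

# [country='UK'] keeps only <cd> elements that have a <country> child
# whose full text is 'UK'; /artist then selects their <artist> children.
uk_artists = sorted(a.text for a in root.findall("cd[country='UK']/artist"))
print(uk_artists)  # ['Bonnie Tyler', 'Gary Moore']
```

The explicit loop is easier to extend if a cd might lack a country or artist element, while the path version keeps the filter in one expression.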

3 Comments

Thanks @lambo477, it did the trick! I have a couple of questions: 1. I tried artists.sort(), but it returned an empty list. However, sorted(artists) worked. Any ideas why this is? 2. When we call the findall() method, what structure does it return?
artists.sort() returns None because it sorts the list in place. Run print(artists) afterwards and you will see the sorted list. sorted() returns a new list instead, so you would need to use artists = sorted(artists).
xml.findall() returns a list of elements.
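
Both points from the comments can be checked in a few lines. This is a small sketch, assuming a two-entry stand-in document; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry stand-in for the catalog
root = ET.fromstring(
    "<catalog>"
    "<cd><country>UK</country></cd>"
    "<cd><country>USA</country></cd>"
    "</catalog>"
)

cds = root.findall("cd")   # a plain list of Element objects
first = root.find("cd")    # a single Element, or None if nothing matches

# A list has no .text attribute, which is why findall(...).text fails;
# each Element inside the list can still be navigated with find().
print(type(cds).__name__, len(cds))   # list 2
print(first.find("country").text)     # UK

names = ["Gary Moore", "Bonnie Tyler"]
in_place_result = names.sort()        # sorts names itself, returns None
new_list = sorted(["Gary Moore", "Bonnie Tyler"])  # returns a new sorted list

print(in_place_result)  # None
print(names)            # ['Bonnie Tyler', 'Gary Moore']
print(new_list)         # ['Bonnie Tyler', 'Gary Moore']
```

This also explains the original function: return artist_list.sort() hands back None, so sorting first and returning the list afterwards, or returning sorted(artist_list), would be needed.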
