
I'm making an API call that returns multiple XML responses, like so:

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> ABC </Name>
                <ID> 123 </ID>
        </Action>
</BESAPI>

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> DEF </Name>
                <ID> 456 </ID>
        </Action>
</BESAPI>

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> GHI </Name>
                <ID> 789 </ID>
        </Action>
</BESAPI>

I want to parse all the action IDs from the ID tags and add them to a list:

import requests
import xml.etree.ElementTree as ET
url = ""
payload = ""
headers = {}
response = requests.post(url, headers=headers, data=payload)

root = ET.fromstring(response.content)
actionidlist = []
for elem in root.iter('Action'):
    for subelem in elem.iter('ID'):
        actionidlist.append(subelem.text)
        print(actionidlist)

I get errors though because there are multiple roots. How do I parse this?

Edit: By errors I mean that actionidlist seems to contain only the last ID and not the rest of the IDs.

  • Can you show the import and parse in your code? We don't know if you're using the std xml module or lxml, for example. Also, you say "I get errors" but you don't show them. Is it in the parsing phase, or when calling root.iter()? Please include the full stacktrace. Commented Feb 12, 2021 at 7:32
  • Wrap the response in a single root element in order to make it well-formed XML. Commented Feb 12, 2021 at 7:55
  • @joao I've edited the question. Commented Feb 12, 2021 at 19:50
  • I would carefully read the API instructions. Are you sending multiple params? It's hard to believe an API would return a non-well-formed XML response. Is it embedded in larger XML? Get in touch with the maintainers. Commented Feb 13, 2021 at 0:25
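
The suggestion above to wrap the response in a single root element could look roughly like the sketch below. It is only an illustration: the `raw` string stands in for `response.text`, and the wrapper name `Responses` and the helper `extract_action_ids` are made up for this example. It assumes the only thing that makes the payload ill-formed is the repeated XML declaration and the multiple roots.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for response.text: three complete XML documents concatenated.
raw = """<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> ABC </Name>
                <ID> 123 </ID>
        </Action>
</BESAPI>

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> DEF </Name>
                <ID> 456 </ID>
        </Action>
</BESAPI>

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> GHI </Name>
                <ID> 789 </ID>
        </Action>
</BESAPI>
"""


def extract_action_ids(raw_xml):
    """Wrap concatenated XML documents in one root and collect the Action IDs."""
    # An XML declaration is only allowed at the very start of a document,
    # so remove every occurrence before wrapping.
    body = re.sub(r"<\?xml\s[^>]*\?>", "", raw_xml)
    # "Responses" is a hypothetical wrapper element, not part of the API.
    root = ET.fromstring("<Responses>" + body + "</Responses>")
    ids = []
    for action in root.iter("Action"):
        for id_elem in action.iter("ID"):
            # The sample data pads values with spaces, so strip them.
            ids.append(id_elem.text.strip())
    return ids


ids = extract_action_ids(raw)
print(ids)  # ['123', '456', '789']
```

This keeps everything in one tree, which is convenient if you later want to query across all responses at once.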

2 Answers


ET.fromstring() parses exactly one XML document. If you pass it your entire input data, with multiple roots, you get this error:

xml.etree.ElementTree.ParseError: junk after document element: line 9, column 0

So I suggest pre-processing the input data to split it into a list of XML responses, then parsing each one in turn:

import requests
import xml.etree.ElementTree as ET

url = ""
payload = ""
headers = {}
response = requests.post(url, headers=headers, data=payload)

# Split the input data into a list of strings (XML sections), using blank
# lines as separators. response.text is a str; response.content is bytes
# and cannot be concatenated with '\n'.
xml_sections = ['']
for line in response.text.splitlines():
    if line.strip():
        xml_sections[-1] += line + '\n'
    elif xml_sections[-1]:
        # Start a new section only if the current one has content
        xml_sections.append('')

# Parse each non-empty XML section separately
actionidlist = []
for s in xml_sections:
    if not s:
        continue
    root = ET.fromstring(s)
    for elem in root.iter('Action'):
        for subelem in elem.iter('ID'):
            actionidlist.append(subelem.text)
print(actionidlist)

This produces the following output:

[' 123 ', ' 456 ', ' 789 ']
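
The blank-line split depends on the responses being separated by empty lines. If that is not guaranteed, one possible variant is to split just before each XML declaration instead. The sketch below assumes every response starts with its own `<?xml ...?>` declaration, as in the question; `split_xml_documents` and `parse_action_ids` are illustrative helper names, and `raw` stands in for `response.text`. It also strips the whitespace around each ID.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for response.text: two documents with no blank line between them.
raw = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<BESAPI><Action Resource="https://www.example.com">'
    '<Name> ABC </Name><ID> 123 </ID></Action></BESAPI>\n'
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<BESAPI><Action Resource="https://www.example.com">'
    '<Name> DEF </Name><ID> 456 </ID></Action></BESAPI>\n'
)


def split_xml_documents(raw_xml):
    """Split concatenated XML into one string per document."""
    # The zero-width lookahead splits *before* each declaration without
    # consuming it (splitting on empty matches needs Python 3.7+).
    parts = re.split(r"(?=<\?xml\s)", raw_xml)
    return [part.strip() for part in parts if part.strip()]


def parse_action_ids(raw_xml):
    """Parse each document separately and collect the stripped ID values."""
    ids = []
    for doc in split_xml_documents(raw_xml):
        root = ET.fromstring(doc)
        for action in root.iter("Action"):
            for id_elem in action.iter("ID"):
                ids.append(id_elem.text.strip())
    return ids


print(parse_action_ids(raw))  # ['123', '456']
```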

1 Comment

Perfect! Splitting the xml responses worked! Thank you!
from pyspark.sql import SQLContext

# Assumes an existing SparkContext `sc` (for example in a Databricks notebook)
# and that the spark-xml package (com.databricks.spark.xml) is installed.
sqlContext = SQLContext(sc)
file = "filepath/<xml_file_name.xml>"
schema_path = "filepath/<xml_schema_name.xml>"

"""
Schema sample: the file at schema_path has the same structure as the data,
with type placeholders in place of the values:

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> string </Name>
                <ID> INT </ID>
        </Action>
</BESAPI>

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> string </Name>
                <ID> INT </ID>
        </Action>
</BESAPI>
"""

# rowTag must name an element. Resource is an attribute of Action, and the
# query below reads Action.Name, so each BESAPI element is treated as a row.
df_schema = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='BESAPI').load(schema_path)
# Load the data file (`file`, not the undefined `path`) with the inferred schema
df = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='BESAPI').load(file, schema=df_schema.schema)
# display(df)
df.createOrReplaceTempView("temptable")
structured_df = sqlContext.sql("select concat_ws(', ', Action.Name) as Name, concat_ws(', ', Action.ID) as ID from temptable")
display(structured_df)  # display() is provided by Databricks notebooks
