This is part of my XML file, showing all the necessary depth:

<?xml version="1.0" encoding="UTF-8" ?>
<Taxonomy>
    <TaxonomyNode>
        <Entity>BUSINESS</Entity>
        <Description>Business News</Description>
        <TaxonomyNode>
            <Entity>COS</Entity>
            <Description>Company News</Description>
            <TaxonomyNode>
                <Entity>ANA</Entity>
                <Description>Analyst Ratings &amp; Commentary</Description>
                <TaxonomyNode>
                    <Entity>ANABUY</Entity>
                    <Description>Analyst Ratings - Buys</Description>
                    <TaxonomyNode>
                        <Entity>ANABEVT</Entity>
                        <Description>Analyst Ratings Events, Announcements - Buys</Description>
                    </TaxonomyNode>
                    <TaxonomyNode>
                        <Entity>BMRANABUY</Entity>
                        <Description>Analyst Ratings - Buys</Description>
                        <TaxonomyNode>
                            <Entity>ANRACC</Entity>
                            <Description>ANR Accumulate</Description>
                        </TaxonomyNode>
                    </TaxonomyNode>
                </TaxonomyNode>
            </TaxonomyNode>
        </TaxonomyNode>
    </TaxonomyNode>
</Taxonomy>

As you can see, there are multiple nested elements with the same name (TaxonomyNode). Reading this with Spark in the conventional way, spark.read.format("com.databricks.spark.xml").option("rowTag","TaxonomyNode").load(completeXMLFilePath), does not work: it returns a DataFrame that looks like this: [screenshot of the resulting DataFrame]

and it has a schema like this: [screenshot of the DataFrame schema]

I would be grateful if anybody has an idea of how to make this work.

  • Your XML is nested, not flat. It's being read correctly. Do you want to flatten it? If so, why? Commented Jun 2, 2020 at 14:43
  • Yes, that's true. I want to flatten it so that I get my data in a usable form. Commented Jun 3, 2020 at 7:00
  • Take a look at this answer: stackoverflow.com/a/49672982/6030951 Commented Jun 4, 2020 at 14:53
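To make "flatten" concrete, here is a minimal plain-Python sketch using the standard-library xml.etree.ElementTree. This is not the spark-xml reader. It walks the recursive TaxonomyNode tree and emits one flat row per node, recording its parent Entity and nesting depth. It assumes the real file keeps the same <Taxonomy>/<TaxonomyNode>/<Entity>/<Description> layout as the excerpt above. The function name flatten_taxonomy and the output columns (entity, description, parent, depth) are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Copy of the excerpt from the question (assumption: the real file uses the
# same <Taxonomy>/<TaxonomyNode> layout, just with more nodes).
SAMPLE_XML = """
<Taxonomy>
  <TaxonomyNode>
    <Entity>BUSINESS</Entity>
    <Description>Business News</Description>
    <TaxonomyNode>
      <Entity>COS</Entity>
      <Description>Company News</Description>
      <TaxonomyNode>
        <Entity>ANA</Entity>
        <Description>Analyst Ratings &amp; Commentary</Description>
        <TaxonomyNode>
          <Entity>ANABUY</Entity>
          <Description>Analyst Ratings - Buys</Description>
          <TaxonomyNode>
            <Entity>ANABEVT</Entity>
            <Description>Analyst Ratings Events, Announcements - Buys</Description>
          </TaxonomyNode>
          <TaxonomyNode>
            <Entity>BMRANABUY</Entity>
            <Description>Analyst Ratings - Buys</Description>
            <TaxonomyNode>
              <Entity>ANRACC</Entity>
              <Description>ANR Accumulate</Description>
            </TaxonomyNode>
          </TaxonomyNode>
        </TaxonomyNode>
      </TaxonomyNode>
    </TaxonomyNode>
  </TaxonomyNode>
</Taxonomy>
"""


def flatten_taxonomy(parent_elem, parent_entity=None, depth=0):
    """Return a list of dicts, one per TaxonomyNode, in depth-first order.

    Hypothetical helper: each dict holds the node's Entity, Description,
    the Entity of its parent node (None at the top level) and its depth.
    """
    rows = []
    # findall("TaxonomyNode") matches direct children only, so each level
    # of nesting is handled by one recursive call.
    for node in parent_elem.findall("TaxonomyNode"):
        entity = node.findtext("Entity")
        rows.append({
            "entity": entity,
            "description": node.findtext("Description"),
            "parent": parent_entity,
            "depth": depth,
        })
        # Recurse into this node's own nested TaxonomyNode children.
        rows.extend(flatten_taxonomy(node, entity, depth + 1))
    return rows


root = ET.fromstring(SAMPLE_XML.strip())
rows = flatten_taxonomy(root)
for row in rows:
    print(row["depth"], row["entity"], row["parent"], row["description"], sep=" | ")
```

The parent/child (adjacency) layout is one of several possible flat shapes. A root-to-leaf path per row would be another. Because this parses the whole file on a single machine, it only suits files that fit in memory. In that case, the resulting list could be handed to Spark, for example via spark.createDataFrame.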