How to write Spark data frame to xml file?

Question

Sample:

scala> Frame.show()

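+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|    R|
|1997| Ford| E350|Go get one now th...|    L|
|2015|Chevy| Volt|                 Try|    M|
+----+-----+-----+--------------------+-----+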

to

<item>
    <year>2012</year>
    <make>Tesla</make>
    <model>S</model>
</item>

10465355 · Accepted Answer · 2019-06-25 15:54:18Z


The simplest approach is to use the XML writer from spark-xml:

val path: String = ???
df.write.format("com.databricks.spark.xml")
  .option("rootTag", "items")
  .option("rowTag", "item")
  .save(path)
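
For context, here is a minimal end-to-end sketch of the same write. It assumes spark-xml is already on the classpath (for example via `--packages`, with the artifact matching your Scala and Spark versions). The output location, the app name, and the sample rows (a cut-down version of the question's table) are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder()
  .appName("df-to-xml")   // hypothetical app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample rows modelled on the question (comment column left out).
val df = Seq(
  (2012, "Tesla", "S", "R"),
  (1997, "Ford", "E350", "L"),
  (2015, "Chevy", "Volt", "M")
).toDF("year", "make", "model", "blank")

val path = "/tmp/items-xml" // hypothetical output location

df.select("year", "make", "model")    // keep only the columns wanted in the XML
  .write
  .format("com.databricks.spark.xml")
  .option("rootTag", "items")         // element wrapping all rows
  .option("rowTag", "item")           // element emitted per row
  .mode("overwrite")
  .save(path)
```

As with other Spark writers, `path` ends up being a directory of part files rather than a single XML file.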

If for some reason that doesn't fit your needs, you can serialize records individually and write them out with saveAsTextFile:

def dumpXML(row: Row): String = ???
df.rdd.map(dumpXML).saveAsTextFile(path)
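
One possible way to fill in `dumpXML` is sketched below. It assumes every column name is a valid XML element name, that values can be rendered with `toString`, and that nulls become empty elements. `escapeXml` is a hypothetical helper introduced here, and the `<item>` wrapper follows the layout asked for in the question.

```scala
import org.apache.spark.sql.Row

// Escape characters that must not appear verbatim in XML text content.
def escapeXml(s: String): String =
  s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

// Sketch: one <item> element per row, one child element per column.
// Relies on the Row carrying its schema, which holds for rows from df.rdd.
def dumpXML(row: Row): String = {
  val fields = row.schema.fieldNames.zip(row.toSeq).map { case (name, value) =>
    val text = if (value == null) "" else escapeXml(value.toString)
    s"    <${name}>${text}</${name}>"
  }
  fields.mkString("<item>\n", "\n", "\n</item>")
}
```

Nested columns (structs, arrays) would need their own handling; this sketch only covers flat rows.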

You can add a root element using, for example, mapPartitions.
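
A rough sketch of the mapPartitions idea follows. `saveWithRoot` is a hypothetical helper name, and the serializer is passed in as a parameter so the snippet stands on its own. Note the assumption built into this approach: every partition is written as its own file, so each part file gets its own root element.

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Sketch: emit an opening root tag, then the serialized rows,
// then the closing root tag, once per partition.
def saveWithRoot(df: DataFrame, path: String, rootTag: String)
                (dumpXML: Row => String): Unit =
  df.rdd
    .mapPartitions { rows =>
      Iterator(s"<${rootTag}>") ++ rows.map(dumpXML) ++ Iterator(s"</${rootTag}>")
    }
    .saveAsTextFile(path)
```

If a single well-formed document is required, calling `df.coalesce(1)` before writing yields one part file with one root element, at the cost of losing write parallelism.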

edited Jun 25, 2019 at 15:54

10465355


answered Feb 25, 2016 at 8:06

zero323


5 Comments

Elvish_Blade Over a year ago

I'm getting a NullPointerException. One method I tried: spark-shell --packages com.databricks:spark-xml_2.10:0.3.2

zero323 Over a year ago

It is hard to say what may be going wrong without the context. An NPE in Spark SQL usually means some kind of nested access.

Sudarshan kumar Over a year ago

What if I have to add some constant to the XML?

ghukill Over a year ago

@zero323, would you be willing to elaborate on how mapPartitions could be used to add a root element for a bunch of XML records written via saveAsTextFile?

CodeRunner Over a year ago

What tag would be used to write nested XML like this: <ENVELOPE> <header> <nursuryObject> <nursuryEvent>UPDATE</nursuryEvent> <nursuryObjectId>SDSDS</nursuryObjectId> <nursuryObjectOwner>ABCDE</nursuryObjectOwner> <nursuryObjectType>PQRST</nursuryObjectType> </nursuryObject> <messageObject> <messageId>123</messageId> <timeStamp>2011-02-28T09:27:02TIMESTAMP</timeStamp> </messageObject>
