Sample input:

scala> Frame.show()

|year| make|model|             comment|blank|
|2012|Tesla|    S|          No comment|    R|
|1997| Ford| E350|Go get one now th...|    L|
|2015|Chevy| Volt|                 Try|    M|

should be converted to:

<item>
    <year>2012</year>
    <make>Tesla</make>
    <model>S</model>
</item>


The simplest approach is to use the XML writer from spark-xml:

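// Requires the spark-xml package (com.databricks:spark-xml) on the classpath
val path: String = ???  // output directory
df.write.format("com.databricks.spark.xml")
  .option("rootTag", "items") // element that wraps all rows
  .option("rowTag", "item")   // element written for each row
  .save(path)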

If for some reason it doesn't fit your needs, you can serialize records individually and write them with saveAsTextFile:

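import org.apache.spark.sql.Row

// Serializes a single Row to an XML string (implementation left to you)
def dumpXML(row: Row): String = ???
// Writes the resulting strings as text, one part file per partition
df.rdd.map(dumpXML).saveAsTextFile(path)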
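One possible body for dumpXML is sketched below. To keep it runnable without Spark, this hypothetical version takes (column name, value) pairs instead of a Row; the names `escapeXml`, `dumpXML(fields, rowTag)` and the `"item"` default are illustrative assumptions, not part of any library. It assumes column names are already valid XML element names, escapes only the text content, and renders nulls as empty elements.

```scala
// Hypothetical helper: escapes the five XML special characters in text content.
def escapeXml(s: String): String = {
  val sb = new StringBuilder
  s.foreach {
    case '&'  => sb.append("&amp;")
    case '<'  => sb.append("&lt;")
    case '>'  => sb.append("&gt;")
    case '"'  => sb.append("&quot;")
    case '\'' => sb.append("&apos;")
    case c    => sb.append(c)
  }
  sb.toString
}

// Hypothetical stand-in for dumpXML: one (columnName, value) pair per column.
// Column names are assumed to be valid XML names and are not escaped.
def dumpXML(fields: Seq[(String, Any)], rowTag: String = "item"): String = {
  val body = fields.map { case (name, value) =>
    val text = if (value == null) "" else escapeXml(value.toString)
    s"  <$name>$text</$name>"
  }
  (s"<$rowTag>" +: body :+ s"</$rowTag>").mkString("\n")
}
```

With Spark, and assuming the rows carry their schema (as rows taken from `df.rdd` normally do), it could be applied as `df.rdd.map(row => dumpXML(row.schema.fieldNames.toSeq.zip(row.toSeq))).saveAsTextFile(path)`.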

You can add a root element using, for example, mapPartitions.
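A minimal sketch of the mapPartitions idea, assuming each record is already an XML string: the hypothetical `wrapPartition` function emits an opening root tag, the partition's records, and a closing root tag. It stays lazy, so a partition is streamed rather than collected in memory.

```scala
// Hypothetical helper for rdd.mapPartitions: wraps one partition's XML records
// in a root element. Since saveAsTextFile writes one part file per partition,
// each part file ends up with its own root element.
def wrapPartition(records: Iterator[String], rootTag: String = "items"): Iterator[String] =
  Iterator(s"<$rootTag>") ++ records ++ Iterator(s"</$rootTag>")
```

Usage with Spark would look like `df.rdd.map(dumpXML).mapPartitions(it => wrapPartition(it)).saveAsTextFile(path)`. If a single XML document is required, calling `coalesce(1)` before `mapPartitions` yields one partition and therefore one root, at the cost of writing everything through a single task.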

Comments:

I'm getting a NullPointerException. Method tried: spark-shell --packages com.databricks:spark-xml_2.10:0.3.2
It is hard to say what may be going wrong without more context. An NPE in Spark SQL usually means some kind of nested access.
What if I have to add some constant to the XML?
@zero323, would you be willing to elaborate on how mapPartitions could be used to add a root element for a bunch of XML records written via saveAsTextFile?
What would be the tag used to write nested XML like this: <ENVELOPE> <header> <nursuryObject> <nursuryEvent>UPDATE</nursuryEvent> <nursuryObjectId>SDSDS</nursuryObjectId> <nursuryObjectOwner>ABCDE</nursuryObjectOwner> <nursuryObjectType>PQRST</nursuryObjectType> </nursuryObject> <messageObject> <messageId>123</messageId> <timeStamp>2011-02-28T09:27:02TIMESTAMP</timeStamp> </messageObject>