11

How can I export Spark's DataFrame to csv file using Scala?


4 Answers

16

The easiest way to do this is with the spark-csv library. You can check the documentation at the provided link; here is a Scala example of how to load data into and save data from a DataFrame.

Code (Spark 1.4+):

dataFrame.write.format("com.databricks.spark.csv").save("myFile.csv")

Edit:

Spark creates part files when saving CSV data. If you want to merge the part files into a single CSV, refer to the following:

Merge Spark's CSV output folder to Single File
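If the output lands on a local filesystem, the merge step can be sketched in plain shell. The directory and file names below are made up for illustration, and each part file is assumed to have been written with the header option enabled:

```shell
# Simulate Spark's output layout: a directory of part files, each with a header row.
mkdir -p myFile.csv
printf 'id,name\n1,a\n' > myFile.csv/part-00000.csv
printf 'id,name\n2,b\n' > myFile.csv/part-00001.csv

# Keep the header from the first part only, then append data rows from every part.
head -n 1 myFile.csv/part-00000.csv > merged.csv
for f in myFile.csv/part-*.csv; do
  tail -n +2 "$f" >> merged.csv
done
```

On HDFS the same idea applies, but you would use `hdfs dfs -getmerge` or copy the parts locally first.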



15

In Spark versions 2+ you can simply use the following:

df.write.csv("/your/location/data.csv")

If you want to make sure the output is written as a single file rather than multiple partitions, add a .coalesce(1) as follows:

df.coalesce(1).write.csv("/your/location/data.csv")

3 Comments

Can we rename the part_0000 file?
You can rename it after it's written by using cp <old filepath> <new filepath> (or hdfs dfs -cp <old filepath> <new filepath> if the file is still in HDFS) to copy the file to the same location under the new name.
Please note this doesn't export headers unless you add .option("header", "true") to the writer.
13

The solution above exports the CSV as multiple part files. I found another solution by zero323 on this Stack Overflow page that exports a DataFrame into a single CSV file by using coalesce.

df.coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/your/location/mydata")

This creates a directory named mydata, where you'll find a single CSV file containing the results.
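Since the part file inside that directory gets an auto-generated name, a common follow-up is to copy it out under a stable name. A minimal local-filesystem sketch (the directory layout and file names below are simulated for illustration):

```shell
# Simulate the output directory Spark creates, containing one part file.
mkdir -p mydata
printf 'id,name\n1,a\n' > 'mydata/part-00000-abc123.csv'

# Pick the single part file and copy it to a predictable name.
part=$(ls mydata/part-*.csv | head -n 1)
cp "$part" mydata_result.csv
```

On HDFS or DBFS the same step would use `hdfs dfs -cp` or `dbutils.fs.cp` instead of `cp`.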


0

A method that exports the data and renames the resulting file (using Databricks dbutils):

def export_csv(
  df: DataFrame,
  fileName: String,
  filePath: String
  ): Unit = {

  // Write to a temporary directory as a single part file
  val filePathDestTemp = filePath + ".dir/"

  df
    .coalesce(1)
    .write
    .option("header", "true")
    .mode("overwrite")
    .csv(filePathDestTemp)

  // Copy the single part file to the desired name, then clean up
  val listFiles = dbutils.fs.ls(filePathDestTemp)
  for (subFile <- listFiles) {
    if (subFile.name.endsWith(".csv")) {
      dbutils.fs.cp(filePathDestTemp + subFile.name, filePath + fileName + ".csv")
    }
  }
  dbutils.fs.rm(filePathDestTemp, recurse = true)
}

