
I am using Spark SQL to read and write Parquet files.

In some cases, though, I need to write the DataFrame as a plain text file instead of JSON or Parquet.

Is there any built-in method for this, or do I have to convert the DataFrame to an RDD and use saveAsTextFile()?

2 Answers


Using Databricks spark-csv you can save directly to a CSV file, and load from a CSV file afterwards, like this:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("cars.csv");

df.select("year", "model").write()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("newcars.csv");

2 Comments

Should it be df.select("year", "model").write.format instead of df.select("year", "model").write().format? Otherwise (in PySpark) you get TypeError: 'DataFrameWriter' object is not callable.
This is the official example provided for Spark 1.3. On Spark 1.4+ you should use df.select("year", "model").write.format as you suggested.
On Spark 2.0+ the CSV writer is built in; repartition(1) collapses the output to a single part file:

df.repartition(1).write.option("header", "true").csv("filename.csv")

Comments
