
I am currently working on storing a Spark DataFrame as a .csv file in Azure Blob Storage. I am using the following code.

 smtRef2_DF.dropDuplicates().coalesce(1).write
  .mode("overwrite")
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save(csvBlobStorageMount + "/Output/Smt/SmtRef.csv")

This works, but it creates a SmtRef.csv folder in which the actual .csv file is stored as part-00000-tid.csv. How do I specify the name of the actual .csv file?

Thanks in Advance

  • I don't think this question should be closed — saving as a single file is not the same as renaming a file. Here is an option for renaming with pyarrow and pathlib (note it assumes `import pyarrow`, `import pathlib`, and `from pathlib import Path`):

     def rename_file_hdfs(hdfs_path):
         phc = pyarrow.hdfs.connect()
         fl = phc.ls(hdfs_path)
         fl = [f for f in fl if pathlib.Path(f).stem.startswith("part")]
         for i, f in enumerate(fl):
             pa = Path(fl[0]).parent
             nf = f"newf{i}.csv"
             tp = Path(pa, nf)
             tp = str(tp).replace("hdfs:/", "hdfs://")
             phc.mv(f"{f}", f"{tp}")

    Commented Apr 5, 2020 at 7:49

2 Answers


If the data is small enough to fit into memory, one workaround is to convert to a pandas DataFrame and save it as a CSV from there.

df_pd = df.toPandas()
df_pd.to_csv("path", index=False)  # index=False avoids writing the pandas row index
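A minimal local sketch of this approach (the DataFrame is constructed by hand here as a stand-in for `smtRef2_DF.toPandas()`, and the output directory is a temp folder rather than the real blob-storage mount):

    import tempfile
    from pathlib import Path

    import pandas as pd

    # Hypothetical stand-in for the Spark DataFrame after .toPandas().
    df_pd = pd.DataFrame({"id": [1, 2, 2], "name": ["a", "b", "b"]})

    out_dir = Path(tempfile.mkdtemp())
    out_file = out_dir / "SmtRef.csv"

    # drop_duplicates mirrors the dropDuplicates() call in the question;
    # to_csv writes exactly one file with exactly the name we chose.
    df_pd.drop_duplicates().to_csv(out_file, index=False)

Because pandas writes a single file directly, there is no part-00000 folder to clean up afterwards.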



It's not possible with the Spark API.

If you want to achieve this, use .repartition(1), which will generate a single part file, and then use the Hadoop FileSystem API to rename that file in HDFS:

 import org.apache.hadoop.fs._

 FileSystem.get(spark.sparkContext.hadoopConfiguration)
   .rename(new Path("oldpathtillpartfile"), new Path("newpath"))
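The same find-and-rename step can be sketched locally with Python's pathlib standing in for the Hadoop FileSystem API (all directory and file names here are made up for illustration; on Databricks you would operate on the blob-storage mount instead):

    import tempfile
    from pathlib import Path

    # Simulate the folder Spark creates: SmtRef.csv/part-00000-....csv
    out_dir = Path(tempfile.mkdtemp()) / "SmtRef.csv"
    out_dir.mkdir()
    (out_dir / "part-00000-tid.csv").write_text("id,name\n1,a\n")

    # Locate the single part file and rename it to the name we want.
    part_file = next(p for p in out_dir.iterdir() if p.name.startswith("part-"))
    renamed = part_file.rename(out_dir / "SmtRef_final.csv")

This works because .repartition(1) (or the question's .coalesce(1)) guarantees exactly one part file, so the first match is the whole output.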

