Write a csv file into azure blob storage

Question

I am trying use pyspark to analyze my data on databricks notebooks. Blob storage has been mounted on the databricks cluster and after ananlyzing, would like to write csv back into blob storage. As pyspark working in distributed fashion, csv file is broken into small blocks and written on the blob storage. How to overcome this and write as a single csv file on blob when we do analysis using pyspark. Thanks.

Anish K · Accepted Answer · 2019-08-29 12:04:03Z

1

Do you really want a single file? If yes, the only way you can overcome it by merging all the small csv files into a single csv file. You can make use of map function on the databricks cluster to merge it or may be you can use some background job to do the same.

Have a look here: https://forums.databricks.com/questions/14851/how-to-concat-lots-of-1mb-cvs-files-in-pyspark.html

answered Aug 29, 2019 at 12:04

Anish K

8185 silver badges13 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Write a csv file into azure blob storage

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related