
If I'm using myDF.write.csv("wherever"), how can I set the numeric format for stored data? E.g., if I do:

val t = spark.sql("SELECT cast(1000000000000 as double) as aNum")
t.write.csv("WXYZ")

and then inspect WXYZ, I find 1.0E12. How can I change this for all doubles so that I get 1000000000000.00?

2 Answers


The way I've handled this issue is by casting the number to a string:

val t = spark.sql("SELECT cast(1000000000000 as string) as aNum")
t.write.csv("WXYZ")
t.show()

And the output is

+-------------+
|         aNum|
+-------------+
|1000000000000|
+-------------+

:) I hope this helps!


2 Comments

Ah, I was hoping to do it for a class of items, rather than for each item.
What do you mean by a "class of items"? Like a column?
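
To apply this to a whole class of columns at once rather than casting each one by hand, one option is to fold over the schema and reformat every DoubleType column. A minimal sketch, assuming Spark's built-in format_string function (formatDoubles is just an illustrative helper name):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.format_string
import org.apache.spark.sql.types.DoubleType

// Replace every DoubleType column with a fixed-point, two-decimal string.
def formatDoubles(df: DataFrame): DataFrame =
  df.schema.fields.foldLeft(df) { (acc, field) =>
    if (field.dataType == DoubleType)
      acc.withColumn(field.name, format_string("%.2f", acc(field.name)))
    else acc
  }

formatDoubles(t).write.csv("WXYZ")

Written this way, the example column comes out as 1000000000000.00 instead of 1.0E12, and non-double columns pass through untouched.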

If the data comes from Hive, there is a Hive UDF, printf, that you can use:

select printf('%.2f', col) from foobar
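
printf is also registered as a built-in Spark SQL function (an alias of format_string), so a Hive source shouldn't be strictly required. A minimal sketch through the DataFrame API, under that assumption:

// Equivalent formatting without a Hive table.
val formatted = t.selectExpr("printf('%.2f', aNum) as aNum")
formatted.write.csv("WXYZ")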

Plan B:

dataset.map(col => f"$col%.2f") // note the f interpolator (printf-style), not s

Be careful with plan B: it can add cost depending on your data source, since map deserializes each row to a JVM object.
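
To make plan B concrete, here is a minimal runnable sketch for the single-column example above, assuming spark.implicits is in scope; the f string interpolator is what applies the %.2f format:

import spark.implicits._

// Deserializes each row to a Double and formats it on the JVM side,
// which is why it can cost more than a column expression.
val asStrings = t.as[Double].map(d => f"$d%.2f")
asStrings.write.csv("WXYZ")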

By the way, sometimes this is just a display problem in Excel; check the CSV in a text editor first.
