
How can I apply UTF-8 encoding properly when writing a DataFrame to a CSV file in Spark 2 with Scala? I am using this:

df.repartition(1).write.mode(SaveMode.Overwrite)
  .format("csv")
  .option("header", true)
  .option("delimiter", "|")
  .save(Path)

It is not working: for example, é is replaced with weird strings.

Thank you.

  • UTF-8 is the default encoding used by Spark. Commented Oct 21, 2019 at 8:41
  • @Shaido Why am I getting weird characters in the output then? I checked my DataFrame in spark-shell and it looks fine. Commented Oct 21, 2019 at 8:42
  • Can you post images of your shell output and the file contents for better understanding? Commented Oct 21, 2019 at 9:54
  • Try setting the encoding option explicitly to UTF-8, even though that is the default when the option is unset. Perhaps Spark is running with a different locale. Commented Oct 21, 2019 at 9:59
  • I mean .option("encoding", "UTF-8"). Commented Oct 21, 2019 at 10:17
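The "weird strings" described in the question are classic mojibake: the file is written as UTF-8, but the program reading it (a text editor, Excel, etc.) decodes the bytes with a single-byte charset such as Latin-1. A minimal plain-Scala sketch (no Spark required) reproduces the symptom; the charset names come from the standard `java.nio.charset` library:

```scala
import java.nio.charset.StandardCharsets

object MojibakeDemo extends App {
  // "é" encoded as UTF-8 occupies two bytes: 0xC3 0xA9.
  val utf8Bytes = "é".getBytes(StandardCharsets.UTF_8)

  // Decoding those same bytes as ISO-8859-1 (Latin-1) maps each byte to
  // its own character, producing the two-character string "Ã©" -- the same
  // corruption seen when a UTF-8 CSV is opened with the wrong charset.
  val mojibake = new String(utf8Bytes, StandardCharsets.ISO_8859_1)

  println(s"bytes: ${utf8Bytes.length}, decoded wrongly: $mojibake") // "Ã©"
}
```

So the corruption can happen either at write time (Spark writing with a non-UTF-8 charset) or at read time (the viewer decoding UTF-8 bytes wrongly); checking both sides narrows down the fix.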

1 Answer


As @Hristo Iliev suggested, I needed to set the UTF-8 encoding explicitly:

df.repartition(1).write.mode(SaveMode.Overwrite)
  .format("csv")
  .option("header", true)
  .option("encoding", "UTF-8")
  .option("delimiter", "|")
  .save(Path)

