Write a CSV file in quoteMode NON_NUMERIC, so that only strings and other non-numeric cells are surrounded by quotes

Question

I have a CSV file to write with the following schema:

StructType s = schema.add("codeCommuneCR", StringType, false);
s = s.add("nomCommuneCR", StringType, false);
s = s.add("populationCR", IntegerType, false);
s = s.add("resultatComptable", IntegerType, false);

If I don't provide a "quoteMode" option, or even if I set it to NON_NUMERIC like this:

ds.coalesce(1).write().mode(SaveMode.Overwrite)
.option("header", "true")
.option("quoteMode", "NON_NUMERIC")
.option("quote", "\"")
.csv("./target/out_200071470.csv");

the CSV written by Spark is:

codeCommuneCR,nomCommuneCR,populationCR,resultatComptable
03142,LENAX,267,43

If I set the "quoteAll" option instead, like this:

ds.coalesce(1).write().mode(SaveMode.Overwrite)
.option("header", "true")
.option("quoteAll", true)
.option("quote", "\"")
.csv("./target/out_200071470.csv");

it generates:

codeCommuneCR,nomCommuneCR,populationCR,resultatComptable
"03142","LENAX","267","43"

But I would like .option("quoteMode", "NON_NUMERIC") to generate:

codeCommuneCR,nomCommuneCR,populationCR,resultatComptable
"03142","LENAX",267,43

according to my schema.

How should I configure the writer to get this output?

Regards,

Could be a bug. Consider filing a report at issues.apache.org.
– Sina Madani, commented Feb 17, 2019 at 7:14

Marc Le Bihan · Accepted Answer · 2019-02-26 04:37:54Z

I've opened an issue about it, and learnt that Spark now handles CSV through the Univocity parser library, which no longer supports this feature.

Re-adding it is not planned, so the "quoteMode" option is no longer taken into account.
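
Since the writer ignores "quoteMode", one way to get the layout asked for in the question is to format the rows yourself. The following is only a minimal sketch in plain Java of what NON_NUMERIC quoting means, not Spark's or Univocity's code: the class NonNumericCsv and its methods formatCell and formatRow are hypothetical names made up for this illustration, and it assumes numeric cells arrive as java.lang.Number instances and that embedded quotes are escaped by doubling them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spark: quotes every non-numeric cell.
class NonNumericCsv {

    // Numbers are written bare; anything else is wrapped in double quotes,
    // with embedded double quotes doubled. Nulls become an empty field
    // (an assumption made for this sketch).
    static String formatCell(Object value) {
        if (value == null) {
            return "";
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return "\"" + value.toString().replace("\"", "\"\"") + "\"";
    }

    // Joins the formatted cells of one row with commas.
    static String formatRow(List<?> row) {
        return row.stream()
                .map(NonNumericCsv::formatCell)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // The header is written unquoted, as in the output the question asks for.
        System.out.println("codeCommuneCR,nomCommuneCR,populationCR,resultatComptable");
        // Two string cells and two integer cells, matching the question's schema.
        System.out.println(formatRow(Arrays.asList("03142", "LENAX", 267, 43)));
    }
}
```

With the sample row from the question, the string cells "03142" and "LENAX" come out quoted while 267 and 43 stay bare. Applying such a formatter to a Dataset (for instance by mapping each row to a line and writing plain text) is left out here, because it depends on how the data is produced.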

2 Comments

tariq abughofa Over a year ago

How come it's unplanned? I can't find any other way to differentiate between empty strings and nulls in CSV with Spark.

Gabor Szarnyas Over a year ago

The emptyValue and the nullValue options of the DataFrameWriter should help differentiate between empty strings and nulls.
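
To make the idea in this comment concrete, here is a small conceptual sketch in plain Java of how two separate tokens keep nulls and empty strings apart in the written file. It is not Spark code: the class NullVersusEmpty and its methods renderCell and renderRow are hypothetical, and the tokens NULL and "" are example choices standing in for whatever would be passed to .option("nullValue", ...) and .option("emptyValue", ...) on the writer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration, not Spark's implementation.
class NullVersusEmpty {

    // A null cell is replaced by nullValue, an empty string by emptyValue,
    // and any other value is passed through unchanged.
    static String renderCell(String value, String nullValue, String emptyValue) {
        if (value == null) {
            return nullValue;
        }
        if (value.isEmpty()) {
            return emptyValue;
        }
        return value;
    }

    // Renders one row with the chosen tokens and joins the cells with commas.
    static String renderRow(List<String> row, String nullValue, String emptyValue) {
        return row.stream()
                .map(v -> renderCell(v, nullValue, emptyValue))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Example tokens: NULL for missing values, a quoted empty field for "".
        System.out.println(renderRow(Arrays.asList("03142", null, ""), "NULL", "\"\""));
    }
}
```

As long as the two tokens differ, a reader of the file can tell a missing value from an empty string without relying on "quoteMode".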
