3

I have a data frame with n number of columns and I want to replace empty strings in all these columns with nulls.

I tried using

val ReadDf = rawDF.na.replace("columnA", Map( "" -> null));

and

val ReadDf = rawDF.withColumn("columnA", if($"columnA"=="") lit(null) else $"columnA" );

Both of them didn't work.

Any leads would be highly appreciated. Thanks.

7
  • 1
    There is many errors here and it's not just related to replacing an empty string with null but also the jdbc error and as a matter of fact, that second error can't be resolved with the current description and information because it cannot be reproduced. Please review your question as it is subject of being closed ! Commented May 22, 2017 at 7:40
  • One thing I don't understand is, How is [this][1] relevant to the question I asked. It talks about an array of strings in MYSQL and my question was about replacing empty strings with nulls in the data frame. Can someone enlighten me? Commented May 23, 2017 at 11:57
  • Here it is. I haven't voted for this question to be closed as a duplicated but as I said in the comment earlier, you have 2 differents errors and the latest error message is related to the question mentioned by the voters. Commented May 23, 2017 at 12:01
  • 1
    @eliasah I was not referring to you. I am very sorry that I didn't address the persons. Let me see if I can edit the comment and add them. Commented May 23, 2017 at 12:28
  • 1
    I'm not sure why this was closed as a duplicate of an SQL question, but this has been reopened. Commented Sep 12, 2018 at 23:50

1 Answer 1

6

Your first approach seams to fail due to a bug that prevents replace from being able to replace values with nulls, see here.

Your second approach fails because you're confusing driver-side Scala code for executor-side Dataframe instructions: your if-else expression would be evaluated once on the driver (and not per record); You'd want to replace it with a call to when function; Moreover, to compare a column's value you need to use the === operator, and not Scala's == which just compares the driver-side Column object:

import org.apache.spark.sql.functions._

rawDF.withColumn("columnA", when($"columnA" === "", lit(null)).otherwise($"columnA"))
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.