
I am trying to replace the null values in the `series_name` column with "n/a". I have tried the following code, but neither version works:

df.withColumn("series_name", when($"series_name" === null, "n/a")
  .otherwise($"series_name"))

and

df.withColumn("series_name", when(col("series_name") === null, "n/a"))

What am I missing?

    +--------------------+
    |         series_name|
    +--------------------+
    |Families of the M...|
    |                null|
    |      Ridiculousness|
    |                null|
    |                null|
    +--------------------+

2 Answers


You could also use the .fillna() method:

df.fillna('N/A', subset=['series_name'])



I prefer to use coalesce.

from pyspark.sql import functions as f

df.withColumn('series_name', f.expr("coalesce(series_name, 'n/a')"))

4 Comments

What does coalesce do? Do you have a doc reference?
It is the same as MySQL's IFNULL function.
Yeah, not a fan of SQL :) The strange thing is that it is the same word as the operation that reduces the partition count of a dataset. That's a little confusing.
It is a Spark SQL function, and you can use it like col or lit. I know it shares its name with the repartitioning method :)
