
I am trying to replace the null values in the `series_name` column with "n/a". I have tried the following code, but neither version works:

df.withColumn("series_name", when($"series_name" === null, "n/a")
  .otherwise($"series_name"))

and

df.withColumn("series_name", when(col("series_name") === null, "n/a"))

What am I missing?

    +--------------------+
    |         series_name|
    +--------------------+
    |Families of the M...|
    |                null|
    |      Ridiculousness|
    |                null|
    |                null|
    +--------------------+

2 Answers


You could also use the .fillna() method:

df.fillna('N/A', subset=['series_name'])



I prefer to use coalesce.

from pyspark.sql import functions as f

df.withColumn('series_name', f.expr("coalesce(series_name, 'n/a')"))

4 Comments

What does coalesce do? Do you have a doc reference?
It is the same as MySQL's IFNULL function.
Yeah, not a fan of SQL :) The strange thing is that it is the same word as the operation that reduces the partition count of a dataset. That's a little confusing.
It is a Spark SQL function, and you can use it like col or lit. I know it shares its name with the repartitioning method :)
