Null values not getting replaced using a withColumn condition in PySpark

Question

I have written PySpark code that checks whether a given column contains null values and, if so, writes 'Yes' to another column, otherwise 'No'.

data = data.withColumn('Coupon_code',when((data.coupon ==""),"No coupon").otherwise("coupon"))

Here the condition is on the 'coupon' column: if the value is null, Coupon_code should be "No coupon"; otherwise it should be "coupon".

The problem with the above logic is that it does not detect the null values and populates every entry with "coupon".

I tried using isnull() and fillna() with subset, but to no avail:

data = data.withColumn('Coupon_code',when(data.fillna("No coupon", subset["coupon"])).otherwise("coupon"))

For isnull() I get an error that the column is not callable. Please suggest where I am going wrong.

Sample data:

I/P:

Coupon
null
null
null
null
H45X
E4R
null

Expected o/p:

Coupon  Coupon_code
null     No Coupon
null     No Coupon
null     No Coupon
null     No Coupon
H45X     coupon
E4R      coupon
null     No Coupon

Even a count works, but I am not able to replicate the same condition when replacing values.

data.where(col("coupon").isNull()).count()

output: 9064

Are you sure that the missing value is an empty string ("") and not, for example, whitespace (" ")?
– Liam Galbraith, Commented Dec 8, 2020 at 10:52
@LiamGalbraith Yes, those are empty strings. All are null/None datatypes.
– think-maths, Commented Dec 8, 2020 at 10:59

mck · Accepted Answer · 2020-12-08 12:03:13Z


Check whether the code below works for you. It should catch both cases: where the coupon column is null and where it is an empty string.

import pyspark.sql.functions as F

data = data.withColumn('Coupon_code',
    F.when(
        F.col('coupon').isNull() | (F.col('coupon') == F.lit('')),
        "No coupon"
    ).otherwise("coupon")
)


2 Comments

think-maths Over a year ago

Yes, now I can see the "No coupon" part populating as well. Any idea why isnull() does not work? df.col.isnull() gives the error that the column is not callable.

mck Over a year ago

It should be isNull, not isnull.
