I have written PySpark code that sets another column to 'Yes' if a given column contains a null value, and to 'No' otherwise.

data = data.withColumn('Coupon_code',when((data.coupon ==""),"No coupon").otherwise("coupon"))

Here the condition is on the column 'coupon': if the value in 'coupon' is null, "No coupon" should be written to the Coupon_code column, otherwise "coupon".

The problem with the above logic is that it does not detect null values and populates every entry with "coupon".

I tried using isnull() and fillna() with subset, but to no avail:

data = data.withColumn('Coupon_code',when(data.fillna("No coupon", subset["coupon"])).otherwise("coupon"))

For isnull() I am getting the error that Column is not callable. Please suggest where I am going wrong.

Sample data:

I/P:

Coupon
null
null
null
null
H45X
E4R
null

Expected o/p:

Coupon  Coupon_code
null     No Coupon
null     No Coupon
null     No Coupon
null     No Coupon
H45X     coupon
E4R      coupon
null     No Coupon

Even counting the nulls works, but I am not able to replicate the same condition when replacing the values.

data.where(col("coupon").isNull()).count()

output: 9064
  • please post sample data Commented Dec 8, 2020 at 10:49
  • Are you sure that the missing value is an empty string ("") and not, for example, whitespace (" ")? Commented Dec 8, 2020 at 10:52
  • @LiamGalbraith yes those are empty strings. all are null/none datatypes Commented Dec 8, 2020 at 10:59
  • dtype for 'coupon' column is string Commented Dec 8, 2020 at 11:14

1 Answer

Check whether the code below works for you. It should catch both cases: where the column coupon is null and where it is an empty string.

import pyspark.sql.functions as F

data = data.withColumn('Coupon_code',
    F.when(
        F.col('coupon').isNull() | (F.col('coupon') == F.lit('')),
        "No coupon"
    ).otherwise("coupon")
)
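
It may help to spell out why the original `data.coupon == ""` check never matched. In Spark SQL, comparing null to anything with `==` yields null rather than true, and `when` only takes its branch when the condition is true, so every null row falls through to `otherwise`. The snippet below is a plain-Python model of that behaviour, not Spark code: `sql_eq`, `sql_or`, and `when_otherwise` are hypothetical helpers written only for illustration, with `None` standing in for null.

```python
from typing import Optional


def sql_eq(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Model of SQL equality: the result is unknown (None) if either side is null."""
    if a is None or b is None:
        return None
    return a == b


def sql_or(x: Optional[bool], y: Optional[bool]) -> Optional[bool]:
    """Model of SQL OR under three-valued logic."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def when_otherwise(cond: Optional[bool], then: str, otherwise: str) -> str:
    """Model of when(...).otherwise(...): only a true condition selects `then`."""
    return then if cond is True else otherwise


# Illustrative values only: None plays the role of a null coupon.
coupons = [None, "H45X", "", "E4R"]

# Original check: equality with "" alone, so nulls fall through to "coupon".
original = [when_otherwise(sql_eq(c, ""), "No coupon", "coupon") for c in coupons]

# Fixed check: explicit null test OR equality with "".
fixed = [
    when_otherwise(sql_or(c is None, sql_eq(c, "")), "No coupon", "coupon")
    for c in coupons
]

print(original)
print(fixed)
```

In this model the null entry is labelled "coupon" by the original condition and "No coupon" by the fixed one, which mirrors what the question describes.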

2 Comments

Yes, now I can see the "No coupon" part populating as well. Any idea why isnull() is not working? df.col.isnull() gives the error that Column is not callable.
It should be isNull, not isnull.
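
As far as I understand it, the "Column is not callable" message comes from how a PySpark `Column` treats attribute names it does not define: the lookup is interpreted as nested-field access and returns another `Column`, which then fails when it is called. Here is a toy stand-in to illustrate the mechanism; `ToyColumn` is a hypothetical class written for this sketch, not the real PySpark class.

```python
class ToyColumn:
    """Hypothetical stand-in that mimics attribute fallback on a column object."""

    def __init__(self, name: str) -> None:
        self.name = name

    def isNull(self) -> "ToyColumn":
        # The correctly spelled method exists and returns a new expression.
        return ToyColumn(f"({self.name} IS NULL)")

    def __getattr__(self, item: str) -> "ToyColumn":
        # Called only when normal lookup fails, e.g. for the misspelled `isnull`.
        if item.startswith("__"):
            raise AttributeError(item)
        # Unknown names are treated as field access and yield another column.
        return ToyColumn(f"{self.name}.{item}")


coupon = ToyColumn("coupon")

ok = coupon.isNull()  # real method, works

try:
    coupon.isnull()  # attribute fallback returns a ToyColumn, which is then called
    error = None
except TypeError as exc:
    error = str(exc)

print(ok.name)
print(error)
```

The misspelled name does not raise an AttributeError; it silently produces a column object, and the error only appears once the parentheses try to call it.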
