I have written PySpark code with a condition: if a null value is present in a given column, append "No coupon" to another column, otherwise "coupon".
data = data.withColumn('Coupon_code', when((data.coupon == ""), "No coupon").otherwise("coupon"))
Here the condition is on the column 'coupon': if there is a null value in the 'coupon' column, it appends "No coupon" to the Coupon_code column, otherwise "coupon".
The problem with the above logic is that it does not detect null values and populates every entry with "coupon".
I tried using isnull() and subset, but to no avail:
data = data.withColumn('Coupon_code',when(data.fillna("No coupon", subset["coupon"])).otherwise("coupon"))
For isNull() I am getting an error that the column is not callable. Please suggest where I am going wrong.
Sample data:
I/P:
Coupon
null
null
null
null
H45X
E4R
null
Expected o/p:
Coupon   Coupon_code
null     No Coupon
null     No Coupon
null     No Coupon
null     No Coupon
H45X     coupon
E4R      coupon
null     No Coupon
Even counting the nulls works, but I am not able to replicate the same check when replacing values:
data.where(col("coupon").isNull()).count()
output: 9064