I am trying to fill the null values in ColY with the corresponding values from ColX and store the result as a new column, Col_new, in my DataFrame. I am using PySpark in Databricks and am fairly new to it.

Sample Data is as follows:

ColX              ColY  
apple             orange
pear              null
grapefruit        pear
apple             null

The desired output would look like the following:

ColX              ColY              Col_new
apple             orange            orange  
pear              null              pear
grapefruit        pear              pear
apple             null              apple

I have tried several lines of code to no avail. My latest attempt was as follows:

.withColumn("Col_new", col('ColX').select(coalesce('ColY')))

Any help would be greatly appreciated. Many Thanks.

2 Answers

Both columns ColY and ColX should be provided as coalesce's arguments:

df = spark.createDataFrame([
  ("apple", "orange"),
  ("pear", None),
  ("grapefruit", "pear"),
  ("apple", None)
]).toDF("ColX", "ColY")

from pyspark.sql.functions import coalesce

df.withColumn("ColNew", coalesce("ColY", "ColX")).show()
+----------+------+------+
|      ColX|  ColY|ColNew|
+----------+------+------+
|     apple|orange|orange|
|      pear|  null|  pear|
|grapefruit|  pear|  pear|
|     apple|  null| apple|
+----------+------+------+

coalesce will return the first non-null value from a list of columns. You're only passing in one column, so coalesce has no effect.

The correct syntax in this case would be:

from pyspark.sql.functions import coalesce
df = df.withColumn("Col_new", coalesce('ColY', 'ColX'))

This means take the value of ColY unless it is null, in which case take the value from ColX.

In this case, you can also use when for the equivalent logic:

from pyspark.sql.functions import col, when

df = df.withColumn(
    "Col_new", 
    when(col("ColY").isNull(), col("ColX")).otherwise(col("ColY"))
)
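
If it helps to see why the two forms agree, here is a minimal plain-Python sketch of the row-level logic, with no Spark involved. The helper names first_non_null and when_null_otherwise are hypothetical, made up for illustration only; they are not part of PySpark, and None stands in for a Spark null.

```python
from typing import Optional


def first_non_null(*values: Optional[str]) -> Optional[str]:
    """Hypothetical helper: mimic coalesce for one row by returning
    the first argument that is not None (None if all are None)."""
    for value in values:
        if value is not None:
            return value
    return None


def when_null_otherwise(col_y: Optional[str], col_x: Optional[str]) -> Optional[str]:
    """Hypothetical helper: mimic when(ColY is null, ColX).otherwise(ColY)
    for one row."""
    return col_x if col_y is None else col_y


# Sample rows from the question as (ColX, ColY) pairs.
rows = [
    ("apple", "orange"),
    ("pear", None),
    ("grapefruit", "pear"),
    ("apple", None),
]

# ColY is passed first, so it wins whenever it is not None.
col_new = [first_non_null(col_y, col_x) for col_x, col_y in rows]
col_new_when = [when_null_otherwise(col_y, col_x) for col_x, col_y in rows]

print(col_new)       # ['orange', 'pear', 'pear', 'apple']
print(col_new_when)  # ['orange', 'pear', 'pear', 'apple']
```

Argument order is the part that matters: swapping the arguments would keep ColX wherever it is non-null, which is not what the question asks for.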
