I am trying to replace every null in column count_1 with the value from column count_2, when that column has one. Below is a sample input with the expected output. How can I do this in Spark Scala?

Input Dataframe

name   count_1 count_2
Java   10000   null
Python null    20000
Scala  30000   null
R      null    null
Swift  50000   65000

Output Dataframe

name   merged
Java   10000
Python 20000
Scala  30000
R      null
Swift  50000

1 Answer

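You can use coalesce on those columns. It returns the first non-null value among its arguments, checked left to right, or null if all of them are null.

This is what I would do (it also works if you have more such columns, like count_4):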

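import org.apache.spark.sql.functions.{coalesce, col}
import spark.implicits._ // enables the $"name" syntax; assumes a SparkSession named spark

// find the columns to coalesce: every column whose name starts with "count"
val cols = df.columns.filter(_.startsWith("count")).map(col(_))

// coalesce them, in DataFrame column order, into a single "merged" column
df.select($"name", coalesce(cols: _*).as("merged"))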
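
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the question's input and applies the same coalesce. The object name MergeCounts, the helper mergeCounts, its prefix parameter, and the local[*] master are my own assumptions for illustration, not something from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col}

object MergeCounts {
  // Hypothetical helper: coalesce every column whose name starts with `prefix`
  // into one "merged" column. Priority follows the DataFrame's column order,
  // so count_1 wins over count_2 when both are non-null.
  def mergeCounts(df: DataFrame, prefix: String = "count"): DataFrame = {
    val cols = df.columns.filter(_.startsWith(prefix)).map(col(_))
    df.select(col("name"), coalesce(cols: _*).as("merged"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying the example out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("merge-counts")
      .getOrCreate()
    import spark.implicits._

    // Option[Int] gives nullable integer columns; None becomes null.
    val df = Seq[(String, Option[Int], Option[Int])](
      ("Java", Some(10000), None),
      ("Python", None, Some(20000)),
      ("Scala", Some(30000), None),
      ("R", None, None),
      ("Swift", Some(50000), Some(65000))
    ).toDF("name", "count_1", "count_2")

    mergeCounts(df).show()
    spark.stop()
  }
}
```

The merged column should match the Output Dataframe in the question: R stays null because both counts are null, and Swift keeps 50000 because count_1 comes first. Note that coalesce needs at least one column, so this sketch assumes at least one column name matches the prefix.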