6

There are two dataframes. For simplicity, I put them as follow:

DataFrame1

id | name
-----------
0  | Mike
1  | James

DataFrame2

id | name | salary
-------------------
0  | M    | 10
1  | J    | 20
2  | K    | 30

I want to join the two DataFrame on id and only keep the column name in DataFrame1 while keeping the original one if there is no corresponding id in DataFrame2.

It should be:

id | name  | salary
--------------------
0  | Mike  |  10
1  | James |  20
2  | K     |  30

Till now, I only know how to join two dataframes by:

df1.join(df2, df1("id")===df2("id"), "left").select(df2("id"), df1("name"), df2("salary"))

But it will use null to ignore the name value "K".

Thanks!

0

2 Answers 2

13

You can use coalesce, which returns the first column that isn't null from the given columns. Plus - using left join you should join df1 to df2 and not the other way around:

import org.apache.spark.sql.functions._

df2.join(df1, df1("id")===df2("id"), "left")
  .select(df2("id"), coalesce(df1("name"), df2("name")), df2("salary"))
Sign up to request clarification or add additional context in comments.

Comments

1

to replace null values you can use DataFrameNaFunctions like below...

    df1.join(df2, df1("id")===df2("id"), "left_outer")
.select(df2("id"), df1("name"), df2("salary"))
.na.fill(ImmutableMap.of("name", "unknown")).show()

where 'unknown' is sample value. you can replace with value you wanted...

if you don't want rows with null value columns

val joined = df1.join(df2, df1("id")===df2("id"), "left_outer")
    .select(df2("id"), df1("name"), df2("salary"))

val final = joined.where(joined.col("name").isNotNull)
final.show()

Also Note that as mentioned in @Tzach Zohar answer def coalesce(e: Column*) function

Returns the first column that is not null, or null if all inputs are null.

If you are looking for that kind of ... then you can go ahead.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.