I have two tables: table1 has the columns id and name, and table2 has only id.

table1

--------------
id  name
--------------
1   sudheer
2   sandeep
3   suresh
--------------

table2

--------
id
--------
1
2
--------

The required table should have a new column, IND: if an "id" from table1 doesn't exist in table2, the value should be "N", otherwise "Y".

table3

id  name        IND
1   sudheer     Y
2   sandeep     Y
3   suresh      N

I have tried the following approach:

val df = hc.sql("select * from table1")
val df1 = hc.sql("select * from table2")

I tried adding one more column (phone) to table2, because the joined DataFrame doesn't contain a separate id column from table2. Based on whether that column is null, I tried to set the value to Y/N:

val df2 = df.join(df1,Seq("id"),"left_outer").withColumn("IND",exp(when(df1("phone")!= "null","Y").otherwise("N")))

But this didn't work. It failed with the error: found : Boolean required: org.apache.spark.sql.Column

Can anyone suggest how to get the required result without adding a column to table2?

  • Try withColumn("IND", when(col("phone").isNotNull, "Y").otherwise("N")). Commented Apr 12, 2018 at 9:07
  • I added a phone column to table2 and tried this, and your logic worked. Commented Apr 12, 2018 at 9:11
  • But is there any way to achieve this without adding an extra column? Commented Apr 12, 2018 at 9:11
  • The solution provided by Shankar Koirala will probably look a bit cleaner; I would suggest using that instead :) Commented Apr 12, 2018 at 9:13
  • Thanks Shaido for the quick solution. Commented Apr 12, 2018 at 9:26

1 Answer

You can add a new column to table2 with the default value "Y", join, and then replace the null values with "N":

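// Assumes a SparkSession named `spark`, as in spark-shell
import spark.implicits._
import org.apache.spark.sql.functions.{lit, when}

val df1 = Seq(
    (1, "sudheer"),
    (2, "sandeep"),
    (3, "suresh")
).toDF("id", "name")

// Tag every id present in table2 with "Y"
val df2 = Seq(1, 2).toDF("id")
         .withColumn("IND", lit("Y"))

// Ids missing from df2 get a null IND; na.fill("N") replaces nulls in all string columns
val df3 = df1.join(df2, Seq("id"), "left_outer")
             .na.fill("N")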
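One caveat worth noting: called with a single string argument, na.fill replaces nulls in every string column, so a null name would also turn into "N". A minimal sketch, assuming you want only IND filled, passes the column list as a second argument. The object and method names here (FillOnlyInd, flag) are made up for illustration, and the local SparkSession setup is an assumption, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

// Hypothetical helper: names are illustrative only.
object FillOnlyInd {
  // Left-join people to the ids tagged with "Y", then fill nulls in IND only.
  def flag(people: DataFrame, ids: DataFrame): DataFrame =
    people
      .join(ids.withColumn("IND", lit("Y")), Seq("id"), "left_outer")
      .na.fill("N", Seq("IND")) // other string columns keep their nulls

  def main(args: Array[String]): Unit = {
    // Assumed local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fill-only-ind")
      .getOrCreate()
    import spark.implicits._

    val people = Seq((1, "sudheer"), (2, "sandeep"), (3, "suresh")).toDF("id", "name")
    val ids = Seq(1, 2).toDF("id")

    flag(people, ids).orderBy("id").show()
    spark.stop()
  }
}
```

With the sample data in this answer the result is the same either way, since no name is null; the difference only shows up when other string columns contain nulls.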

Or you can use when, as you did:

val df3 = df1.join(df2, Seq("id"), "left_outer")
         .withColumn("IND", when($"IND".isNull, "N").otherwise("Y"))
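
If you would rather not add any column to table2 at all, one possible variation is to rename table2's id before the join so that it survives as its own column, test that column for null, and drop it afterwards. This is only a sketch under assumptions: the names IndicatorWithoutExtraColumn, withIndicator, and id2 are hypothetical, and the distinct() call is an added guard against duplicate ids in table2 multiplying rows, which the question does not mention.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper: names are illustrative only.
object IndicatorWithoutExtraColumn {
  def withIndicator(table1: DataFrame, table2: DataFrame): DataFrame = {
    // Keep only the key, under a different name, so it stays visible after the join.
    val keys = table2.select(col("id").as("id2")).distinct()

    table1
      .join(keys, col("id") === col("id2"), "left_outer")
      // id2 is null exactly when table1's id has no match in table2.
      .withColumn("IND", when(col("id2").isNotNull, "Y").otherwise("N"))
      .drop("id2")
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("indicator-sketch")
      .getOrCreate()
    import spark.implicits._

    val table1 = Seq((1, "sudheer"), (2, "sandeep"), (3, "suresh")).toDF("id", "name")
    val table2 = Seq(1, 2).toDF("id")

    withIndicator(table1, table2).orderBy("id").show()
    spark.stop()
  }
}
```

The join here uses an explicit condition instead of Seq("id"), because joining on a shared column name merges the two id columns into one and leaves nothing from table2 to test.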

Hope this helps!

4 Comments

What does this command do? val df3 = df1.join(df2,Seq("id"),"left_outer") .na.fill("N")
It replaces the null values with the value you pass to the fill() method.
Got it..Thanks Shankar.
Great, glad that helped you :)
