
I need to filter a DataFrame using the criteria below.

I have two columns: 4Wheel (values: Subaru, Toyota, GM, or null/empty) and 2Wheel (values: Yamaha, Harley, Indian, or null/empty).

I need to filter on 4Wheel for the values (Subaru, Toyota). If 4Wheel is null or empty, the filter should instead apply to 2Wheel with the values (Yamaha, Harley).

I couldn't find this kind of conditional filtering in the examples I looked at. I am new to Spark/Scala, so I'm not sure how to implement it.

Thanks, Barun.

Can you add input data and the expected output, including null values? Commented Aug 19, 2021 at 14:02

1 Answer


You can use Spark SQL's built-in function when to check whether 4Wheel is null or empty, and choose which column to filter on accordingly:

import org.apache.spark.sql.functions.{col, when}

dataframe.filter(when(col("4Wheel").isNull || col("4Wheel").equalTo(""), 
                   col("2Wheel").isin("Yamaha", "Harley")
                ).otherwise(
                   col("4Wheel").isin("Subaru", "Toyota")
                ))
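
The same condition can also be written as a single boolean expression without when/otherwise. This is only a sketch of an alternative, not part of the original answer: the names fourWheelBlank and filterVehicles are hypothetical, and it relies on the fact that a row matching Subaru or Toyota can never have a null or empty 4Wheel.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

// Hypothetical helper: true when 4Wheel is null or the empty string.
val fourWheelBlank: Column = col("4Wheel").isNull || col("4Wheel") === ""

// Hypothetical helper: keeps a row when 4Wheel is Subaru/Toyota, or when
// 4Wheel is null/empty and 2Wheel is Yamaha/Harley.
def filterVehicles(df: DataFrame): DataFrame =
  df.filter(
    col("4Wheel").isin("Subaru", "Toyota") ||
      (fourWheelBlank && col("2Wheel").isin("Yamaha", "Harley"))
  )
```

Calling filterVehicles(dataframe) should select the same rows as the when/otherwise version. In both forms, a row where the condition evaluates to null (for example, both columns null) is dropped, because filter only keeps rows where the condition is true.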

So if you have the following input:

+---+------+------+
|id |4Wheel|2Wheel|
+---+------+------+
|1  |Toyota|null  |
|2  |Subaru|null  |
|3  |GM    |null  |
|4  |null  |Yamaha|
|5  |      |Yamaha|
|6  |null  |Harley|
|7  |      |Harley|
|8  |null  |Indian|
|9  |      |Indian|
|10 |null  |null  |
+---+------+------+
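
To try this out, the input above could be built roughly as follows. This is a sketch under a few assumptions: a local SparkSession (in spark-shell, spark already exists), the app name is arbitrary, and the blank 4Wheel cells in the table are taken to be empty strings rather than whitespace.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimenting; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("wheel-filter").getOrCreate()
import spark.implicits._

// None becomes a SQL null; Some("") is an empty string.
val rows: Seq[(Int, Option[String], Option[String])] = Seq(
  (1, Some("Toyota"), None),
  (2, Some("Subaru"), None),
  (3, Some("GM"), None),
  (4, None, Some("Yamaha")),
  (5, Some(""), Some("Yamaha")),
  (6, None, Some("Harley")),
  (7, Some(""), Some("Harley")),
  (8, None, Some("Indian")),
  (9, Some(""), Some("Indian")),
  (10, None, None)
)

val dataframe = rows.toDF("id", "4Wheel", "2Wheel")

// Print the table without truncating cell values.
dataframe.show(false)
```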

You get the following filtered output:

+---+------+------+
|id |4Wheel|2Wheel|
+---+------+------+
|1  |Toyota|null  |
|2  |Subaru|null  |
|4  |null  |Yamaha|
|5  |      |Yamaha|
|6  |null  |Harley|
|7  |      |Harley|
+---+------+------+