Filter NULL value in dataframe column of spark scala

Question

I have a dataframe(df) with following:

+---------+--------------------+
|  col1|        col2           |
+---------+--------------------+
|colvalue1|                NULL|   
|colvalue2|col2value...        |
+------------+-----------------+

I am trying to filter rows based on the col2 as follows

df.filter(($"col2".isNotNULL) || ($"col2" !== "NULL")  || ($"col2" !== "null")  || ($"col2".trim !== "NULL"))

But the row which has NULL is not filtering. This column show nullable=true.

Can anyone let me know what mistake I am doing? I am using Spark 1.6.

Anahcolus · Accepted Answer · 2018-05-23 02:03:02Z

6

Your !== notation is wrong which should be =!=, and you can't do $"col2".trim and since you have used negations with ||, one of the negation is always true. In your example ($"col2".isNotNULL) is always true so every rows are filtered-in. So individual negation combined by || should be taken with care.

So the correct form is

df.filter(!($"col2".isNull || ($"col2" === "NULL") || ($"col2" === "null")))

or even better if you use inbuilt function isnull and trim

df.filter(!(isnull($"col2") || (trim($"col2") === "NULL") || (trim($"col2") === "null")))

edited May 23, 2018 at 2:03

answered May 23, 2018 at 1:42

Anahcolus

42.1k6 gold badges75 silver badges101 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Fabio Over a year ago

was this (trim($"col2") === "null") necessary? Should spark catalist be case insensitive?

Collectives™ on Stack Overflow

Filter NULL value in dataframe column of spark scala

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related