Using Spark in Java, I have a DataFrame (df) that looks like this:

NAME    COLUMN_1    COLUMN_2
name_1  null        some_value
name_2  some_value  null
name_3  null        null

I want to filter out the rows where both COLUMN_1 and COLUMN_2 are null, so that the new dataset looks like:

NAME    COLUMN_1    COLUMN_2
name_1  null        some_value
name_2  some_value  null

How do I keep the rows that have a value in at least one of COLUMN_1 and COLUMN_2?

I tried the following filter, but it seems the and condition is applied sequentially and removes all rows from the df:

Column filter = col("COLUMN_1").isNotNull().and(col("COLUMN_2").isNotNull());
df.filter(filter).show();

1 Answer

Your filter requires both COLUMN_1 and COLUMN_2 to be non-null for a row to be included in the result. Since no row in your data has a value in both columns, nothing is returned.

What you really want is for at least one of COLUMN_1 and COLUMN_2 to be non-null, which can be achieved with or:

Column filter = col("COLUMN_1").isNotNull().or(col("COLUMN_2").isNotNull());
df.filter(filter).show();
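
To check the logic without a Spark session, here is a minimal plain-Java sketch that models the three rows from the question. The Row class and the sampleRows, keepBoth and keepEither helpers are hypothetical names made up for this illustration and are not part of the Spark API; a Java null stands in for a SQL NULL.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NullFilterDemo {
    // Hypothetical stand-in for one DataFrame row; null models a SQL NULL.
    static final class Row {
        final String name;
        final String column1;
        final String column2;

        Row(String name, String column1, String column2) {
            this.name = name;
            this.column1 = column1;
            this.column2 = column2;
        }
    }

    // The three rows from the question.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("name_1", null, "some_value"),
                new Row("name_2", "some_value", null),
                new Row("name_3", null, null));
    }

    // Models isNotNull().and(isNotNull()): both columns must hold a value.
    static List<String> keepBoth(List<Row> rows) {
        return rows.stream()
                .filter(r -> r.column1 != null && r.column2 != null)
                .map(r -> r.name)
                .collect(Collectors.toList());
    }

    // Models isNotNull().or(isNotNull()): one non-null column is enough.
    static List<String> keepEither(List<Row> rows) {
        return rows.stream()
                .filter(r -> r.column1 != null || r.column2 != null)
                .map(r -> r.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println("and: " + keepBoth(rows));   // prints "and: []"
        System.out.println("or:  " + keepEither(rows)); // prints "or:  [name_1, name_2]"
    }
}
```

The and version returns nothing because no sample row has values in both columns, while the or version drops only name_3, which is the result asked for.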