Java - How to filter rows in dataframe that have null values for specific columns

Question

In Java I have a df that looks like this:

I want to filter out all rows that have null values in both COLUMN_1 and COLUMN_2, so that the new dataset looks like:

NAME	COLUMN_1	COLUMN_2
name_1	null	some_value
name_2	some_value	null

How do I keep the rows that have a value in at least one of COLUMN_1 and COLUMN_2?

I tried the following filter, but it seems the .and(...) condition is applied sequentially and removes all rows from the df:

Column filter = col("COLUMN_1").isNotNull().and(col("COLUMN_2").isNotNull());
df.filter(filter).show();

vinsce · Accepted Answer · 2022-11-20 09:51:28Z

With your filter, you require both COLUMN_1 and COLUMN_2 to be non-null for a row to be included in the result.
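
To see why the .and(...) version empties the expected output, here is a minimal plain-Java sketch (not Spark) that applies the same logic to the two rows the question wants to keep. The Row record, BOTH_NOT_NULL and filterBothNotNull are hypothetical names used only for illustration; in Spark the condition is the Column built with .and(...). Each row has exactly one null column, so each one fails the AND. Nothing about the evaluation is "sequential".

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class AndFilterDemo {
    // Hypothetical stand-in for one DataFrame row: NAME, COLUMN_1, COLUMN_2.
    record Row(String name, String column1, String column2) {}

    // Mirrors col("COLUMN_1").isNotNull().and(col("COLUMN_2").isNotNull()):
    // a row passes only if BOTH columns are non-null.
    static final Predicate<Row> BOTH_NOT_NULL =
            r -> Objects.nonNull(r.column1()) && Objects.nonNull(r.column2());

    static List<Row> filterBothNotNull(List<Row> rows) {
        return rows.stream().filter(BOTH_NOT_NULL).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The two rows the question wants to keep; each has exactly one null.
        List<Row> rows = List.of(
                new Row("name_1", null, "some_value"),
                new Row("name_2", "some_value", null));

        // Every row fails the AND condition, so nothing is kept.
        System.out.println(filterBothNotNull(rows)); // prints []
    }
}
```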

What you really want is for at least one of COLUMN_1 and COLUMN_2 to be non-null, which you can express with .or(...) instead of .and(...):

Column filter = col("COLUMN_1").isNotNull().or(col("COLUMN_2").isNotNull());
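
As a sanity check on the .or(...) condition, the following plain-Java sketch (again not Spark) applies the same logic to the expected rows plus a hypothetical name_3 row in which both columns are null. The Row record, AT_LEAST_ONE_NOT_NULL and keepAtLeastOne are illustrative names, not part of any library. Only the all-null row is dropped.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class OrFilterDemo {
    // Hypothetical stand-in for one DataFrame row: NAME, COLUMN_1, COLUMN_2.
    record Row(String name, String column1, String column2) {}

    // Mirrors col("COLUMN_1").isNotNull().or(col("COLUMN_2").isNotNull()):
    // a row passes if AT LEAST ONE of the two columns is non-null.
    static final Predicate<Row> AT_LEAST_ONE_NOT_NULL =
            r -> Objects.nonNull(r.column1()) || Objects.nonNull(r.column2());

    static List<Row> keepAtLeastOne(List<Row> rows) {
        return rows.stream().filter(AT_LEAST_ONE_NOT_NULL).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("name_1", null, "some_value"),
                new Row("name_2", "some_value", null),
                new Row("name_3", null, null)); // hypothetical all-null row

        // Print the names of the rows that survive the OR filter.
        keepAtLeastOne(rows).forEach(r -> System.out.println(r.name()));
        // prints name_1, then name_2
    }
}
```

In Spark itself, the .or(...) condition is applied exactly as in the question: df.filter(filter).show();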

answered Nov 18, 2022 at 19:52