
I am trying to drop rows of a Spark DataFrame that contain a specific value in a specific column. For example, given the following DataFrame, I'd like to drop all rows which have "two" in column "A" — that is, the rows with index 1 and 2. I want to do this using Scala 2.11 and Spark 2.4.0.

     A      B   C
0    one    0   0
1    two    2   4
2    two    4   8
3    one    6  12
4  three    7  14

I tried something like this:

df = df.filer(_.A != "two")

or

df = df.filter(df("A") != "two")

However, neither of these worked. Any suggestions on how this can be done?

2 Answers


Try:

df.filter(not($"A".contains("two")))

Or, if you are looking for an exact match:

df.filter(not($"A".equalTo("two")))
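Putting the answer together with the question's sample data, a minimal self-contained sketch (assuming a local `SparkSession`; column and app names are illustrative) could look like this:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.not

val spark = SparkSession.builder()
  .appName("filter-example")   // illustrative app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Reproduce the sample DataFrame from the question
val df = Seq(
  ("one",   0,  0),
  ("two",   2,  4),
  ("two",   4,  8),
  ("one",   6, 12),
  ("three", 7, 14)
).toDF("A", "B", "C")

// Drop all rows whose column A equals "two"
val filtered = df.filter(not($"A".equalTo("two")))
filtered.show()
// leaves the rows with A = "one", "one", "three"
```

Note that `contains("two")` performs a substring match (it would also drop a row with A = "twofold"), while `equalTo("two")` only drops exact matches.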

3 Comments

Thanks for the answer, but that didn't work for me.
I edited my answer; filterNot is not standard in the Scala Spark API. Check it — I tested it and it works.
Yes, you are right, that way it works! Thank you! :)

I finally found the solution in a very old post: "Is there a way to filter a field not containing something in a spark dataframe using scala?"

The trick that does it is the following:

df = df.where(!$"A".contains("two"))
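For reference, a likely reason the original attempts failed: Scala's plain `!=` compares the `Column` object against the `String` and yields an ordinary `Boolean`, which is not a valid filter condition. Spark's `Column` API provides `=!=` (and `===`) for column-level comparison, so these forms are equivalent ways to express the fix (a sketch, assuming an existing DataFrame `df` with string column "A"):

```scala
// Column-level inequality: drops rows where A equals "two" exactly
val filtered = df.filter(df("A") =!= "two")

// Same thing via negation of column-level equality
val filtered2 = df.where(!($"A" === "two"))

// Substring match: broader than equality, drops any A containing "two"
val filtered3 = df.where(!$"A".contains("two"))
```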

