
I am trying to drop rows of a Spark DataFrame that contain a specific value in a specific column. For example, given the following DataFrame, I'd like to drop all rows which have "two" in column "A" — that is, the rows with index 1 and 2. I want to do this using Scala 2.11 and Spark 2.4.0.

     A      B   C
0    one    0   0
1    two    2   4
2    two    4   8
3    one    6  12
4  three    7  14

I tried something like this:

df = df.filer(_.A != "two")

or

df = df.filter(df("A") != "two")

However, neither of these worked. Any suggestions on how this can be done?

2 Answers


Try:

df.filter(not($"A".contains("two")))

Or, if you are looking for an exact match:

df.filter(not($"A".equalTo("two")))
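Putting the answer together with the question's sample data, a minimal self-contained sketch (assuming a local `SparkSession`; column and app names are illustrative) could look like this:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.not

val spark = SparkSession.builder()
  .appName("filter-example")   // illustrative app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Reproduce the sample DataFrame from the question
val df = Seq(
  ("one",   0,  0),
  ("two",   2,  4),
  ("two",   4,  8),
  ("one",   6, 12),
  ("three", 7, 14)
).toDF("A", "B", "C")

// Drop all rows whose column A equals "two"
val filtered = df.filter(not($"A".equalTo("two")))
filtered.show()
// leaves the rows with A = "one", "one", "three"
```

Note that `contains("two")` performs a substring match (it would also drop a row with A = "twofold"), while `equalTo("two")` only drops exact matches.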

3 Comments

Thanks for the answer, but that didn't work for me.
I edited my answer; filterNot is not standard in the Scala Spark API. Check it — I tested it and it works.
Yes, you are right, that way it works! Thank you! :)

I finally found the solution in a very old post: "Is there a way to filter a field not containing something in a spark dataframe using scala?"

The trick that does it is the following:

df = df.where(!$"A".contains("two"))
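For reference, a likely reason the original attempts failed: Scala's plain `!=` compares the `Column` object against the `String` and yields an ordinary `Boolean`, which is not a valid filter condition. Spark's `Column` API provides `=!=` (and `===`) for column-level comparison, so these forms are equivalent ways to express the fix (a sketch, assuming an existing DataFrame `df` with string column "A"):

```scala
// Column-level inequality: drops rows where A equals "two" exactly
val filtered = df.filter(df("A") =!= "two")

// Same thing via negation of column-level equality
val filtered2 = df.where(!($"A" === "two"))

// Substring match: broader than equality, drops any A containing "two"
val filtered3 = df.where(!$"A".contains("two"))
```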

