16

I have a List[A], how is a idiomatic way of removing duplicates given an equality function (a:A, b:A) => Boolean? I cannot generally override equalsfor A

The way I can think now is creating a wrapping class AExt with overridden equals, then

list.map(new AExt(_)).distinct

But I wonder if there's a cleaner way.

3

6 Answers 6

19

There is a simple (simpler) way to do this:

list.groupBy(_.key).mapValues(_.head)

If you want you can use the resulting map instantly by replacing _.head by a function block like:

sameElements => { val observedItem = sameElements.head
                  new A (var1 = observedItem.firstAttr,
                         var2 = "SomethingElse") }

to return a new A for each distinct element.

There is only one minor problem. The above code (list.groupBy(_.key).mapValues(_.head)) didnt explains very well the intention to remove duplicates. For that reason it would be great to have a function like distinctIn[A](attr: A => B) or distinctBy[A](eq: (A, A) -> Boolean).

Sign up to request clarification or add additional context in comments.

2 Comments

This way the result will be a map on the unique key. To get just distinct values, one should .map(_._2.head)
This works in most cases; care should be taken in lazy evaluation cases, because groupBy tends to imply eager evaluation.
8

Using the Foo and customEquals from misingFaktor's answer:

  case class Foo(a: Int, b: Int)
  val (a, b, c, d) = (Foo(3, 4), Foo(3, 1), Foo(2, 5), Foo(2, 5))
  def customEquals(x: Foo, y: Foo) = x.a == y.a

  (Seq(a, b, c, d).foldLeft(Seq[Foo]()) {
    (unique, curr) => {
      if (!unique.exists(customEquals(curr, _)))
        curr +: unique
      else
        unique
    }
  }).reverse

If result ordering is important but the duplicate to be removed is not, then foldRight is preferable

  Seq(a, b, c, d).foldRight(Seq[Foo]()) {
    (curr, unique) => {
      if (!unique.exists(customEquals(curr, _)))
        curr +: unique
      else
        unique
    }
  }

Comments

4

I must say I think I'd go via an intermediate collection which was a Set if you expected that your Lists might be quite long as testing for presence (via exists or find) on a Seq is O(n) of course:

Rather than write a custom equals; decide what property the elements are equal by. So instead of:

def myCustomEqual(a1: A, a2: A) = a1.foo == a2.foo && a1.bar == a2.bar

Make a Key. Like so:

type Key = (Foo, Bar)
def key(a: A) = (a.foo, a.bar)

Then you can add the keys to a Set to see whether you have come across them before.

var keys = Set.empty[Key]
((List.empty[A] /: as) { (l, a) => 
  val k = key(a)
  if (keys(k)) l else { keys += k; a +: l  }
}).reverse

Of course, this solution has worse space complexity and potentially worse performance (as you are creating extra objects - the keys) in the case of very short lists. If you do not like the var in the fold, you might like to look at how you could achieve this using State and Traverse from scalaz 7

1 Comment

Ended up implemeting something simpler but based on your suggestion by identifying the keys.
3
scala> case class Foo(a: Int, b: Int)
defined class Foo

scala> val (a, b, c, d) = (Foo(3, 4), Foo(3, 1), Foo(2, 5), Foo(2, 5))
a: Foo = Foo(3,4)
b: Foo = Foo(3,1)
c: Foo = Foo(2,5)
d: Foo = Foo(2,5)

scala> def customEquals(x: Foo, y: Foo) = x.a == y.a
customEquals: (x: Foo, y: Foo)Boolean

scala> Seq(a, b, c, d) filter {
     |   var seq = Seq.empty[Foo]
     |   x => {
     |    if(seq.exists(customEquals(x, _))) {
     |      false 
     |    } else { 
     |      seq :+= x
     |      true 
     |    }
     | }
res13: Seq[Foo] = List(Foo(3,4), Foo(2,5))

3 Comments

This answer is inspired by the implementation of distinct: lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_9_1_final/src/… So looking at the source code can sometimes help.
No, this answer not inspired by that implementation. I wrote something that came to my mind. This is actually not even a good solution. @drstevens' answer is better.
My bad, I looked for how it was implemented in scala, as I consider it as a good source of inspiration and then saw that it was the same form as your implementation. However I agree with you that a foldleft is more elegant.
3

Starting Scala 2.13, we can use the new distinctBy method which returns elements of a sequence ignoring the duplicates as determined by == after applying a transforming function f:

def distinctBy[B](f: (A) => B): List[A]

For instance:

// case class A(a: Int, b: String, c: Double)
// val list = List(A(1, "hello", 3.14), A(2, "world", 3.14), A(1, "hello", 12.3))
list.distinctBy(x => (x.a, x.b)) // List(A(1, "hello", 3.14), A(2, "world", 3.14))
list.distinctBy(_.c)             // List(A(1, "hello", 3.14), A(1, "hello", 12.3))

Comments

-1
case class Foo (a: Int, b: Int)

val x = List(Foo(3,4), Foo(3,1), Foo(2,5), Foo(2,5))
def customEquals(x : Foo, y: Foo) = (x.a == y.a && x.b == y.b)

x.foldLeft(Nil : List[Foo]) {(list, item) => 
   val exists = list.find(x => customEquals(item, x))
   if (exists.isEmpty) item :: list
   else list
 }.reverse

res0: List[Foo] = List(Foo(3,4), Foo(3,1), Foo(2,5))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.