110

Suppose I have

val dirty = List("a", "b", "a", "c")

Is there a list operation that returns "a", "b", "c"

8 Answers 8

202

Have a look at the ScalaDoc for Seq,

scala> dirty.distinct
res0: List[java.lang.String] = List(a, b, c)

Update. Others have suggested using Set rather than List. That's fine, but be aware that by default, the Set interface doesn't preserve element order. You may want to use a Set implementation that explicitly does preserve order, such as collection.mutable.LinkedHashSet.

Sign up to request clarification or add additional context in comments.

3 Comments

What if you have a list of files and need to compare on something like part of the file name?
@ozone Interesting question. Maybe the easiest way is to create a new map of type Map[String, File], where the keys are the part of the file name of interest. Once the map is constructed, you can call the values method to get an Iterable of values--the keys will all be distinct by construction.
@KiptonBarros and I think you can do this by using the groupBy member of scala.collection.Iterable[A].
23

scala.collection.immutable.List now has a .distinct method.

So calling dirty.distinct is now possible without converting to a Set or Seq.

1 Comment

.distinct isn't defined for scala.collection.Iterable[A]. So in that case, you'd have to use upgrade dirty to a Seq or a Set anyways (i.e. by using either .toList, .toSeq or .toSet members) for this to work.
16

Before using Kitpon's solution, think about using a Set rather than a List, it ensures each element is unique.

As most list operations (foreach, map, filter, ...) are the same for sets and lists, changing collection could be very easy in the code.

Comments

9

Using Set in the first place is the right way to do it, of course, but:

scala> List("a", "b", "a", "c").toSet.toList
res1: List[java.lang.String] = List(a, b, c)

Works. Or just toSet as it supports the Seq Traversable interface.

1 Comment

I edited your answer because Set implements Traversable, not Seq. The difference is that Seq guarantees an order to the elements, whereas Traversable does not.
1

For Already-Sorted Lists

If you happen to want the distinct items of a list that you know is already sorted, as I have often needed, the following performs about twice the speed as .distinct:

  def distinctOnSorted[V](seq: List[V]): List[V] =
    seq.foldLeft(List[V]())((result, v) =>
      if (result.isEmpty || v != result.head) v :: result else result)
    .reverse

Performance results on a list of 100,000,000 random Ints from 0-99:

distinct        : 0.6655373s
distinctOnSorted: 0.2848134s

Performance with MutableList or ListBuffer

While it would seem that a more mutable / non-functional programming approach might be faster than prepending to an immutable list, practice shows otherwise. The immutable implementation consistently performs better. My guess for the reason is that scala focuses its compiler optimizations on immutable collections, and does a good job at it. (I welcome others to submit better implementations.)

List size 1e7, random 0 to 1e6
------------------------------
distinct            : 4562.2277ms
distinctOnSorted    : 201.9462ms
distinctOnSortedMut1: 4399.7055ms
distinctOnSortedMut2: 246.099ms
distinctOnSortedMut3: 344.0758ms
distinctOnSortedMut4: 247.0685ms

List size 1e7, random 0 to 100
------------------------------
distinct            : 88.9158ms
distinctOnSorted    : 41.0373ms
distinctOnSortedMut1: 3283.8945ms
distinctOnSortedMut2: 54.4496ms
distinctOnSortedMut3: 58.6073ms
distinctOnSortedMut4: 51.4153ms

Implementations:

object ListUtil {
  def distinctOnSorted[V](seq: List[V]): List[V] =
    seq.foldLeft(List[V]())((result, v) =>
      if (result.isEmpty || v != result.head) v :: result else result)
    .reverse

  def distinctOnSortedMut1[V](seq: List[V]): Seq[V] = {
    if (seq.isEmpty) Nil
    else {
      val result = mutable.MutableList[V](seq.head)
      seq.zip(seq.tail).foreach { case (prev, next) =>
        if (prev != next) result += next
      }
      result //.toList
    }
  }

  def distinctOnSortedMut2[V](seq: List[V]): Seq[V] = {
    val result = mutable.MutableList[V]()
    if (seq.isEmpty) return Nil
    result += seq.head
    var prev = seq.head
    for (v <- seq.tail) {
      if (v != prev) result += v
      prev = v
    }
    result //.toList
  }

  def distinctOnSortedMut3[V](seq: List[V]): List[V] = {
    val result = mutable.MutableList[V]()
    if (seq.isEmpty) return Nil
    result += seq.head
    var prev = seq.head
    for (v <- seq.tail) {
      if (v != prev) v +=: result
      prev = v
    }
    result.reverse.toList
  }

  def distinctOnSortedMut4[V](seq: List[V]): Seq[V] = {
    val result = ListBuffer[V]()
    if (seq.isEmpty) return Nil
    result += seq.head
    var prev = seq.head
    for (v <- seq.tail) {
      if (v != prev) result += v
      prev = v
    }
    result //.toList
  }
}

Test:

import scala.util.Random

class ListUtilTest extends UnitSpec {
  "distinctOnSorted" should "return only the distinct elements in a sorted list" in {
    val bigList = List.fill(1e7.toInt)(Random.nextInt(100)).sorted

    val t1 = System.nanoTime()
    val expected = bigList.distinct
    val t2 = System.nanoTime()
    val actual = ListUtil.distinctOnSorted[Int](bigList)
    val t3 = System.nanoTime()
    val actual2 = ListUtil.distinctOnSortedMut1(bigList)
    val t4 = System.nanoTime()
    val actual3 = ListUtil.distinctOnSortedMut2(bigList)
    val t5 = System.nanoTime()
    val actual4 = ListUtil.distinctOnSortedMut3(bigList)
    val t6 = System.nanoTime()
    val actual5 = ListUtil.distinctOnSortedMut4(bigList)
    val t7 = System.nanoTime()

    actual should be (expected)
    actual2 should be (expected)
    actual3 should be (expected)
    actual4 should be (expected)
    actual5 should be (expected)

    val distinctDur = t2 - t1
    val ourDur = t3 - t2

    ourDur should be < (distinctDur)

    print(s"distinct            : ${distinctDur / 1e6}ms\n")
    print(s"distinctOnSorted    : ${ourDur / 1e6}ms\n")
    print(s"distinctOnSortedMut1: ${(t4 - t3) / 1e6}ms\n")
    print(s"distinctOnSortedMut2: ${(t5 - t4) / 1e6}ms\n")
    print(s"distinctOnSortedMut3: ${(t6 - t5) / 1e6}ms\n")
    print(s"distinctOnSortedMut4: ${(t7 - t6) / 1e6}ms\n")
  }
}

3 Comments

This is pretty efficient as there are only 100 unique values, but you'd get into trouble if there were many more as you are using an immutable structure. To go even faster, you could implement this with a mutable structure.
@Nick I originally thought it would be that way also, however see the edits above.
I tried out the above myself as I can't understand why immutable would be better for this, but it continues to be so even if you increase the number of distinct values considerably. I also tried out a few mutable structures where prepend is more efficient but even without reversing the result at the end it's slower.
1

You can also use recursion and pattern matching:

def removeDuplicates[T](xs: List[T]): List[T] = xs match {
  case Nil => xs
  case head :: tail => head :: removeDuplicates(for (x <- tail if x != head) yield x)
}

1 Comment

removeDuplicates(tail.filter(_ != head))
-3

inArr.distinct foreach println _

1 Comment

this prints the desired output, wasn't OP asking to return it (as a List, presumably)?
-5

The algorithmic way...

def dedupe(str: String): String = {
  val words = { str split " " }.toList

  val unique = words.foldLeft[List[String]] (Nil) {
    (l, s) => {
      val test = l find { _.toLowerCase == s.toLowerCase } 
      if (test == None) s :: l else l
    }
  }.reverse

  unique mkString " "
}

1 Comment

He has a list, not a string. This doesn't answer the question.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.