Scala: Filter duplicate names using groupBy, mapValues, and filter?

Question

This should be simple, I thought, but I can't get it to work. I have a case class with a property name. I want to check to see if their are duplicate names in it...

employees.groupBy(_.name).mapValues(_.size).filter(_._2 == 1).toSeq.isEmpty

That doesn't work... Should it?

Luigi Plinge · Accepted Answer · 2013-10-15 16:23:13Z

The other answers will work, but groupBy is an expensive operation and inefficient if you have 10k employees and the first two have the same name.

1) Short method: (distinct internally builds a HashSet of all the values)

val names = employees.map(_.name)
names.distinct != names

2) Efficient method: (only traverses as much of the seq as it needs to before finding dupe)

def hasDupes(employees: Seq[Employee]): Boolean = {
  val names = collection.mutable.Set.empty[String]
  employees.foreach { e => 
    if (names(e.name)) return true
    names += e.name
  }
  false
}

You can write this as a one-liner using a tail-recursive method, but the fact is that immutable sets are much slower than mutable ones so not much good if we're going for efficiency.

3) Off-the-wall method: (doesn't build a set at all, but starts a quicksort and quits (via a cheap exception) as soon as it finds two names the same. Should be efficient. )

class Dupe extends Throwable with util.control.NoStackTrace
val dupeOrd = new Ordering[Employee] {
  def compare(x: Employee, y: Employee) =
    if (x.name == y.name) throw new Dupe else x.name compare y.name
}
def hasDupes(employees: Seq[Employee]) =
  try { employees.sorted(dupeOrd); false } catch { case e: Dupe => true }

Benchmark of results, with method 0 being om-nom-nom's groupBy one, (time in ms for 1000 runs on 1000 employee Vector, no duplicates):

method             time  Early return if duplicate found
0  (groupBy)       1560  No
1  (distinct)      329   No
2a (mutable.Set)   255   Yes
2b (immutable.Set) 1414  Yes
3  (sorting)       666   Yes        //242 if employees already in name order

om-nom-nom · Accepted Answer · 2013-10-14 15:06:32Z

2

Perhaps you wanted:

case class Employee(name: String)
val bob = Employee("Bob")
val joe = Employee("Joe")

def haveDupes(emps: Seq[Employee]) =  emps.groupBy(_.name).exists {
 case (name, group) => group.size > 1
}

scala> haveDupes(Seq(bob))
res11: Boolean = false

scala> haveDupes(Seq(bob, bob))
res12: Boolean = true

scala> haveDupes(Seq(bob, joe))
res13: Boolean = false

scala> haveDupes(Seq(bob, joe, bob))
res14: Boolean = true

answered Oct 14, 2013 at 15:06

om-nom-nom

63k13 gold badges186 silver badges231 bronze badges

1 Comment

Brad Over a year ago

haveDupes worked great and does exactly what I was trying to do. Thank you!

Mateusz Dymczyk · Accepted Answer · 2013-10-14 15:12:32Z

0

Because here

employees.groupBy(_.name).mapValues(_.size).filter(_._2 == 1).toSeq.isEmpty

You don't want a == but a > sign

employees.groupBy(_.name).mapValues(_.size).filter(_._2 > 1).toSeq.isEmpty

First you groupBy which will return a set of tuples (K, V), then you map each of those to see how many of each there are with mapValues(_.size) and then you want to check if there are any records with size greater than 1.

edited Oct 14, 2013 at 15:12

answered Oct 14, 2013 at 15:07

Mateusz Dymczyk

15.2k10 gold badges63 silver badges94 bronze badges

2 Comments

Brad Over a year ago

Thanks for the answer. I'm actually wanting to determine if there are duplicates, so to do that I filter out where the count is 1 and see if it's empty right?

Mateusz Dymczyk Over a year ago

Actually that's not how filter works, it will leave the elements that pass the condition you give to the filter function. So in this case it will return a list that passes (._2 > 1). It does not filter *out* (remove) the elements that pass the condition you give to the function so saying filter(._2 == 1) returns all elements that have ._2 equal to 1.

Collectives™ on Stack Overflow

Scala: Filter duplicate names using groupBy, mapValues, and filter?

3 Answers 3

Comments

1 Comment

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related