
This question was inspired by the question "Extract numbers from String Array".

Suppose we have a List of arbitrary alphabetic and numeric strings:

val ls = List("The", "first", "one", "is", "11", "the", "second", "is", "22")

The goal is to form a list of numbers extracted from the original list: val nums: List[Int] = List(11, 22)

There are two different approaches possible (AFAIK):

  1. Using the Try construct:

    val nums = ls.flatMap(s => Try(s.toInt).toOption)
    

    This solution looks concise, but handling an exception for every non-numeric string adds significant overhead.

  2. Using the matches method:

    val nums = ls.filter(_.matches("\\d+")).map(_.toInt)
    

    Here the most time-consuming part is the regexp matching.

Which one performs better?

From my point of view, using the exception mechanism for such a simple operation is like "using a sledgehammer to crack a nut".

  • Don't use exceptions for normal control flow. It's misleading, bad for performance and doesn't scale. Use exceptions for the exceptional cases. Commented Nov 8, 2016 at 11:16
  • I'm not saying you should go the regex way, but you could optimize your version by (a) reusing a precompiled regex, such as val Reg = "(\\d+)".r, and (b) doing the filter and the map in a single step instead of two, e.g. ls.collect { case Reg(n) => n.toInt }. Commented Nov 8, 2016 at 12:07
  • If it's really about performance (and not about style or idioms): in your example, only 2 out of 9 elements are valid numbers. Is that a typical composition of your input list? Discussing performance doesn't make sense without knowing that. If your list contains few or no strings that aren't numbers, then the exception variant will surely be faster (since there's nothing to catch), whereas the regex variant performs better the more non-numeric strings the list contains. There is no universal answer to your question. Commented Nov 8, 2016 at 13:08
  • @fxlae I'm learning Scala, and this question arose when I saw that the most popular answer to the question mentioned at the beginning of my post uses Try, which I don't think is good. IMHO, a universal answer to my question should consider three different situations. If F is the probability of meeting a number in the list, then the first situation is F close to 0, the second is F = 0.5, and the third is F close to 1. BTW: I'm quite new to Stack Overflow; should I update my question with these details, or is a comment enough? Commented Nov 9, 2016 at 9:06
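
The single-pass suggestion from the comments above (precompile the regex, then filter and convert in one step with collect) might look like the following minimal sketch. The object name CollectExample and the method name extract are made up here for illustration; Reg follows the commenter's naming. A Regex used as a pattern has to match the whole string, so "a11" would not be picked up.

```scala
import scala.util.matching.Regex

// Hypothetical wrapper object, used only to keep the sketch self-contained.
object CollectExample {
  // Compiled once and reused for every element.
  val Reg: Regex = "(\\d+)".r

  // Keeps only the strings that consist entirely of digits and converts them,
  // in a single traversal of the list.
  def extract(ls: List[String]): List[Int] =
    ls.collect { case Reg(n) => n.toInt }
}
```

Calling CollectExample.extract(ls) on the list from the question should give the same result as the filter/map version.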

1 Answer

I highly recommend testing this yourself; you can learn a lot! Start the Scala REPL:

scala> import scala.util.Try
import scala.util.Try

< import printTime function from our repo >

scala> val list = List("The", "first", "one", "is", "11", "the", "second", "is", "22")
list: List[String] = List(The, first, one, is, 11, the, second, is, 22)

scala> var x: List[Int] = Nil
x: List[Int] = List()
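
The printTime helper comes from the answerer's own repo and is not shown in the post. If you want to follow along, a stand-in along these lines should be enough. This is only an assumption about what such a helper does (run a block once and print the elapsed wall-clock time), not the original implementation, and the object name Timing is invented for this sketch.

```scala
// Hypothetical stand-in for the answer's printTime helper (the original is not shown).
object Timing {
  // Evaluates the by-name block exactly once, prints the elapsed
  // wall-clock time in milliseconds, and returns the block's result.
  def printTime[A](block: => A): A = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"time: $elapsedMs%.3fms")
    result
  }
}
```

After import Timing.printTime, the printTime(...) calls in the rest of the session work as written, although the output format will differ slightly from the one shown.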

OK, the environment is set up. Here's your first function (Try):

scala> def f1(l: List[String], n: Int) = { 
  var i = 0
  while (i < n) { 
    x = l.flatMap(s => Try(s.toInt).toOption)
    i += 1 
  }
}
f1: (l: List[String], n: Int)Unit

The second function (regex):

scala> def f2(l: List[String], n: Int) = { 
  var i = 0
  while (i < n) { 
    x = l.filter(_.matches("\\d+")).map(_.toInt)
    i += 1 
  }
}
f2: (l: List[String], n: Int)Unit

Timings:

scala> printTime(f1(list, 100000)) // Try
time: 4.152s

scala> printTime(f2(list, 100000)) // regex
time: 565.107ms

Well, we've learned that handling exceptions inside a flatMap is a very inefficient way to do things on this input, where 7 of the 9 strings throw. This is partly because throwing and catching an exception is expensive on the JVM (each throw allocates an exception object and records a stack trace), and partly because flatMap with Option does a lot of extra allocation and boxing. Regex is ~7x faster! But... is regex fast?

scala> def f3(l: List[String], n: Int) = { 
  var i = 0
  while (i < n) { 
    x = l.filter(_.forall(_.isDigit)).map(_.toInt)
    i += 1 
  }
}
f3: (l: List[String], n: Int)Unit

scala> printTime(f3(list, 100000)) // isDigit
time: 70.960ms

Replacing the regex with per-character isDigit checks gave us another ~8x improvement. The lesson here is to avoid try/catch handling at all costs, avoid using regex whenever possible, and don't be afraid to write performance comparisons!
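
One caveat about the isDigit version: "".forall(_.isDigit) is true, so an empty string passes the filter and toInt then throws, and a digit string too large for an Int throws as well. A slightly more defensive variant, sketched here under the assumption that Scala 2.13 or later is available (for toIntOption), could look like this. The names SafeParse and extractInts are illustrative only, and this variant was not part of the timings above.

```scala
// Hypothetical helper, not from the original answer. Requires Scala 2.13+.
object SafeParse {
  // Keeps non-empty, all-digit strings and converts them with toIntOption,
  // so values that do not fit into an Int are dropped instead of throwing.
  def extractInts(l: List[String]): List[Int] =
    l.filter(s => s.nonEmpty && s.forall(_.isDigit)).flatMap(_.toIntOption)
}
```

Like the original, it only accepts unsigned digit strings, so "-5" is skipped.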


2 Comments

Take a list that only contains strings representing valid numbers and the exception variant will be much faster. Yes, you used the list that was posted by the OP, but I'm not sure the OP is aware of this problem in the first place. What I'm trying to say: benchmarking is a difficult topic, especially if we can't make reasonable assumptions about the input.
You're right that if exceptions are never thrown, then the Try version is almost as fast as the isDigit version: testing with 1M iterations on a list of 8 numeric strings, the times are 1.428s (Try), 4.631s (regex), and 0.938s (isDigit). Add one non-numeric element, and the times are 7.148s (Try), 5.962s (regex), and 0.992s (isDigit). If you're throwing exceptions with any reasonable frequency (here it was ~10% of elements), then Try is definitely the slowest model.
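
To repeat this kind of comparison for different input mixes (for example, the F close to 0, F = 0.5 and F close to 1 cases raised in the question's comments), a rough harness could look like the sketch below. Everything in it is an assumption made for illustration: the names MixBenchmark, makeInput and timeMs are invented, the isDigit variant adds a nonEmpty guard that the answer's f3 does not have, and there is no JIT warm-up handling, so the numbers it produces are only indicative. For serious measurements a dedicated harness such as JMH is usually the better choice.

```scala
import scala.util.Try

// Hypothetical benchmark sketch; names and structure are not from the thread.
object MixBenchmark {
  // Builds `size` strings of which round(size * numericFraction) are numeric.
  def makeInput(size: Int, numericFraction: Double): List[String] = {
    val numericCount = math.round(size * numericFraction).toInt
    List.tabulate(size)(i => if (i < numericCount) i.toString else "word")
  }

  // The three approaches discussed in the thread.
  def viaTry(l: List[String]): List[Int] =
    l.flatMap(s => Try(s.toInt).toOption)

  def viaRegex(l: List[String]): List[Int] =
    l.filter(_.matches("\\d+")).map(_.toInt)

  def viaIsDigit(l: List[String]): List[Int] =
    l.filter(s => s.nonEmpty && s.forall(_.isDigit)).map(_.toInt)

  // Evaluates `f` n times and returns the elapsed wall-clock time in milliseconds.
  def timeMs(n: Int)(f: => Any): Double = {
    val start = System.nanoTime()
    var i = 0
    while (i < n) { f; i += 1 }
    (System.nanoTime() - start) / 1e6
  }
}
```

For example, val input = MixBenchmark.makeInput(9, 0.5) followed by println(MixBenchmark.timeMs(100000)(MixBenchmark.viaTry(input))) times the Try variant on a roughly half-numeric list; swapping in viaRegex or viaIsDigit and other fractions covers the remaining cases.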
