
What would be a fast and safe way to convert a String to a numeric type while providing a default value when the conversion fails?

I tried the commonly recommended way, i.e. using exceptions:

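// Enriches String with toXxxOrElse methods (outside the REPL, implicit classes must live inside an object)
implicit class StringConversion(val s: String) {

  // Tries the conversion and falls back to defaultVal if s is not a valid number
  private def toTypeOrElse[T](convert: String=>T, defaultVal: T) = try {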
    convert(s)
  } catch {
    case _: NumberFormatException => defaultVal
  }

  def toShortOrElse(defaultVal: Short = 0) = toTypeOrElse[Short](_.toShort, defaultVal)
  def toByteOrElse(defaultVal: Byte = 0) = toTypeOrElse[Byte](_.toByte, defaultVal)
  def toIntOrElse(defaultVal: Int = 0) = toTypeOrElse[Int](_.toInt, defaultVal)
  def toDoubleOrElse(defaultVal: Double = 0D) = toTypeOrElse[Double](_.toDouble, defaultVal)
  def toLongOrElse(defaultVal: Long = 0L) = toTypeOrElse[Long](_.toLong, defaultVal)
  def toFloatOrElse(defaultVal: Float = 0F) = toTypeOrElse[Float](_.toFloat, defaultVal)
}

Using this utility class, I can now easily convert any String to a given numeric type and provide a default value in case the String does not correctly represent that numeric type:

scala> "123".toIntOrElse()
res1: Int = 123
scala> "abc".toIntOrElse(-1)
res2: Int = -1
scala> "abc".toIntOrElse()
res3: Int = 0
scala> "3.14159".toDoubleOrElse()
res4: Double = 3.14159
...

While it works beautifully, this approach does not seem to scale well, probably because of the cost of the exception mechanism:

scala> for (i<-1 to 10000000) "1234".toIntOrElse()

takes roughly 1 second to execute, whereas

scala> for (i<-1 to 10000000) "abcd".toIntOrElse()

takes roughly 1 minute!
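A rough way to reproduce this comparison is a tiny timing helper built on `System.nanoTime`. This is a minimal sketch, not a rigorous benchmark (there is no JIT warm-up control; a harness such as JMH would be more trustworthy). The names `Timing`, `timeMs` and `intOrElse` are hypothetical, and `intOrElse` simply mirrors the exception-based `toTypeOrElse` above.

```scala
// Hypothetical helper names; a sketch, not a library API
object Timing {
  // Exception-based conversion, equivalent to toIntOrElse above
  def intOrElse(s: String, default: Int = 0): Int =
    try s.toInt
    catch { case _: NumberFormatException => default }

  // Runs `body` n times and returns the elapsed wall-clock time in milliseconds
  def timeMs(n: Int)(body: => Unit): Double = {
    val start = System.nanoTime()
    var i = 0
    while (i < n) { body; i += 1 }
    (System.nanoTime() - start) / 1e6
  }
}
```

Comparing `Timing.timeMs(1000000)(Timing.intOrElse("1234"))` with `Timing.timeMs(1000000)(Timing.intOrElse("abcd"))` is expected to show a similar gap, although absolute numbers will vary with the JVM and hardware.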

I guess another approach would be to avoid relying on the exceptions thrown by the toInt, toDouble, ... methods.

Could this be achieved by checking whether a String "is of the given type"? One could of course iterate through the String's characters and check that they are digits (see e.g. this example), but then what about the other numeric formats (double, float, hex, octal, ...)?
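For the `Int` case, the character-iteration idea can be pushed all the way to a hand-written parser that never throws. This is a minimal sketch assuming plain ASCII decimal input with an optional leading sign; `FastParse` and `parseIntOrElse` are hypothetical names, and unlike `toInt` it rejects non-ASCII Unicode digits. Doubles, hex and octal would each need their own routine, which is why this route gets tedious.

```scala
// Hypothetical helper; a sketch, not a library API
object FastParse {
  // Parses an optionally signed decimal Int without throwing; returns
  // `default` for empty input, stray characters, or values outside Int range.
  def parseIntOrElse(s: String, default: Int = 0): Int = {
    val len = s.length
    if (len == 0) return default
    val first = s.charAt(0)
    val negative = first == '-'
    var i = if (first == '-' || first == '+') 1 else 0
    if (i == len) return default // a lone sign is not a number
    var acc = 0L // accumulate in a Long so overflow can be detected
    while (i < len) {
      val c = s.charAt(i)
      if (c < '0' || c > '9') return default // ASCII digits only
      acc = acc * 10 + (c - '0')
      // Bail out as soon as the magnitude exceeds what an Int can hold
      if (acc > Int.MaxValue.toLong + 1) return default
      i += 1
    }
    val value = if (negative) -acc else acc
    if (value < Int.MinValue || value > Int.MaxValue) default else value.toInt
  }
}
```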

  • Regex is probably your best way to go here if you completely want to avoid the overhead of the try/catch semantic. You just need to come up with regexes for each of the possible numeric types you want to be able to convert from. But honestly, that is probably a premature optimization. How fast does this code need to be? How often is it hit? How often will it get invalid numbers, thus hitting the catch block? These are questions you need to ask yourself before optimizing, as the code gets a bit more complex. Commented May 8, 2014 at 12:39
  • @cmbaxter I agree with you, but I'm using this in a Big Data context where I parse huge CSV files (billions of rows), so it matters. Commented May 8, 2014 at 14:19
  • Fair enough. Then I would go with Regex to vet the string first. Will be much faster. Commented May 8, 2014 at 14:22
  • @pbr consider the updated answer, which uses an enriched set of characters that may belong to a numeric value while, to avoid a performance penalty, doing no specialised parsing. This may prove helpful for filtering out most non-numeric values. Commented May 8, 2014 at 17:14
  • @pbr consider also stackoverflow.com/a/16699049/3189923 (and Apache Commons Lang). Commented May 8, 2014 at 20:53
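The regex vetting suggested in these comments could look roughly like the following. This is a minimal sketch assuming plain decimal input: no hex, octal, `NaN`, `Infinity`, type suffixes or surrounding whitespace, some of which `toDouble` would otherwise accept. `RegexVetting`, `intOrElse` and `doubleOrElse` are hypothetical names. The patterns are compiled once; a string that passes can still fail `toInt` on overflow, so the try/catch stays as a rarely taken fallback.

```scala
import scala.util.matching.Regex

// Hypothetical helper; a sketch, not a library API
object RegexVetting {
  // Compiled once and reused across calls
  private val IntPattern: Regex = """[+-]?\d+""".r
  private val DoublePattern: Regex =
    """[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?""".r

  // The regex cheaply rejects malformed input; the catch is only reached
  // for inputs the regex cannot rule out, such as Int overflow
  def intOrElse(s: String, default: Int = 0): Int =
    if (IntPattern.pattern.matcher(s).matches())
      try s.toInt catch { case _: NumberFormatException => default }
    else default

  // The try is kept purely as a safety net for anything the pattern misses
  def doubleOrElse(s: String, default: Double = 0D): Double =
    if (DoublePattern.pattern.matcher(s).matches())
      try s.toDouble catch { case _: NumberFormatException => default }
    else default
}
```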

1 Answer


As a first approach, filter out input strings that do not contain any digit:

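private def toTypeOrElse[T](convert: String=>T, defaultVal: T) = try {
  // Cheap pre-check: skip the conversion (and its exception) when s has no digit at all.
  // Note: s.contains("[0-9]") would test for that literal substring, not a regex.
  if (s.exists(_.isDigit)) convert(s)
  else defaultVal
} catch {
  case _: NumberFormatException => defaultVal
}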

Update

An enriched set of characters that may occur in a numeric value; neither the order of occurrence nor limits on repetition are considered:

private def toTypeOrElse[T](convert: String=>T, defaultVal: T) = try {
  // Only attempt the conversion if every character could belong to a number
  if (s matches "[\\+\\-0-9.e]+") convert(s)
  else defaultVal
} catch {
  case _: NumberFormatException => defaultVal
}
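Two details of this filter are worth illustrating. `String.matches` compiles its regex on every call, so in a hot loop it may help to compile the pattern once; and because the character class ignores order, strings such as `"1-2"` or `"e"` pass the filter and still reach the `catch`. A minimal sketch under those assumptions; `CharClassFilter`, `looksNumeric` and `doubleOrElse` are hypothetical names, and, as in the class above, uppercase `E` exponents are rejected.

```scala
import java.util.regex.Pattern

// Hypothetical helper; a sketch, not a library API
object CharClassFilter {
  // Same character class as above, compiled once instead of on every call
  private val NumericChars: Pattern = Pattern.compile("[+\\-0-9.e]+")

  def looksNumeric(s: String): Boolean = NumericChars.matcher(s).matches()

  def doubleOrElse(s: String, default: Double = 0D): Double =
    if (looksNumeric(s))
      // Still needed: the filter ignores order, so "1-2" or "e" get here
      try s.toDouble catch { case _: NumberFormatException => default }
    else default
}
```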

1 Comment

Why not filter out the ones that contain anything other than digits or a '-'?
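The check suggested in this comment can be written as a plain character loop for integers. A minimal sketch, assuming the '-' is only allowed as a leading sign; `DigitsOrMinus`, `isPlainInt` and `intOrElse` are hypothetical names. Strings that pass can still overflow `Int`, so the conversion stays guarded.

```scala
// Hypothetical helper; a sketch, not a library API
object DigitsOrMinus {
  // True for an optional leading '-' followed by at least one ASCII digit
  def isPlainInt(s: String): Boolean = {
    val start = if (s.startsWith("-")) 1 else 0
    s.length > start && (start until s.length).forall { i =>
      val c = s.charAt(i)
      c >= '0' && c <= '9'
    }
  }

  def intOrElse(s: String, default: Int = 0): Int =
    if (isPlainInt(s))
      // Still guarded: "99999999999" passes the check but overflows Int
      try s.toInt catch { case _: NumberFormatException => default }
    else default
}
```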
