4

I am a former Java developer and I have recently watched the insightful and entertaining introduction to Scala for Java developers by professor Venkat Subramaniam (https://www.youtube.com/watch?v=LH75sJAR0hc).

A major point introduced is the elimination of declared types in lieu of "type inference". Presumably, this means the higher-order compiler recognizes the type I intend to use, by the context.

Being an application security expert by trade, the first thing I tried to do is break this type inference... Example:

// declare a function that returns the square of an input Int. The return type is to be inferred.
scala> val square = (x:Int) => x*x
square: Int => Int = <function1>
// I can see the compiler inferred an Int for the output value, which I do not agree with.

scala> square(2147483647)
res1: Int = 1
// integer overflow

My question is why did the compiler not see that "*" is an operator with a threat of overflow, and wrap the inputs in something a little more protective like a BigInteger?

According to the professor, I am supposed to forget about the internal implementation and just get on with my business logic. But after my quick demonstration I'm not so sure that Scala is safe for a programmer who doesn't understand what the compiler is doing with my methods.

2
  • 2
    This doesn't really have anything to do with type inference. * on Int could return a BigInt, but that's an API design decision, and has more or less nothing to do with the compiler. Commented Oct 25, 2014 at 20:57
  • I should change my example so that the distraction of RETURNING a BigInteger is gone. For instance, googleresearch.blogspot.co.uk/2006/06/… Commented Oct 27, 2014 at 21:01

6 Answers 6

4

I think @rightføld somewhat overstates how often overflows do or don't happen (particularly when considering an attacker who is actively trying to overflow you). But I agree with his basic point. Converting all math to BigInteger would almost certainly have created a massive performance impact over Java. For developers to choose such a language, they'd have to get something visible for that cost.

String objects have a much smaller performance overhead over cstrings for many operations. They also provide very visible benefits to the developer, which is why people use them, not security per se. There are many common things that string objects make easy to do over cstrings. BigInteger provides none of that. It requires exactly the same code at a fraction of the speed, but just won't overflow (a bug few developers see day to day, even if security guys see it more often).

The equivalent would have been a cstring (with strcmp, strcpy, strcat, etc.) that ran at a fraction of the speed, but just didn't require a null terminator. I don't think many people would have jumped to use that, either, no matter how much that would help security over null-terminated strings. And if the language required it, I don't see a lot of people anxious to use the language.

And as @rightføld suggests in the comments, interoperability with Java would be trashed, since most if not all numbers would wind up being BigInteger. You'd constantly be converting, which raises the same dangers of overflows while adding a lot of code complexity (and more performance impacts).

A from-scratch language might get away with ubiquitous BigInteger (like python) if the language had a lot of other compelling features, but it's a very hard thing to retrofit into a language that wants to be a natural transition from (and with) Java.

Sign up to request clarification or add additional context in comments.

5 Comments

With an advanced modern JIT compiler I just don't see that. Why not see it coming, close to runtime if necessary, that the values are going to overflow? I think I am very likely asking the wrong question, and need to learn a bit more to ask a better question. It's just... I'm not seeing a lot of real useful modernity here, just less typing. I was hoping for more in terms of built-in security.
There already is overflow checking in Java, so you throw exceptions rather than running off into random memory like in C. That already is pretty costly. And there are so many APIs that can't (and you wouldn't want to) accept arbitrarily large numbers that just going to BigInteger doesn't get you that much free security. It's not that it's a crazy idea, just not that it's an obvious "how could you do otherwise" idea.
(Sorry; mixing languages there. There isn't an overflow exception in Java. There's a bounds exception.)
@DouglasHeld I think you are thinking of the behavior of a dynamically typed language like python, were a name x could refer to Int, or BigInt, or a BigDog for that matter, and the language if free to change it at runtime. In a statically typed language like scala, if x is an Int, it must stay an Int forever.
Hadn't occurred to me that he meant that. Yes, that's a very important point. We have to pick our types at compile time, not runtime. While the JIT might reorder things, you certainly wouldn't want it to change types on you. One might imagine a new "scalingint" that cleverly changed its implementation based on what it held, and maybe that could be made fast (ObjC's NSNumber is basically like this on 64-bit platforms). But it still wouldn't interoperate with existing libraries.
4

In addition to the above answers, I think this question misunderstands the purpose of type inference in a statically typed language. Type inference does not make the choices that you are referring to - promoting a Int to a BigInt. It is restricted to simply "inferring" the type of an expression based the the known types of subexpressions at compile time.

The * function in Int returns an Int when supplied with an Int input parameter

def *(x: Int): Int

In this case, since x is declared to be an Int, then x*x must be an Int based on the signature of *.

If we really wanted this behavior, we could define a function that promotes Int to BigInt when multiplying.

implicit class SafeInt(x: Int) {
  def safeMult(a: Int): scala.math.BigInt = scala.math.BigInt(x)*a
}

Then when we can define a square with the desired property:

scala> val square = (x: Int) => x safeMult x
square: Int => scala.math.BigInt = <function1>

2 Comments

The whole point of the OP's argument is that the type of T * T for fixed-width types is not T, it's T^2, i.e. with (potentially) doubled space requirements. On the other hand, even lowly addition and subtraction have T' as result, i.e. the result type needs to be potentially 1 bit bigger than the biggest operand type. That's why proper type inference would be impractical without runtime support, growing types only when it's actually necessary. Then, and only then, can the type of 'T op T' be simplified to 'T'. The runtime overhead for a language like Scala or Java would be negligible.
In Scala, Int * Int is defined to be Int (with possible overflow.) One could argue that fixed width 32-bit signed integer should not be the default type of integers in Scala, maybe Int should be more like SafeLong as mentioned by @lmm. But it does not mean that the type inference is doing something wrong here.
3

The compiler infers based on the methods available. Int has a method *(Int): Int that is, as far as the compiler knows, perfectly well defined; 2147483647*2147483647 is a perfectly good method call with the result 1, it doesn't throw ClassCastException or anything like that.

Why is the Int type written this way? Largely for Java/JVM compatibility; many parts of Scala have design compromises for the sake of Java compatibility. If you don't need that functionality, you might prefer to use Haskell or a similar language. (I suspect that even without the requirement for JVM compatibility, Scala would have wanted to expose the machine-native integer types so that users could make that performance/correctness tradeoff where desired. They might not have been the default though)

If you're doing numeric computation in Scala you probably want to use the Spire library, which makes it easy to abstract over numeric types, and provides several high-performance numeric types with particular properties. In particular it has a SafeLong type that handles arbitrary-precision integers but with much better performance than BigInt for values which fall within the Long range, similar to Python's integer type.

Comments

2

Because overflow occurs almost never in practice, and BigInteger is slow as a dog compared to Int. It is also most inconvenient to have all * operations on Ints return BigIntegers.

2 Comments

If that is the design attitude, why not leave in C strings? They're faster and lightweight.
Unlike integer overflow, C string bugs occur almost always in practice (mostly during development, thankfully), which is why only fools use C. Also, Java interoperability.
2

"Recognizes the type I intend to use" is not an accurate description of what scala tries to do. It infers the most generic type possible given the constraints imposed by the context. Hence if you write List(Nil, "1"), you'll get List[Serializable], because Serializable is an interface that List and String share - disregarding that Serializable was probably not on your mind at all.

The question you're asking could be asked more precisely as "why is Int the type of numeric literals instead of BigInteger?" - inference doesn't have much to do with it.

And we can opine all we want on that topic, but there's one most accurate answer describing why Scala is what it is: "because Java".

Comments

1

If you wanted the type of safety that you seem to want, then one approach is to define via a partial function which guards against numeric overflow and then returns either an Option[Int] or even perhaps an Either[Int, BigInteger].

The type inference for your square function is correct - given that it's inferred from the input types you've specified and the type of the * function...it's not really broken in my opinion.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.