When to use Scala Vector, when Scala Array?

Question

If I need an Array with multiple degrees, I can't use a Vector. But let's consider the simple case of having only one degree: When to use Scala Vector, when Scala Array?

Vector when you want immutability. Array when you want mutability. — sjrd
– sjrd, Commented Oct 6, 2016 at 20:17
You might enjoy this recent blog post about scala collections — Barry
– Barry, Commented Oct 6, 2016 at 20:20

Dibbeke · Accepted Answer · 2016-10-06 20:53:56Z

When it comes to time and space complexity, arrays are surprisingly versatile. You might expect that arrays are slow with regard to inserts and deletes until you consider modern memory architectures. CPUs can prefetch and stream arrays straight from memory while performing linear operations on them, such as copying for an insert or delete. Most other data-structures requires expensive indirections, defeating prefetching caches.

Immutability

Since linear access to arrays is very fast, I often (for smaller arrays) consider them as immutable and copy them on write.

How to choose

When I consider a data-structure for a certain task, I start by analyzing the performance implemented as a simple array. Only after this first step, I weigh the benefits and penalties of existing abstractions, such as vectors. Possible benefits of other data structures might be readability, code complexity, performance at scale, opportunities for garbage collection, ease of serialization and cache coherence. Readability and code complexity are on the top of my list, and this often weighs in favor of abstract data structures such as Vectors, Lists, Streams and Maps.

Consider GPU acceleration

When starting with arrays, I always consider the possibility of GPU execution. For example, machine learning heavily relies on vector (not to be confused with Scala vector) and matrix operations (linear algebra), which is accelerated on GPU hardware and often less memory intensive.

stefanobaghino · Accepted Answer · 2016-10-06 20:53:42Z

0

Choosing a data structure is, as always, a matter of context.

First of all, you have to take into account the issue at hand, the access pattern you expect to have and the performance characteristics. The Scala documentation includes a great comparison. Both collection share the common trait of being indexed, allowing fast random access, but you'll notice some differences between the two.

A key difference between the two, as suggested in the comments, is that a Vector is an immutable collection, while Arrays are mutable.

Furthermore, Arrays in Scala are effectively mapped over Java native arrays, making it quite easy to write idiomatic Scala code that can be used by just as idiomatic Java code elsewhere.

For further details, both the Array and Vector pages of the official Scala documentation include a good description. You can learn even more in the documentation section reserved to collections.

answered Oct 6, 2016 at 20:53

stefanobaghino

13.4k4 gold badges44 silver badges69 bronze badges

7 Comments

Dibbeke Over a year ago

Please note that these performance characteristics, based on complexity analysis, are a huge simplification of modern systems architecture. The actual performance of a data structure might (and likely is) different. One cannot do without measuring with micro-benchmarks and considering the CPU architecture, virtual machine behaviour and garbage collector behaviour in the relationship with a certain data structure. After doing this analysis, you will most likely come to the conclusion that complexity analysis is mostly an academic exercise.

stefanobaghino Over a year ago

@Dibbeke I agree with you (apart from your last consideration, which I think is too far reaching), but the question seemed to require a more high level reply. Furthermore, when using an Array you have to be very careful of what you do with it, because at the first implicit conversion to a WrappedArray you end up manipulating something very different from a native Array. Idiomatic Scala code will seldom take advantage of the optimizations you cited.

Dibbeke Over a year ago

the last part of my comment was too much, but it was triggered by an inconvenience of the current state of affairs. It so often seems that everyone is content with doing an undergrad complexity analysis and call it a day. I refute since a complexity analysis is a nice thing to prevent the average joe from stepping on rakes, but in modern architectures, especially given a VM, it distracts from the essence and gives people a warm fussy feeling so they don't have to think. And yes, WrappedArray is a dangerous thing. We need some macro support here I guess.

stefanobaghino Over a year ago

I think it depends a lot on the problem at hand. Complexity analysis is an abstraction that provides engineers with a good heuristic for a good choice. Furthermore, as you mentioned in your answer, readability and code complexity should come first, especially when first iterating over a piece of code. Taking in consideration the underlying machine is important but it's a difficult task which requires many optimization that may be bespoke for a certain problem and a certain hardware. I don't see the problem in using complexity analysis as a first spearhead into the problem.

Make42 Over a year ago

In docs.scala-lang.org/overviews/collections/overview.html, Arrays are with the immutable collections... !?

|

Abhijit Sarkar · Accepted Answer · 2024-09-08 23:42:43Z

0

The Scala 3 official collections documentation doesn't even show or mention the Array type. It seems like an omission, I've created a ticket to get it fixed.

The Array API docs say:

Arrays are mutable, indexed collections of values. Array[T] is Scala's representation for Java's T[].

Vectors, on the other hand, are the immutable indexed collections.

However, there's the perhaps more popular ArrayBuffer, which, in fact, has a place in the official docs. So, if you're looking for mutability, should you use the Array or the ArrayBuffer? The short answer is, as always, it depends. ArrayBuffer is resizable, Array isn't. Arrays are specialized for built-in value types (except Unit), so Array[Int] is going to be more optimal than ArrayBuffer[Int] – the values won't have to be boxed.

See this SO answer for more details on the differences between ArrayBuffer and Array.

answered Sep 8, 2024 at 23:42

Abhijit Sarkar

25.2k23 gold badges136 silver badges251 bronze badges

3 Comments

Make42 Over a year ago

So, an ArrayBuffer is like a list in Python, and Array is like a tuple in Python?

Abhijit Sarkar Over a year ago

@Make42 Python Tuple is immutable while Scala Array isn’t.

Make42 Over a year ago

So, Array ~ np.array and Vector ~ tuple?

Collectives™ on Stack Overflow

When to use Scala Vector, when Scala Array?

3 Answers 3

Immutability

How to choose

Consider GPU acceleration

Comments

7 Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Immutability

How to choose

Consider GPU acceleration

Comments

7 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related