Why are Lists faster than character arrays for string concatenation

Question

In the article linked below, the author compares the efficiency of different string concatenation methodologies in python: http://www.skymind.com/~ocrow/python_string/

One thing that I did not understand is, why does method 3 (Mutable Character Arrays) result in a significantly slower performance than method 4 (joining a list of strings)

both of them are mutable and I would think that they should have comparable performance.

Presumably array.fromstring hasn't been optimized the way str.join has. — abarnert
– abarnert, Commented Aug 29, 2013 at 0:38
Also, note that this article is from 2004. Naive str appending is much faster in newer versions of Python, and so is str.join… — abarnert
– abarnert, Commented Aug 29, 2013 at 0:39
What @abarnert says is totally relevant. When that article was written, the latest version of Python was 2.3.3 (see python.org/download/releases and python.org/download/releases/2.3.4). Benchmarks done at that time are largely meaningless today. — Greg Hewgill
– Greg Hewgill, Commented Aug 29, 2013 at 0:45
Related: how does string concatenation compare between 2.7.5 and 3.3.2 -- unicode vs ascii? — blakev
– blakev, Commented Aug 29, 2013 at 0:47
@blakev: But from a very quick test of join with an already-constructed list: bytes takes 1.0x as long in 3.3, unicode with all ASCII 0.5x, unicode with all UCS-2 0.8x, unicode with non-UCS-2 1.0x, unicode with mixed strings 1.05x. — abarnert
– abarnert, Commented Aug 29, 2013 at 0:58

abarnert · Accepted Answer · 2013-08-29 00:59:56Z

6

"Both of them are mutable" is misleading you a bit.

It's true that in the list-append method, the list is mutable. But building up the list isn't the slow part. If you have 1000 strings of average length 1000, you're doing 1000000 mutations to the array, but only 1000 mutations to the list (plus 1000 increfs to string objects).

In particular, that means the array will have to spend 1000x as much time expanding (allocating new storage and copying the whole thing so far).

The slow part for the list method is the str.join call at the end. But that isn't mutable, and doesn't require any expanding. It uses two passes, to first calculate the size needed, then copy everything into it.

Also, that code inside str.join has had (and has continued to have since that article was written 9 years ago) a lot of work to optimize it, because it's a very common, and recommended, idiom that many real programs depend on every day; array has barely been touched since it was first added to the language.

But if you really want to understand the differences, you have to look at the source. In 2.7, the main work for the array method is in array_fromstring, while the main work for the list method is in string_join. You can see how the latter takes advantage of the fact that we already know all of the strings we're going to be joining up at the start, while the former can't.

edited Aug 29, 2013 at 0:59

answered Aug 29, 2013 at 0:50

abarnert

368k54 gold badges626 silver badges691 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

user2357112 Over a year ago

"It uses two passes, to first calculate the size needed, then copy everything into it." - really? Is that a list-specific optimization I wasn't aware of? It can't work for general iterables.

user2357112 Over a year ago

Ah, it makes a tuple out of the input if it's not already a tuple or list.

user2357112 Over a year ago

Your bytes_join link is broken. Try string_join.

abarnert Over a year ago

@user2357112: Thanks; I was looking at the 3.x code, and I thought I'd switched over to 2.7 and re-clicked the line, but obviously not… Anyway, str.join has had that optimization forever, which is why people are often surprised when they attempt to micro-optimize by passing it a genexpr instead of a listcomp and find that it's actually slightly slower…

Collectives™ on Stack Overflow

Why are Lists faster than character arrays for string concatenation

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related