CPython optimization rules usually come down to how much work you push to the C layer (versus the bytecode interpreter) and how complex the individual bytecode instructions are; when the absolute amount of work is small, the interpreter's fixed overhead tends to swamp the real work, so intuition derived from experience in lower-level languages just doesn't apply.
It's pretty easy to test, though, especially with IPython's %timeit magic (timings done on Python 3.8.5 on Alpine Linux running under WSLv2):
In [2]: %%timeit l = [1, 2, 3]
...: tuple(l)
97.6 ns ± 0.303 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [3]: %%timeit l = [1, 2, 3]
...: (l[0], l[1], l[2])
104 ns ± 0.561 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [4]: %%timeit l = [1, 2, 3]
...: (*l,)
78.1 ns ± 0.628 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [5]: %%timeit l = [1, 2]
...: tuple(l)
96 ns ± 0.895 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [6]: %%timeit l = [1, 2]
...: (l[0], l[1])
70.1 ns ± 0.571 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [7]: %%timeit l = [1, 2]
...: (*l,)
73.4 ns ± 0.736 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
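If you don't have IPython handy, roughly the same measurement can be reproduced with the stdlib timeit module (absolute numbers are machine- and version-dependent, so expect different values than the ones above):

```python
import timeit

# Time the three spellings the same way %timeit does: run each statement
# many times per loop, repeat the loop several times, and keep the best.
setup = "l = [1, 2, 3]"
for stmt in ("tuple(l)", "(l[0], l[1], l[2])", "(*l,)"):
    total = min(timeit.repeat(stmt, setup, number=1_000_000, repeat=5))
    # total is seconds for 1,000,000 executions -> total * 1000 is ns each
    print(f"{stmt:20s} {total * 1000:.1f} ns/loop")
```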
So in fact, the code example you gave made the correct decision for each size (assuming performance is all that counts): at two elements, indexing is faster than the alternatives; at three, converting to tuple in bulk saves enough over repeated indexing to win.
Just for fun, I included an equivalent solution to tuple(l) up there that uses the additional unpacking generalizations to build the tuple with dedicated bytecodes, which shows how something as small as replacing a generalized constructor call with dedicated, optimized bytecode can make a surprising amount of difference in the fixed overhead.
What's extra fun about this example: the faster (*l,) solution actually involves two temporaries. BUILD_TUPLE_UNPACK (the bytecode that implements it) shares a code path with BUILD_LIST_UNPACK; both of them actually build a list, and BUILD_TUPLE_UNPACK just converts it to a tuple at the end. So (*l,) is hiding yet another copy into a temporary data structure, but because the specialized bytecode is so much cheaper than the built-in lookup plus the general-purpose constructor code path, it still wins.
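You can see those dedicated bytecodes yourself by disassembling each spelling. Note the exact opcodes vary by CPython version: 3.8 (used for the timings above) emits BUILD_TUPLE_UNPACK for (*l,), while 3.9 and later emit LIST_EXTEND followed by LIST_TO_TUPLE instead:

```python
import dis

# Disassemble each spelling compiled as a standalone expression to
# compare the bytecode CPython generates for it.
for src in ("tuple(l)", "(l[0], l[1], l[2])", "(*l,)"):
    print(f"--- {src} ---")
    dis.dis(compile(src, "<example>", "eval"))
```

On any version you should see tuple(l) pay for a name lookup plus a general call opcode, while the other two go straight to tuple-building instructions.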
One last note: the suggestion to index with names[0] etc. "because you don't create a new object in memory for the tuple" is based on a faulty premise. (names[0], names[1], names[2]) is building a tuple just as much as tuple(names) does. The former is a tuple literal, the latter is a call to the tuple constructor, but they both make "a new object in memory for the tuple" (or not; tuple makes extensive use of free lists, so it often just pulls an available tuple off the free list for reuse).
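If you want to see that free-list reuse in action, here's a quick demonstration. It relies on a CPython implementation detail (small tuples are cached on per-size free lists), not any language guarantee, so treat it as illustrative only:

```python
def shows_reuse():
    # Build via tuple([...]) rather than a literal, so the tuple is
    # heap-allocated (a constant literal would be cached on the code
    # object and never actually freed).
    t = tuple([1, 2, 3])
    addr = id(t)
    del t                  # dealloc pushes the 3-tuple onto the free list
    t2 = tuple([4, 5, 6])  # the next 3-tuple is typically popped right back off
    return addr == id(t2)

print(shows_reuse())  # usually True on CPython
```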