0

Can I somehow speed up this simple code in python? That takes 0.221seconds for executing

for some_list in some_list_of_lists:
    i = 1
    while i < len(some_list):
        a = some_list[i-1]
        b = some_list[i]
        
        i += 2
  • some_list_of_lists ~110 000 items
  • some_list ~10 items

Same code in cython takes only 0.004seconds, my goal is achieve 0.02seconds or less in python(without using cython)

full code:

result = []
for some_list in some_list_of_lists:
    t1 = 1
    t2 = 1
    
    i = 1
    while i < len(some_list):
        a = some_list[i-1]  #int
        b = some_list[i]    #int
        
        t1 *= a
        t2 *= b
        
        i += 2 
    if t1 > t2:
        result.append(some_list)
12
  • 4
    wiki.python.org/moin/PythonSpeed/PerformanceTips ---- look at that link for a bunch of tips on code efficiency and speed Commented Sep 16, 2022 at 22:45
  • 5
    This is a silly question - your code doesn't even do anything. It just nests two pointless loops, that read values pairwise from lists from a list, only to do nothing with them. Depending on what operation you need, there may be faster solutions - the fastest way to optimise the code you shared is to comment it out, and replace with a, b = some_list_of_lists[-1][-2:] (maybe some simple extra logic for odd inner length of the last list, or maybe you were after i as well). Please explain a bit more about the actual problem. Commented Sep 16, 2022 at 22:55
  • @Grismar I think this is just the skeleton of the loop, and it actually uses a and b within the loop. But that part isn't relevant. Commented Sep 16, 2022 at 23:16
  • @barmar, I think so too, but if performance is key, it's how a and b would be used / what they are needed for that would determine what the best performance improvement would be. Commented Sep 16, 2022 at 23:17
  • @Grismar I add more code in question, now you can see for what a and b used Commented Sep 16, 2022 at 23:27

2 Answers 2

2

The main problem here is that your implementation is very inefficient.

Compare:

from timeit import timeit
from random import random
import numpy as np  # for a numpy solution
from math import prod  # for a better Python solution

# just some random example data that is "some list of lists" (of numbers)
example_data = [
    [random() * 10 for _ in range(100)]
    for _ in range(100)
]


def f(some_list_of_lists):
    # your code, in a function, so it's easy to time
    result = []
    for some_list in some_list_of_lists:
        t1 = 1
        t2 = 1

        i = 1
        while i < len(some_list):
            a = some_list[i - 1]  # int
            b = some_list[i]  # int

            t1 *= a
            t2 *= b

            i += 2
        if t1 > t2:
            result.append(some_list)
    return result


# the same data again, but in a numpy array
example_array = np.array(example_data)


def numpy_f(some_array):
    # this function does exactly the same, it selects those 'lists' for which
    # the product of the even elements is greater than the product of the odd ones
    return some_array[
        np.prod(some_array[:,::2], axis=1) > np.prod(some_array[:,1::2], axis=1)
    ]


def better_python_f(some_list_of_lists):
    return [
        xs for xs in some_list_of_lists if prod(xs[::2]) > prod(xs[1::2])
    ]


# showing that the results are the same
simple_example = [[2, 1, 1, 1], [1, 2, 1, 1], [2, 1, .5, 1], [.5, 1, 1, 1], [1, .5, 1, 1]]
print(f(simple_example))
print(numpy_f(np.array(simple_example)))
print(better_python_f(simple_example))

# timing running each 10,000 times on the large example data set
print('yours:', timeit(lambda: f(example_data), number=10000))
print('numpy:', timeit(lambda: numpy_f(example_array), number=10000))
print('mine :', timeit(lambda: better_python_f(example_data), number=10000))

Output:

[[2, 1, 1, 1], [1, 0.5, 1, 1]]
[[2.  1.  1.  1. ]
 [1.  0.5 1.  1. ]]
[[2, 1, 1, 1], [1, 0.5, 1, 1]]
yours: 5.2597994999996445
numpy: 0.1987909999988915
mine : 0.7029579999998532

This shows that using just plain Python, without external libraries, you can easily get almost an order of magnitude faster. I'm sure someone can come up with an even faster method just in Python.

However, numpy is perfect for this, and even the trivial solution I wrote is more than 25x faster than yours - and again I think someone more experienced in the application of numpy can probably write an even faster solution using numpy.

The key takeaway here shouldn't be "I need to use numpy because it's fast" though. The key thing is that you should think about what problem you're actually solving and then decide what the most efficient way is to do exactly that.

In your case, what the code actually does is "for a series of number series, decide if the product of numbers on even indices is greater than the product of numbers on odd incides for each number series, and keep only those where this is the case". To me, that quickly points at thinking about slicing, efficient ways to compute a product, and how to select and return the desired result.

Not only does the better_python_f do exactly what it needs to, the code almost reads like a description of the problem:

[
    xs for xs in some_list_of_lists if prod(xs[::2]) > prod(xs[1::2])
]

"A list of all xs in the original list, where the product of even elements is greater than the product of odd elements."

Of course, it does require that you know about the Python concepts 'list comprehension' (x for x in .. if <condition on x>) and 'slicing' (e.g. xs[::2]), and that math has a fairly efficient prod() function to compute the product of elements of an iterable like a list.

Sign up to request clarification or add additional context in comments.

Comments

0

Threading and multiprocessing are both modules used to speed up functionality and get more done in less time. I'm not sure how much improvement you'll see in this example but these modules are great to know about in general.

Multiprocessing vs Threading Python

1 Comment

Pickling will certainly make the code using multiprocessing slower than the serial one and the multithreading package is useless here because of the GIL. Thus, it is not very useful.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.