
I have been trying to optimize my code.

I compared four ways of reading the value of a single cell in a list of lists (or a list of NumPy arrays).

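import time
import numpy as np

M = 1000
# M x M list of lists filled with 0
my_list = [[] for i in range(M)]
for i in range(M):
    for j in range(M):
        my_list[i].append(0)
# list of M NumPy arrays, each of length M, filled with 1
my_numpy_list = [np.full(M, 1) for i in range(M)]

# Case 1: double indexing into the list of lists
time1 = time.time()
for j in range(1000):
    for i in range(10000):
        my_list[0][0]
print("1  ", time.time() - time1)

# Case 2: store the inner list once, then index it
time1 = time.time()
for j in range(1000):
    test_list = my_list[0]
    for i in range(10000):
        test_list[0]
print("2 ", time.time() - time1)

# Case 3: double indexing into the list of arrays
# (time1 is not reset here, so the printed value also includes case 2)
for j in range(1000):
    for i in range(10000):
        my_numpy_list[0][0]
print("3 ", time.time() - time1)

# Case 4: store the inner array once, then index it
# (time1 is not reset here either, so the printed value includes cases 2 and 3)
for j in range(1000):
    my_numpy_test_list = my_numpy_list[0]
    for i in range(10000):
        my_numpy_test_list[0]
print("4  ", time.time() - time1)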

On my computer, this gives the following times:

1   0.9008669853210449
2  0.7616724967956543
3  2.9174351692199707
4   4.883266925811768

My question is: why does it take longer to access values in a NumPy array? And if it does, would it be worth converting the arrays into lists in order to access the data faster? In particular, I am very surprised that storing the array taken from the list (case 4) is the slowest case. Shouldn't the times be ordered like this:

4 < 2 < 3 < 1?

Cheers

  • You have forgotten to reassign time1 = time.time() before the last two loops. Commented May 30, 2020 at 16:00
  • I didn't downvote. By the way, I don't know if this is intentional, but my_numpy_list is a list of np.arrays, not an np.array. Commented May 30, 2020 at 16:20
  • @CommissarVasiliKarlovic Yes, the difficulty I have is that I am dealing with lists of different sizes. I discovered that a good idea could be to have a list of arrays instead of a list of lists. This is why I am refactoring my code, and then I discovered that it ran much slower... This is why I asked for help. Your solution using map is extremely effective, though. Thank you. Commented May 30, 2020 at 16:26
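Because time1 is never reset before cases 3 and 4, their printed times are cumulative. Subtracting the reported figures gives roughly 2.16 s for case 3 (2.917 - 0.762) and 1.97 s for case 4 (4.883 - 2.917), so case 4 is not actually the slowest. A minimal sketch of timing each case on its own with the standard-library timeit module follows; the smaller sizes (M = 100, 100,000 accesses per case) and the `cases` dictionary are illustrative assumptions, and absolute numbers will vary by machine.

```python
import timeit

import numpy as np

M = 100
my_list = [[0] * M for _ in range(M)]              # list of lists of Python ints
my_numpy_list = [np.full(M, 1) for _ in range(M)]  # list of NumPy arrays

test_list = my_list[0]
my_numpy_test_list = my_numpy_list[0]

# Each statement gets its own timer, so no measurement leaks into the next one.
cases = {
    "1 list[0][0]": "my_list[0][0]",
    "2 inner list[0]": "test_list[0]",
    "3 arrays[0][0]": "my_numpy_list[0][0]",
    "4 inner array[0]": "my_numpy_test_list[0]",
}

N = 100_000
timings = {
    name: timeit.timeit(stmt, number=N, globals=globals())
    for name, stmt in cases.items()
}
for name, seconds in timings.items():
    print(f"{name:<18} {seconds:.4f} s")
```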

1 Answer


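Because the goal of NumPy is not to make access to individual elements faster. Instead, its goal is to let you write vectorized code and avoid Python-level loops. Each time you index a single element, NumPy has to wrap the raw value in a new NumPy scalar object, which costs more than a list returning a reference to a Python object it already holds.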
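A small sketch that makes this per-element cost visible by checking the type and identity of what each container returns. It also uses the real `ndarray.tolist()` method, which is one way to do the array-to-list conversion the question asks about; whether that conversion pays off is an assumption to measure, since `tolist()` itself has to visit every element once.

```python
import numpy as np

arr = np.full(5, 1)          # NumPy array of integers
lst = [1, 1, 1, 1, 1]        # equivalent list of Python ints

list_item = lst[0]           # the int object already stored in the list
array_item = arr[0]          # a newly created NumPy scalar (e.g. numpy.int64)

print(type(list_item))       # <class 'int'>
print(type(array_item))

# The list hands back the same stored object on every access...
print(lst[0] is lst[0])      # True
# ...while the array builds a new scalar object on each access.
print(arr[0] is arr[0])      # False

converted = arr.tolist()     # list of plain Python ints
print(type(converted[0]))    # <class 'int'>
```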

Let's modify your example so that the code adds 1 to every element of your list/np.array:

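import time
import numpy as np

M = 1000
# M x M list of lists filled with 0
my_list = [[] for i in range(M)]
for i in range(M):
    for j in range(M):
        my_list[i].append(0)
# a real 2-D array of shape (M, M), not a list of arrays
my_numpy_array = np.array([np.full(M, 1) for i in range(M)])

# Pure-Python loop: computes test_list[0] + 1 ten million times (results discarded)
time1 = time.time()
for j in range(1000):
    test_list = my_list[0]
    for i in range(10000):
        test_list[0] + 1
print("list case addition", time.time() - time1)

# Vectorized: one expression adds 1 to all M * M elements in compiled code
time2 = time.time()
my_numpy_list = my_numpy_array + 1
print("numpy case addition", time.time() - time2)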

The output is:

list case addition 0.7961978912353516
numpy case addition 0.0031096935272216797

which makes the NumPy version about 250 times faster.
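The list loop above re-reads a single element rather than updating every element. For a like-for-like comparison, here is a minimal sketch (the tiny size M = 3 is an arbitrary choice) that adds 1 to every element both ways and checks that the results agree. The nested list comprehension still performs one interpreted addition per element, whereas `my_numpy_array + 1` runs the whole loop in compiled code.

```python
import numpy as np

M = 3
my_list = [[0] * M for _ in range(M)]          # M x M list of lists
my_numpy_array = np.zeros((M, M), dtype=int)   # M x M array

# Pure Python: one interpreted addition per element.
list_result = [[x + 1 for x in row] for row in my_list]

# NumPy: a single vectorized expression over the whole array.
numpy_result = my_numpy_array + 1

print(list_result)                            # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(numpy_result.tolist() == list_result)   # True
```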


5 Comments

Is there a way to access data faster than with a list of lists? I need to keep the notion of a "list", but I can store the lists I have in any form.
Also, I discovered that creating NumPy arrays is pretty slow. This could be part of the answer.
@Pleasedon'thitme The map, filter and reduce functions are efficient for accessing your data if there's a pattern. In your case: map(lambda x: x[0], my_numpy_list)
@CommissarVasiliKarlovic Oh, wonderful, I wouldn't have thought of using map like that. Thank you for the tip! Your solution is extremely effective; if you write it up as an answer, I can accept it.
If speed is the problem, a list comprehension will be faster than map for that.
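A minimal sketch of the two approaches from these comments, applied to a list of arrays of different lengths like the one described in the question's comments (the data is made up for illustration). In Python 3, `map` returns a lazy iterator, so it has to be wrapped in `list()` to get the values. The list comprehension builds the list directly and avoids calling a lambda for each element.

```python
import numpy as np

# Arrays of different lengths (illustrative data): first elements are 30, 50, 20.
my_numpy_list = [np.arange(n) + 10 * n for n in (3, 5, 2)]

# map returns a lazy iterator in Python 3, so wrap it in list().
firsts_map = list(map(lambda x: x[0], my_numpy_list))

# Equivalent list comprehension: no lambda call per element.
firsts_comp = [x[0] for x in my_numpy_list]

print(firsts_comp == firsts_map)        # True
print([int(v) for v in firsts_comp])    # [30, 50, 20]
```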
