Your hypothesis "we can see when running frequently, the code becomes much faster" is not quite correct.
This loop
#just to create a time interval, seems this disturb cpu cache?
for i in range(1000000):
    pass
isn't merely a simple time delay. In Python 2 it creates a list of 1,000,000 integer objects, loops over it, and then destroys it. In Python 3, range() returns a lazy sequence object rather than a list, so it's much less wasteful of memory; the Python 2 equivalent is xrange().
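You can see the difference directly in the interpreter. This is a minimal sketch for Python 2; the exact byte counts vary between builds and platforms:

import sys

r = range(1000000)     # a real list holding 1,000,000 int objects
x = xrange(1000000)    # a small lazy object that produces values on demand
print type(r), sys.getsizeof(r)    # the list's own size (its pointer array), several MB
print type(x), sys.getsizeof(x)    # just a few dozen bytes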
In the standard CPython implementation, a list is an array of pointers to the objects it contains. On a 32-bit system, an integer object occupies 12 bytes, an empty list object occupies 32 bytes, and of course a pointer occupies 4 bytes, so range(1000000) consumes a total of 32 + 1000000 * (4 + 12) = 16000032 bytes.
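That total is just the arithmetic spelled out; here's a quick sketch using the 32-bit CPython 2 sizes quoted above (the numbers differ on other builds):

list_header = 32      # empty list object
pointer = 4           # one slot in the list's pointer array
int_object = 12       # one integer object
print list_header + 1000000 * (pointer + int_object)    # 16000032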
Once the loop finishes, the reference count on the range() list drops to zero, so CPython deallocates it and the million integer objects it contains. Freeing that many objects takes a noticeable amount of time, so it's not surprising that you notice a delay after the loop has finished. Also, creating and deallocating all those objects tends to leave the heap somewhat fragmented, and that can affect the efficiency of subsequent code.
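If you want to see the deallocation cost in isolation, one rough way (Python 2 again; absolute numbers depend on the machine) is to time an explicit del of a similar list:

import time

big = range(1000000)      # build the list and its million int objects
st = time.time()
del big                   # refcount drops to zero, so CPython frees the list
                          # and its contents right here
print (time.time() - st) * 1e6, "microseconds to deallocate"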
FWIW, on my 2GHz Pentium 4 Linux system, your 1st code block prints values in the range 38 - 41µs, but if I change the loop to use xrange() then the time drops to 31 - 34µs.
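If you want to reproduce that comparison on your own machine, a sketch along these lines works (measure is just a name I've made up here, and the absolute numbers depend entirely on the hardware):

import time

def measure(loop_factory):
    a = range(500)
    sum(a)
    for i in loop_factory(1000000):    # pass in range or xrange
        pass
    st = time.time()
    sum(a)
    return (time.time() - st) * 1e6

print measure(range)     # the loop builds and then frees a million ints
print measure(xrange)    # the loop allocates almost nothing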
If the code is modified to prevent the range() object from being deallocated until after sum(a) is calculated, then it makes little difference in the timing whether we use range() or xrange():
import time

a = range(500)
sum(a)

# Try both range() and xrange() here; with the reference kept alive
# below, the timing is much the same either way
bigrange = xrange(1000000)

for i in bigrange:
    pass

st = time.time()
sum(a)
print (time.time() - st) * 1e6

# Maintain a reference to bigrange to prevent it from
# being garbage collected earlier
print sum(bigrange)