
I've found that Python's VM seems to be very unfriendly to the CPU cache.

for example:

import time
a = range(500)

sum(a)

for i in range(1000000): # just to create a time interval; it seems this disturbs the CPU cache?
    pass


st = time.time()
sum(a)
print (time.time() - st)*1e6

time:

> 100us

another case:

import time
a = range(500)

for i in range(100000):
    st = time.time()
    sum(a)
    print (time.time() - st)*1e6

time:

~ 20us

We can see that when the code runs frequently, it becomes much faster.

Is there a solution?

We can refer to this question (it should be the same problem): python function(or a code block) runs much slower with a time interval in a loop

I feel this question is very difficult. One must have an in-depth understanding of the mechanisms of the Python virtual machine, C, and the CPU cache.

Do you have any suggestions about where to post this question to get a possible answer?

3 Comments
  • A solution to what? What do you want to achieve? Commented Aug 23, 2015 at 5:57
  • @BrenBarn a solution to make code run faster in the first case. Commented Aug 23, 2015 at 6:00
  • 1
    You have just create a list of 1000000 integer object ! Sure it's disturb CPU cache ! It's not a problem of python VM... You need to provide a more useful exemple... Commented Aug 23, 2015 at 9:47

1 Answer


Your hypothesis "we can see when running frequently, the code becomes much faster" is not quite correct.

This loop

#just to create a time interval, seems this disturb cpu cache?
for i in range(1000000): 
    pass

isn't merely a simple time delay. In Python 2 it creates a list of 1,000,000 integer objects, loops over it, and then destroys it. In Python 3, range() returns a lazy sequence object rather than a list, so it's much less wasteful of memory; the Python 2 equivalent is xrange().
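As a quick illustration (a Python 3 sketch, not part of the original answer), sys.getsizeof shows the difference between the lazy range object and a materialised list:

```python
import sys

# Python 3's range() is a small, fixed-size lazy sequence,
# no matter how long the range is
lazy = range(1000000)
print(sys.getsizeof(lazy))    # a few dozen bytes, regardless of length

# Materialising the same range as a list allocates one pointer per element
eager = list(range(1000000))
print(sys.getsizeof(eager))   # roughly 8 MB of pointers on a 64-bit build
```

Note that sys.getsizeof on the list counts only the pointer array, not the integer objects themselves.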

In the standard CPython implementation, a list is an array of pointers to the objects it contains. On a 32 bit system, an integer object occupies 12 bytes, an empty list object occupies 32 bytes, and of course a pointer occupies 4 bytes. So range(1000000) consumes a total of 32 + 1000000 * (4 + 12) = 16000032 bytes.
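That estimate can be checked directly. The sketch below just restates the 32-bit figures quoted above; the exact sizes differ between builds (64-bit CPython uses larger objects and 8-byte pointers):

```python
# 32-bit Python 2 sizes quoted in the answer above
list_header = 32   # an empty list object
pointer = 4        # one slot in the list's pointer array
int_object = 12    # one integer object
n = 1000000

total = list_header + n * (pointer + int_object)
print(total)  # 16000032 bytes, roughly 16 MB
```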

Once the loop finishes, the reference count on the range() list drops to zero so it and its contents are eligible for garbage collection. It takes a little bit of time to garbage collect a million integers, so it's not surprising that you notice a time delay after the loop is finished. Also, the process of creating & deallocating all those objects will generally cause some memory fragmentation, and that will impact the efficiency of subsequent code.

FWIW, on my 2GHz Pentium 4 Linux system, your 1st code block prints values in the range 38 - 41µs, but if I change the loop to use xrange() then the time drops to 31 - 34µs.

If the code is modified to prevent garbage collection of the range() until after sum(a) is calculated, then it makes little difference in the timing whether we use range() or xrange().

import time

a = range(500)
sum(a)

bigrange = xrange(1000000)

for i in bigrange:
    pass

st = time.time()
sum(a)
print (time.time() - st) * 1e6

#Maintain a reference to bigrange to prevent it 
#being garbage collected earlier
print sum(bigrange)
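For completeness, here is a rough Python 3 port of the same idea (print is a function, xrange is gone, and time.perf_counter is the preferred high-resolution timer; the structure otherwise follows the block above):

```python
import time

a = list(range(500))
sum(a)  # warm-up call, as in the original

# In Python 3, range() is already lazy, so no million-element list is built
bigrange = range(1000000)

for i in bigrange:
    pass

st = time.perf_counter()
sum(a)
print((time.perf_counter() - st) * 1e6)

# Keep a reference to bigrange so nothing is deallocated before the timing
print(sum(bigrange))
```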

4 Comments

@J.F.Sebastian: Ah, ok. But it appears to me that the list garbage collection has some impact. At least, it obscures the effect that the OP is trying to measure. :)
@jfs did you try using time.sleep to delay instead of looping?
range() in Python 3 does not return a list (so there is no list to garbage collect). I can't answer on loop vs. sleep: I've forgotten the subtleties involved in the benchmark; it doesn't claim more than that the behavior appears to be the same across Python versions/implementations despite the drastic difference in the range() implementation.
