
It's my first time posting here and I hope you guys can answer my question.

I'm writing a Python script that finds all of the prime numbers up to some number n using the Sieve of Eratosthenes. In order to make the program more efficient, I want to sieve the range of numbers in "segments" that fit inside the CPU's L1 cache. I know my own L1 cache size (576 KB), but I want this program to work on any computer.
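Roughly, the segmented sieve I have in mind looks like the sketch below (simplified; `segment_size` is the number I want to derive from the cache size):

    def segmented_sieve(n, segment_size):
        """Return all primes <= n, sieving one segment at a time."""
        # Base primes up to sqrt(n), found with a plain sieve.
        limit = int(n ** 0.5) + 1
        is_prime = bytearray([1]) * (limit + 1)
        is_prime[0:2] = b"\x00\x00"
        for i in range(2, int(limit ** 0.5) + 1):
            if is_prime[i]:
                is_prime[i * i :: i] = bytearray(len(is_prime[i * i :: i]))
        base_primes = [i for i in range(2, limit + 1) if is_prime[i]]

        primes = [p for p in base_primes if p <= n]
        # Sieve [low, high) in chunks of segment_size flags.
        for low in range(limit + 1, n + 1, segment_size):
            high = min(low + segment_size, n + 1)
            segment = bytearray([1]) * (high - low)
            for p in base_primes:
                first = max(p * p, (low + p - 1) // p * p)  # first multiple of p in the segment
                segment[first - low :: p] = bytearray(len(segment[first - low :: p]))
            primes.extend(low + i for i, flag in enumerate(segment) if flag)
        return primes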

Is there any way I can get the CPU's L1 cache size from Python? I specifically want the L1 cache size, not L2 or L3.
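On Linux I could presumably read it out of sysfs with something like this (untested sketch, Linux only), but I'd rather have something portable:

    import glob

    def l1_data_cache_size():
        """Best-effort L1 data cache size in bytes; Linux sysfs only."""
        for cache_dir in glob.glob("/sys/devices/system/cpu/cpu0/cache/index*"):
            try:
                with open(cache_dir + "/level") as f:
                    level = f.read().strip()
                with open(cache_dir + "/type") as f:
                    cache_type = f.read().strip()
                with open(cache_dir + "/size") as f:
                    size = f.read().strip()  # e.g. "32K"
            except OSError:
                continue
            if level == "1" and cache_type in ("Data", "Unified"):
                if size.endswith("K"):
                    return int(size[:-1]) * 1024
                if size.endswith("M"):
                    return int(size[:-1]) * 1024 * 1024
                return int(size)
        return None  # unknown platform or layout

    print(l1_data_cache_size())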

  • Even if you could, why do you think this will speed up your program in any meaningful way? Commented Apr 12, 2018 at 1:05
  • Do you really have 576KB of L1 cache? I'm pretty sure Intel chips are always powers of two, and only go up to 32K/32K, except for a few Xeons in a past generation that did 64K/32K before they realized it didn't speed up any real-life loads. Commented Apr 12, 2018 at 1:11
  • 1
    Assuming that you got the size of L1 cache, what will it tell you about the optimal size of the segment? You are not coding in assembly. It's an interpreted scripting language. It will have its own opinion about what belongs into the L1 cache. Setting the size of the segment to the size of the cache probably wouldn't help. On each system, you would have to try out multiple sizes of caches to understand what works best. But then you can start trying out different sizes directly, without looking up the size of the L1 cache in the first place. Commented Apr 12, 2018 at 1:12
  • I have an AMD Ryzen 5 1600, which has 576 KB of total L1 cache. Commented Apr 12, 2018 at 1:19
  • 1
    At any rate, if you want to optimize a Sieve of Eratosthenes, your first steps should be: (1) Make sure you actually have implemented the sieve, and not an equivalent of the much slower algorithm that got published in a Scheme textbook and has spread over the whole world ever since. (2) Use PyPy instead of CPython, or numpy, or possibly gmpy. (3) Profile the Python code and optimize the high-level Python stuff. (4) Profile it again and micro-optimize little things like global lookups … (42) Profile the interpreter while it's running your code and worry about the CPU cache. Commented Apr 12, 2018 at 1:26

1 Answer


Python is a garbage-collected language. One consequence of this is that memory is automatically allocated and freed as needed, which creates memory fragmentation that can break apart transfers to the CPU caches. It's also not possible to control the layout of a data structure directly in memory, which means that one transfer on the bus might not contain all the relevant information for a computation, even though it all might fit within the bus width. This hurts any prospect of keeping the L1/L2 caches filled with the relevant data for the next computation.
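As a rough illustration of the layout point, compare a list of flags with a flat bytearray: the list is an array of pointers to objects that live elsewhere on the heap, while the bytearray is one contiguous buffer.

    import sys

    n = 1_000_000  # one flag per candidate number

    # A list is an array of object pointers; the objects it points to
    # live elsewhere on the heap.
    flags_as_list = [True] * n

    # A bytearray is one contiguous buffer, one byte per flag.
    flags_as_bytes = bytearray(n)

    print("list (pointer array only):", sys.getsizeof(flags_as_list), "bytes")
    print("bytearray (whole buffer): ", sys.getsizeof(flags_as_bytes), "bytes")

The list figure counts only the pointer array; the objects it references add more on top, and they are not guaranteed to sit next to each other in memory.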

Another problem comes from Python's dynamic typing and the fact that it is not compiled ahead of time. Most C developers realize at some point that the compiler is usually smarter than they are: it decides how data is laid out, which instructions the CPU will run and in what order, and how best to optimize them. Python, however, is interpreted and dynamically typed, which makes it much harder to infer optimization opportunities for an algorithm, since the behaviour of code can change at runtime.

As mentioned in the comments, there are ways to mitigate these problems, foremost among them PyPy (a JIT-compiling Python interpreter) and Cython (an extension language that compiles Python-like code to C). NumPy helps too, by keeping numeric data in contiguous, typed arrays whose inner loops run in compiled code.
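For example, here is a rough sketch of the sieve on a NumPy array (assuming NumPy is installed); the flag array is a single contiguous, typed block and the marking step runs in compiled code:

    import numpy as np

    def primes_up_to(n):
        """Plain Sieve of Eratosthenes on one contiguous NumPy boolean array."""
        is_prime = np.ones(n + 1, dtype=bool)
        is_prime[:2] = False
        for p in range(2, int(n ** 0.5) + 1):
            if is_prime[p]:
                is_prime[p * p :: p] = False  # strided write handled by compiled NumPy code
        return np.flatnonzero(is_prime)

    print(primes_up_to(50))  # [ 2  3  5  7 11 ... 47]

A PyPy or Cython version of the pure-Python sieve gets a similar effect by compiling the loops themselves.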
