1 Billion primes in about 4 seconds
I liked stopping at sqrt(n) when selecting multiples to clear.
I liked NumPy's slice assignment because clearing all multiples of 3 takes just one line:
p[9::3] = 0 # clear 9,12,15...
I was using NumPy because np.sum() and slicing are several times faster than vanilla Python loops.
I had used uint8 because a plain Python bool object costs far more than one byte. For a sieve of 100 that doesn't matter; for 2 billion it adds up. I should have used one bit per value. At Mark Dickerson's suggestion I changed the i*3 and h*3 starting points to i*i and h*h. Thanks very much. He also pointed out that my int8 array was the same size as a NumPy dtype=bool array, so I switched to a bitarray(), which works just like a NumPy array at 1/8 of the space.
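A quick sanity check (a sketch, not part of the sieve code) of why clearing can start at i*i rather than i*3: any composite i*m with m < i has a prime factor smaller than i, so a pass by a smaller prime already cleared it.

```python
def smallest_factor(n):
    """Smallest prime factor of n, found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

i = 13  # any prime works here
for m in range(2, i):
    # i*m has a prime factor <= m < i, so an earlier prime's pass cleared it
    assert smallest_factor(i * m) < i
```

So the first composite that only i can clear is i*i, and everything below it is already gone.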
Now a billion-wide sieve takes about 4 seconds instead of 5.
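For scale, here is the storage arithmetic behind the switch, one byte per value (NumPy uint8 or dtype=bool) versus one bit per value (bitarray), for a billion-wide sieve:

```python
n = 10**9
one_byte_per_value = n             # uint8 / dtype=bool: a full byte per flag
one_bit_per_value = (n + 7) // 8   # bitarray packs 8 flags into each byte
print(f"{one_byte_per_value:,} bytes vs {one_bit_per_value:,} bytes")
# 1,000,000,000 bytes vs 125,000,000 bytes
```

That 125,000,000 figure is (give or take object overhead) the memory size the transcript below reports.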
All primes greater than 3 are of the form 6k±1: 6(2)-1 is 11 and 6(2)+1 is 13. Not every number of that form is prime, but every prime above 3 fits the pattern. So when clearing bits I step from 5 by 6 until I reach the sqrt(n) limit.
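The 6k±1 claim is easy to check with a throwaway trial-division test (a sketch, separate from the sieve itself):

```python
def is_prime(n):
    """Trial-division primality test, fine for small numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# every prime above 3 lands on 6k-1 or 6k+1 (remainder 5 or 1 mod 6)
assert all(n % 6 in (1, 5) for n in range(5, 10_000) if is_prime(n))
# the converse fails: 25 = 6*4 + 1 and 35 = 6*6 - 1, and neither is prime
assert not is_prime(25) and not is_prime(35)
```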
This version uses bitarray() and step slicing:
from datetime import timedelta
import time,sys
from bitarray import bitarray
def get_primes(a_prime, start=0, end=100):
    """Return list of prime numbers in the range [start, end] using a_prime bitarray."""
    return [i for i in range(start, end + 1) if a_prime[i]]
def pretty_primes(a_prime, start=0, end=100):
    """Return primes in the range [start, end] as comma-grouped strings."""
    return [f'{i:,}' for i in range(start, end + 1) if a_prime[i]]
def make_sieve(size):
    """Create a sieve of Eratosthenes up to the given size."""
    sieve_time = time.time()
    print(f"Creating new sieve of {size:,} primes.")
    limit = int(1 + size**0.5) + 2
    p = bitarray(size)                # one bit per value
    p.setall(True)
    p[0] = p[1] = False
    p[4::2] = False                   # clear multiples of 2
    p[9::3] = False                   # clear multiples of 3
    for i in range(5, limit, 6):      # only numbers of the form 6k-1 and 6k+1
        h = i + 2                     # 6k+1
        if p[i]:                      # if 6k-1 is prime
            p[i*i::2 * i] = False     # clear its odd multiples from i*i
        if p[h]:                      # if 6k+1 is prime
            p[h*h::2 * h] = False     # clear its odd multiples from h*h
    print(f"Make sieve for {len(p):,} took {timedelta(seconds=time.time() - sieve_time)}")
    return p
sieve_size = 10**8
p = make_sieve(sieve_size)
# Input for counting primes up to n
n = int(input("Count primes to: >"))
start_time = time.time()
if n <= sieve_size:
    prime_count = p[1:n + 1].count()
else:
    p = make_sieve(n)
    prime_count = p[1:n + 1].count()
    sieve_size = len(p)
print(f"Sieve has a memory size of {sys.getsizeof(p):,} bytes")
print(f"From 1 to {n:,} there are {prime_count:,} primes")
print(f"Search took {str(timedelta(seconds=time.time() - start_time))}")
# Get 200 primes from the sieve starting at 1,000,000
print(f"\nPrimes from 1,000,000 to 1,000,200 are:")
print(*pretty_primes(p, 10**6, 10**6 + 200), sep=', ')
# Test if a specific number is prime
test_p = int(input(f"\nEnter a number < {sieve_size:,} to test for prime> "))
print(f"{test_p:,} {'is prime' if p[test_p] else 'is not prime'}")
The output looks like:
$ python prime_count_test.py
Creating new sieve of 100,000,000 primes.
Make sieve for 100,000,000 took 0:00:00.284127
Count primes to: >1000000000
Creating new sieve of 1,000,000,000 primes.
Make sieve for 1,000,000,000 took 0:00:04.124551
Sieve has a memory size of 125,000,080 bytes
From 1 to 1,000,000,000 there are 50,847,534 primes
Search took 0:00:04.246971
Primes from 1,000,000 to 1,000,200 are:
1,000,003, 1,000,033, 1,000,037, 1,000,039, 1,000,081, 1,000,099, 1,000,117, 1,000,121, 1,000,133, 1,000,151, 1,000,159, 1,000,171, 1,000,183, 1,000,187, 1,000,193, 1,000,199
Enter a number < 1,000,000,000 to test for prime> 123456789
123,456,789 is not prime
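The same slice-count trick works for any sub-range, not just [1, n]; on the bitarray it would be p[a:b + 1].count(). A sketch with a plain Python list standing in for the bitarray, so it runs without the dependency:

```python
def make_small_sieve(size):
    """Plain-list sieve with the same indexing as the bitarray version."""
    p = [True] * size
    p[0] = p[1] = False
    for i in range(2, int(size**0.5) + 1):
        if p[i]:
            p[i*i::i] = [False] * len(range(i*i, size, i))
    return p

p = make_small_sieve(101)
print(sum(p[10:51]))  # primes between 10 and 50 -> 11
```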