An Efficient Sieve of Eratosthenes in Python

Question

This very short and simple code in #Python tries to simulate the "Sieve of Eratosthenes" for the first N natural numbers with the constraints of (0) script shortness; (1) minimization of the 'if statements' and 'for/while loops'; (2) efficiency in terms of CPU time.

import numpy as np
N = 10**5
a = np.array(range(3,N,2))
for j in range(0, int(round(np.sqrt(N),0))):
    a[(a!=a[j]) & (a%a[j] == 0)] = 0
    a = a[a!=0]
a = [2]+list(a)

On an Intel Core I5, it returns the prime numbers among the first:

N = 100,000 in 0.03 seconds;
N = 1,000,000 in 0.63 seconds;
N = 10,000,000 in 22.2 seconds.

Would someone like to share more efficient codes in term of CPU time within the aforementioned constraints?

Like so many people before you, you've attempted to write a sieve and ended up with trial division. — user2357112
– user2357112, Commented Apr 20, 2018 at 7:30
@Roberto the idea of a sieve is to avoid divisions, which your code does not. (A real sieve could have even shorter code without any comparisons.) — MB-F
– MB-F, Commented Apr 20, 2018 at 7:50
Yeah, I just wanted to point out that numpy isn't a great idea here for performance. And as kazemakase points out, this isn't really The Sieve, since you iteratively check the modulus. Furthermore, (a!=a[j]) & (a%a[j] == 0) is a pretty wasteful operation if you really want to squeeze performance out of it. — juanpa.arrivillaga
– juanpa.arrivillaga, Commented Apr 20, 2018 at 7:51
There are a lot of NumPy examples with benchmarks here: stackoverflow.com/questions/2068372/… — Alex Riley
– Alex Riley, Commented Apr 20, 2018 at 8:14
See in particular the answers here: stackoverflow.com/a/3035188/3923281 — Alex Riley
– Alex Riley, Commented Apr 20, 2018 at 8:17

user2357112 · Accepted Answer · 2018-04-20 08:08:53Z

19

An actual NumPy sieve of Eratosthenes looks like this:

def sieve(n):
    flags = numpy.ones(n, dtype=bool)
    flags[0] = flags[1] = False
    for i in range(2, n):
        # We could use a lower upper bound for this loop, but I don't want to bother with
        # getting the rounding right on the sqrt handling.
        if flags[i]:
            flags[i*i::i] = False
    return numpy.flatnonzero(flags)

It maintains an array of "possibly prime" flags and directly unsets the flags corresponding to multiples of primes, without needing to test divisibility, especially for numbers that aren't divisible by the prime currently being handled.

What you're doing is trial division, where you just go through and test whether numbers are divisible by candidate divisors. Even a good implementation of trial division needs to do more operations, and more expensive operations, than a sieve. Your implementation does even more work than that, because it considers non-prime candidate divisors, and because it keeps performing divisibility tests for numbers it should already know are prime.

edited Apr 20, 2018 at 8:08

answered Apr 20, 2018 at 8:03

user2357112

286k32 gold badges490 silver badges571 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

MB-F Over a year ago

Isn't it sufficient to run the outer loop only up to sqrt(len(flags))?

Rob Over a year ago

This works great! Thanks! N = 1E7 in 1.3 seconds and N = 1E8 in 13.1 seconds.

user2357112 Over a year ago

@kazemakase: Yeah, but I didn't want to bother with ensuring the rounding is correct.

MB-F Over a year ago

@user2357112 fair enough :) (The correct rounding would be floor(sqrt(n)) + 1, but I agree that this would obscure the sleek example.)

user2357112 Over a year ago

@BeniCherniavsky-Paskin: Smaller multiples of i must have a prime factor smaller than i, so they must have already been handled.

|

Greenstick · Accepted Answer · 2020-11-18 01:25:51Z

3

I decided to play with this and created a further optimized NumPy version posted by @user2357112 supports Monica that uses the Numba JIT to speed things up.

import numba
import numpy
import timeit
import datetime 

@numba.jit(nopython = True, parallel = True, fastmath = True, forceobj = False)
def sieve (n: int) -> numpy.ndarray:
    primes = numpy.full(n, True)
    primes[0], primes[1] = False, False
    for i in numba.prange(2, int(numpy.sqrt(n) + 1)):
        if primes[i]:
            primes[i*i::i] = False
    return numpy.flatnonzero(primes)

if __name__ == "__main__":
    
    timestart = timeit.default_timer()
    print(sieve(1000000000))
    timestop = timeit.default_timer()
    timedelta = (timestop - timestart)
    print(f"Time Elapsed: {datetime.timedelta(seconds = timedelta)}")

else:
    pass

On my laptop, I sieve out the primes in the first 1 billion (1e9) natural numbers in 0:00:10.378686 seconds. The JIT provides at least an order of magnitude of performance here; the next fastest answer at the time of writing took 0:01:27.059963 minutes. Sadly, I don't have an Nvidia GPU and Cuda on this system (Mac), otherwise I'd have used that.

answered Nov 18, 2020 at 1:25

Greenstick

9,1251 gold badge30 silver badges31 bronze badges

5 Comments

Devashish Das Over a year ago

Nvidia Tesla V100 -> 11.798239 sec

Greenstick Over a year ago

What size input? What JIT/parameters did you use?

Devashish Das Over a year ago

I used the same code (in your answer). But for some reason it didn't show up on nvidia-smi nor did it use both the GPUs. Is there any way we can run this on multiple GPUs?

Greenstick Over a year ago

@DevashishDas I would look to the Numba documentation, they have a specific decorator to leverage CUDA.

Srinadh Over a year ago

Really thank you very much. With my algorithm it was taking around 4 days for getting the results. Your algorithm took only 3 min. Really thank you, thank you, very much.

Tonechas · Accepted Answer · 2020-11-02 17:16:13Z

2

Here's a simple way to do this using NumPy. It is somewhat similar to the OP's idea of indexing instead of looping again inside the main loop, but without the division check, only slicing.

It is also similar to user2357112 supports Monica answer, but this one only considers odd numbers, which makes it faster.

This is a typical odd only sieve: https://stackoverflow.com/a/20248491/8094047.

Create a bool array of size n.
Loop through odd numbers up to square root of n.
Note: numbers are used interchangeably as indices
Mark composite (multiples of primes) numbers as false.

In the end we have a bool array that we can use to check if a number below n is prime (except even numbers, you can check for them without the array, using & 1 or so)

Avg. time for n = 20000000: 0.1063s

import numpy as np
n = 20000000

isprime = np.ones(n, dtype=np.bool)

# odd only sieve
i = 3
while (i * i < n):
    if isprime[i]:
        isprime[i * i:n:2 * i] = False
    i += 2

# test
print(isprime[97]) # should be True

edited Nov 2, 2020 at 17:16

Tonechas

13.8k16 gold badges52 silver badges85 bronze badges

answered Nov 1, 2020 at 23:59

Eyad Ahmed

1303 silver badges15 bronze badges

2 Comments

KetZoomer Over a year ago

maybe include some explanation instead of an all code answer

Andy Richter Over a year ago

Very nice. I like the slicing, as opposed to a loop. Using odds only saves time.

andreis11 · Accepted Answer · 2021-12-28 20:57:17Z

2

1.94 seconds for 10.000.000

def sieve_eratosthene(limit):

    primes = [True] * (limit+1)
    iter = 0

    while iter < limit**0.5 :
        if iter < 2:
            primes[iter]= False

        elif primes[iter]:
            for i in range(iter*2, limit+1, iter):
                primes[i] = False

        iter+=1

    return(x for x in range(limit+1) if primes[x])

edited Dec 28, 2021 at 20:57

answered Apr 1, 2020 at 2:47

andreis11

1,1611 gold badge6 silver badges11 bronze badges

2 Comments

Justin Malloy Over a year ago

Should note there's a typo there, with the number variable, should be limit in the comprehension

andreis11 Over a year ago

@JustinMalloy Thanks a lot!

jcsahnwaldt Reinstate Monica · Accepted Answer · 2024-09-24 12:50:19Z

2

1 Billion primes in about 4 seconds

I liked stopping at sqrt(n) for selecting multiples to clear. I liked using slicing views in NumPy because to clear all multiples of 3 we just need:

p[9::3] = 0  # clear 9,12,15...

I was using numpy because np.sum() and slicing is several times faster than vanilla Python.
I had used uint8 because a normal python bool takes up 4 bytes instead of 1. For a sieve of 100 it doesn't matter. For 2 billion it adds up. I should've used a bit per prime. At Mark Dickerson's suggestion I changed the i*3 and h*3 to i*i and h*h Thanks very much. He also pointed out that my int8 array was the same size as a numpy dtype=bool array. So I switched to a bitarray() It works just like a numpy array at 1/8 of the space.

Now instead of 5 seconds for a billion wide sieve it takes about 4 seconds.

All prime numbers are in the 6k±1 range. 6(2)+1 is 13 6(2)-1 is 11. Not all numbers in that pattern are prime. All primes are in that pattern. So when clearing 1's I skip from 5 by 6 until I get to the sqrt(n) limit.

This uses Bitarray() and skip slicing:

from datetime import timedelta
import time,sys
from bitarray import bitarray

def get_primes(a_prime, start=0, end=100):
    """Return list of prime numbers in the range [start, end] using a_prime bitarray."""
    return [i for i in range(start, end + 1) if a_prime[i]]

def pretty_primes(a_prime, start=0, end=100):
    """Return list of prime numbers in the range [start, end] using a_prime bitarray."""
    return [f'{i:,}' for i in range(start, end + 1) if a_prime[i]]


def make_sieve(size):
    """Create a sieve of Eratosthenes up to the given size."""
    sieve_time = time.time()
    print(f"Creating new sieve of {size:,} primes.")
    limit = int(1 + size**0.5) + 2
    p = bitarray(size)  # One bit per value
    p.setall(True)
    p[0] = p[1] = False
    p[4::2] = False  # Clear multiples of 2
    p[9::3] = False  # Clear multiples of 3

    for i in range(5, limit, 6):  # Process only numbers of the form 6k-1 and 6k+1
        h = i + 2  # 6k+1
        if p[i]:  # If 6k-1 is prime
            p[i*i::2 * i] = False  # Clear multiples of 6k-1
        if p[h]:  # If 6k+1 is prime
            p[h*h::2 * h] = False  # Clear multiples of 6k+1
            
    print(f"Make sieve for {len(p):,} took {str(timedelta(seconds=time.time() - sieve_time))}")
    return p

sieve_size = 10**8
p = make_sieve(sieve_size)

# Input for counting primes up to n
n = int(input("Count primes to: >"))
start_time = time.time()

if n <= sieve_size:
    prime_count = p[1:n + 1].count()
else:
    p = make_sieve(n)
    prime_count = p[1:n + 1].count()
    sieve_size = len(p)
print(f"Sieve has a memory size of {sys.getsizeof(p):,} bytes") 
print(f"From 1 to {n:,} there are {prime_count:,} primes")
print(f"Search took {str(timedelta(seconds=time.time() - start_time))}")

# Get 200 primes from the sieve starting at 1,000,000
print(f"\nPrimes from 1,000,000 to 1,000,200 are:")
print(*pretty_primes(p, 10**6, 10**6 + 200), sep=', ')

# Test if a specific number is prime
test_p = int(input(f"\nEnter a number < {sieve_size:,} to test for prime> "))
print(f"{test_p:,} {'is prime' if p[test_p] else 'is not prime'}")

The output looks like:

 $ python prime_count_test.py 
Creating new sieve of 100,000,000 primes.
Make sieve for 100,000,000 took 0:00:00.284127
Count primes to: >1000000000    
Creating new sieve of 1,000,000,000 primes.
Make sieve for 1,000,000,000 took 0:00:04.124551
Sieve has a memory size of 125,000,080 bytes
From 1 to 1,000,000,000 there are 50,847,534 primes
Search took 0:00:04.246971

Primes from 1,000,000 to 1,000,200 are:
1,000,003, 1,000,033, 1,000,037, 1,000,039, 1,000,081, 1,000,099, 1,000,117, 1,000,121, 1,000,133, 1,000,151, 1,000,159, 1,000,171, 1,000,183, 1,000,187, 1,000,193, 1,000,199

Enter a number < 1,000,000,000 to test for prime> 123456789
123,456,789 is not prime

edited Sep 24, 2024 at 12:50

jcsahnwaldt Reinstate Monica

3,6564 gold badges39 silver badges35 bronze badges

answered Jun 2, 2024 at 15:39

Andy Richter

3482 silver badges8 bronze badges

5 Comments

Mark Dickinson Over a year ago

Why not i*i and h*h instead of 3*i and 3*h for the slice start indices?

Mark Dickinson Over a year ago

"I used uint8 because a bool takes up 4 bytes instead of 1." This doesn't seem right. A NumPy array of dtype np.bool_ takes 1 byte per item. What does np.ones(100, dtype=bool).nbytes give for you?

Andy Richter Over a year ago

Mark, Thanks so much I did the i times i changed it. I had been playing with i times 2, i times 3 and I forgot. I see you were right about the np.ones() array so I switched to a bitarray() 1/8 the size.

jcsahnwaldt Reinstate Monica Over a year ago

I reverted the latest changes. Maybe the code was a bit faster, but it was also broken, e.g. for primes < 100 it produced

2, 3, 5, 6, 7, 10, 11, 13, 14, 17, 18, 19, 22, 23, 26, 29, 30, 31, 34, 37, 38, 41, 42, 43, 46, 47, 50, 53, 54, 58, 59, 61, 62, 66, 67, 70, 71, 73, 74, 78, 79, 82, 83, 86, 89, 90, 94, 97, 98

Andy Richter Over a year ago

stackoverflow.com/users/131160/jcsahnwaldt-reinstate-monica I just tried it and I get: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97

user140242 · Accepted Answer · 2024-10-19 18:19:51Z

It is possible to reduce the execution time by using a wheel sieve. Respecting the constraints you imposed, it is possible to use a wheel size 6, where the results obtained are:

Primes count <  1000000000 :  50847534

Time:  2.2436816692352295 s

Code:

from math import isqrt
import numpy as np
import time
def wheel_sieve_6(n):
    count=0
    if n==3:
        count=1
    elif n>3:
        count=2
        Primes5mod6 =np.ones((n//6+1), dtype=bool)
        Primes1mod6 =np.ones((n//6+1), dtype=bool)
        Primes5mod6[0]=False
        Primes1mod6[0]=False
        imax=isqrt(n)//6+1
        for i in range(1,imax+1):
            if Primes5mod6[i]:
                Primes5mod6[6*i*i::6*i-1]=False
                Primes1mod6[6*i*i-2*i::6*i-1]=False
            if Primes1mod6[i]:
                Primes5mod6[6*i*i::6*i+1]=False
                Primes1mod6[6*i*i+2*i::6*i+1]=False
        count+=len(np.nonzero(Primes5mod6)[0][::])+len(np.nonzero(Primes1mod6)[0][::])
        if(n%6<=1 and Primes1mod6[n//6]):
            count -= 1
    return count
N=10**9
start = time.time()
count=wheel_sieve_6(N)
end = time.time()
print("\nPrimes count < ",N,": ", count)
print("\nTime: ", end-start, "s")

Using larger wheel sizes and segmented sieve instead the result is this if you just want to count the prime numbers

Primes count <  1000000000 :  50847534
    
Time:  1.8452579975128174 s

or this if you want to generate an array that contains the prime numbers

Primes count <  1000000000 :  50847534
    
Time:  3.634509325027466 s

Code:

# segmented wheel sieve python

import time
import numpy as np
from math import isqrt

def segmented_wheel_sieve(n , max_bW, primesValue = False):
    dim_seg = 2**19
    if N > 2* 10**8:
        dim_seg = 2**21
    bW = 30
    n_PB = 3
    n_PB_max = 5
    Primes_Base = np.array([2, 3, 5, 7, 11, 13])
    if max_bW > bW:
        while n_PB < n_PB_max and (bW * Primes_Base[n_PB] <= max_bW) and n > bW**2:
            bW *= Primes_Base[n_PB]
            n_PB += 1
        Remainder_t = np.ones(bW, dtype=bool)
        for i in range(n_PB):
            Remainder_t[Primes_Base[i]::Primes_Base[i]] = False
        RW = np.empty(0, int)
        for i in range(2, bW):
            if Remainder_t[i]:
                RW = np.append(RW, int(i))
        RW = np.append(RW, bW + 1)
    else:
        RW = np.array([7, 11, 13, 17, 19, 23, 29, 31])
    primes =  np.empty(0, int)
    count_p = 0
    if N > 1:
        for i in range(n_PB):
            if N >= Primes_Base[i]:
                if primesValue:
                    primes =  np.append(primes, Primes_Base[i])
                else:
                    count_p += 1
    if N > Primes_Base[n_PB - 1]:
        nR = len(RW)
        k_end = N // bW + 1
        if N % bW <= 1 and k_end > 0:
            k_end -= 1
        k_sqrt = isqrt(k_end // bW) + 2
        dim_seg_sqrt = max(k_sqrt, 2**8)
        k_sqrt_sqrt = isqrt(dim_seg_sqrt // bW) + 2
        C_t = np.ones((nR, nR), dtype = int)
        RW_i = np.zeros((nR, nR), dtype = int)
        C_t = np.multiply(np.dot(np.diag(RW), C_t), np.dot(C_t, np.diag(RW)))
        C_t = C_t % bW
        C_t[C_t == 1] = C_t[C_t == 1] + bW
        for i in range(nR):
            RW_i[i,:] = RW[np.argwhere(C_t == RW[i])[:,1]]
        C_t = np.dot(RW_i, np.diag(RW))
        C_t = C_t // bW
        C_t[-1,:] = C_t[-1,:] - 1 
        primeSqrt = np.ones((nR, dim_seg_sqrt), dtype = bool)
        for k in range(k_sqrt_sqrt):
            for i in range(nR):
                if primeSqrt[i, k]:
                    for i1 in range(nR):
                        m = bW * k * k + (RW[i] + RW_i[i1,i]) * k + C_t[i1,i]
                        primeSqrt[i1, m :: (bW * k + RW [i])] = False
        primes =  np.append(primes, RW[np.nonzero(primeSqrt.T)[1][::]] + bW * np.nonzero(primeSqrt.T)[0][::])
        if (k_end < dim_seg_sqrt):
            if primesValue:
                primes = primes[primes <= N]
            else:
                count_p = count_p + len(primes[primes <= N])
        else:
            if primesValue == False:
                count_p = count_p + len(primes)
            k_low = dim_seg_sqrt
            primeSeg = np.ones((nR, dim_seg), dtype = bool)
            while(k_low < k_end):
                k_seg_sqrt = min(dim_seg_sqrt, isqrt((k_low + dim_seg) // bW) + 2)
                for k in range(k_seg_sqrt):
                    for i in range(nR):
                        p = bW * k + RW [i]
                        if primeSqrt[i, k]:
                            for i1 in range(nR):
                                m = RW_i[i1,i] * k + C_t[i1,i]
                                if (m < k_low):
                                    m = p - ((k_low - m) % p);
                                else:
                                    m -= k_low
                                if (m == p):
                                    m = 0
                                if (m < dim_seg):
                                    primeSeg[i1, m :: p] = False
                if(k_low + dim_seg < k_end):
                    if primesValue:
                        primes = np.append(primes, RW[np.nonzero(primeSeg.T)[1][::]] + bW * (np.nonzero(primeSeg.T)[0][::] + k_low))
                    else:
                        count_p = count_p + len(np.nonzero(primeSeg.T)[0][::])
                else:
                    primes_t = np.array(RW[np.nonzero(primeSeg.T)[1][::]] + bW * (np.nonzero(primeSeg.T)[0][::] + k_low))
                    if primesValue:
                        primes = np.append(primes, primes_t[primes_t <= N])
                    else:
                        count_p = count_p + len(primes_t[primes_t <= N])
                primeSeg.fill(True)        
                k_low += dim_seg
    return primes if primesValue else count_p


N=10**9
genPrime = False #true setting if you want to generate an array with prime numbers
printPrime = False #if true and genPrime=true prints the prime numbers

start = time.time()

v_primes = segmented_wheel_sieve(N, 30, genPrime)
    
end = time.time()

time_s = end - start
if genPrime:
    if printPrime:
        for i in range(len(v_primes)):
            print(v_primes[i], ", ")
    print("\nPrimes count < ",N,": ", len(v_primes))
else:
    print("\nPrimes count < ",N,": ", v_primes)

print("\nTime: ", time_s, "s")

oppressionslayer · Accepted Answer · 2024-10-20 07:55:34Z

This is an implementation for primes in 1,000,000,000 numbers in under 2 seconds and possible numba enhancements which are commented out:


import numpy
from numba import njit, prange

@njit(parallel=True)
def parallel_sieve(limit):
    sieve_size = (limit // 2) + 1
    sieve_array = np.ones(sieve_size, dtype=np.bool_)
    sieve_array[0] = False

    for i in prange(1, int(limit ** 0.5) // 2 + 1):
        if sieve_array[i]:
            p = 2 * i + 1
            start_index = (p * p) // 2
            sieve_array[start_index::p] = False

    primes = [2]
    for i in range(1, sieve_size):
        if sieve_array[i]:
            primes.append(2 * i + 1)
    
    return np.array(primes)


limit = 1_000_000_000 #1 billion
a = parallel_sieve(limit)
%timeit parallel_sieve(limit)

# 3.77 s ± 543 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [13]: len(a)
Out[13]: 50847534

Collectives™ on Stack Overflow

An Efficient Sieve of Eratosthenes in Python

7 Answers 7

7 Comments

5 Comments

2 Comments

2 Comments

1 Billion primes in about 4 seconds

5 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

7 Comments

5 Comments

2 Comments

2 Comments

1 Billion primes in about 4 seconds

5 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related