12

This very short and simple code in #Python tries to simulate the "Sieve of Eratosthenes" for the first N natural numbers with the constraints of (0) script shortness; (1) minimization of the 'if statements' and 'for/while loops'; (2) efficiency in terms of CPU time.

import numpy as np
N = 10**5
a = np.array(range(3,N,2))
for j in range(0, int(round(np.sqrt(N),0))):
    a[(a!=a[j]) & (a%a[j] == 0)] = 0
    a = a[a!=0]
a = [2]+list(a)

On an Intel Core I5, it returns the prime numbers among the first:

  • N = 100,000 in 0.03 seconds;
  • N = 1,000,000 in 0.63 seconds;
  • N = 10,000,000 in 22.2 seconds.

Would someone like to share more efficient codes in term of CPU time within the aforementioned constraints?

11
  • 6
    Like so many people before you, you've attempted to write a sieve and ended up with trial division. Commented Apr 20, 2018 at 7:30
  • 1
    @Roberto the idea of a sieve is to avoid divisions, which your code does not. (A real sieve could have even shorter code without any comparisons.) Commented Apr 20, 2018 at 7:50
  • 1
    Yeah, I just wanted to point out that numpy isn't a great idea here for performance. And as kazemakase points out, this isn't really The Sieve, since you iteratively check the modulus. Furthermore, (a!=a[j]) & (a%a[j] == 0) is a pretty wasteful operation if you really want to squeeze performance out of it. Commented Apr 20, 2018 at 7:51
  • 2
    There are a lot of NumPy examples with benchmarks here: stackoverflow.com/questions/2068372/… Commented Apr 20, 2018 at 8:14
  • 1
    See in particular the answers here: stackoverflow.com/a/3035188/3923281 Commented Apr 20, 2018 at 8:17

7 Answers 7

19

An actual NumPy sieve of Eratosthenes looks like this:

def sieve(n):
    flags = numpy.ones(n, dtype=bool)
    flags[0] = flags[1] = False
    for i in range(2, n):
        # We could use a lower upper bound for this loop, but I don't want to bother with
        # getting the rounding right on the sqrt handling.
        if flags[i]:
            flags[i*i::i] = False
    return numpy.flatnonzero(flags)

It maintains an array of "possibly prime" flags and directly unsets the flags corresponding to multiples of primes, without needing to test divisibility, especially for numbers that aren't divisible by the prime currently being handled.

What you're doing is trial division, where you just go through and test whether numbers are divisible by candidate divisors. Even a good implementation of trial division needs to do more operations, and more expensive operations, than a sieve. Your implementation does even more work than that, because it considers non-prime candidate divisors, and because it keeps performing divisibility tests for numbers it should already know are prime.

Sign up to request clarification or add additional context in comments.

7 Comments

Isn't it sufficient to run the outer loop only up to sqrt(len(flags))?
This works great! Thanks! N = 1E7 in 1.3 seconds and N = 1E8 in 13.1 seconds.
@kazemakase: Yeah, but I didn't want to bother with ensuring the rounding is correct.
@user2357112 fair enough :) (The correct rounding would be floor(sqrt(n)) + 1, but I agree that this would obscure the sleek example.)
@BeniCherniavsky-Paskin: Smaller multiples of i must have a prime factor smaller than i, so they must have already been handled.
|
3

I decided to play with this and created a further optimized NumPy version posted by @user2357112 supports Monica that uses the Numba JIT to speed things up.

import numba
import numpy
import timeit
import datetime 

@numba.jit(nopython = True, parallel = True, fastmath = True, forceobj = False)
def sieve (n: int) -> numpy.ndarray:
    primes = numpy.full(n, True)
    primes[0], primes[1] = False, False
    for i in numba.prange(2, int(numpy.sqrt(n) + 1)):
        if primes[i]:
            primes[i*i::i] = False
    return numpy.flatnonzero(primes)

if __name__ == "__main__":
    
    timestart = timeit.default_timer()
    print(sieve(1000000000))
    timestop = timeit.default_timer()
    timedelta = (timestop - timestart)
    print(f"Time Elapsed: {datetime.timedelta(seconds = timedelta)}")

else:
    pass

On my laptop, I sieve out the primes in the first 1 billion (1e9) natural numbers in 0:00:10.378686 seconds. The JIT provides at least an order of magnitude of performance here; the next fastest answer at the time of writing took 0:01:27.059963 minutes. Sadly, I don't have an Nvidia GPU and Cuda on this system (Mac), otherwise I'd have used that.

5 Comments

Nvidia Tesla V100 -> 11.798239 sec
What size input? What JIT/parameters did you use?
I used the same code (in your answer). But for some reason it didn't show up on nvidia-smi nor did it use both the GPUs. Is there any way we can run this on multiple GPUs?
@DevashishDas I would look to the Numba documentation, they have a specific decorator to leverage CUDA.
Really thank you very much. With my algorithm it was taking around 4 days for getting the results. Your algorithm took only 3 min. Really thank you, thank you, very much.
2

Here's a simple way to do this using NumPy. It is somewhat similar to the OP's idea of indexing instead of looping again inside the main loop, but without the division check, only slicing.

It is also similar to user2357112 supports Monica answer, but this one only considers odd numbers, which makes it faster.

This is a typical odd only sieve: https://stackoverflow.com/a/20248491/8094047.

  1. Create a bool array of size n.
  2. Loop through odd numbers up to square root of n.
    Note: numbers are used interchangeably as indices
  3. Mark composite (multiples of primes) numbers as false.

In the end we have a bool array that we can use to check if a number below n is prime (except even numbers, you can check for them without the array, using & 1 or so)

Avg. time for n = 20000000: 0.1063s

import numpy as np
n = 20000000

isprime = np.ones(n, dtype=np.bool)

# odd only sieve
i = 3
while (i * i < n):
    if isprime[i]:
        isprime[i * i:n:2 * i] = False
    i += 2

# test
print(isprime[97]) # should be True

2 Comments

maybe include some explanation instead of an all code answer
Very nice. I like the slicing, as opposed to a loop. Using odds only saves time.
2

1.94 seconds for 10.000.000

def sieve_eratosthene(limit):

    primes = [True] * (limit+1)
    iter = 0

    while iter < limit**0.5 :
        if iter < 2:
            primes[iter]= False

        elif primes[iter]:
            for i in range(iter*2, limit+1, iter):
                primes[i] = False

        iter+=1

    return(x for x in range(limit+1) if primes[x])

2 Comments

Should note there's a typo there, with the number variable, should be limit in the comprehension
@JustinMalloy Thanks a lot!
2

1 Billion primes in about 4 seconds

I liked stopping at sqrt(n) for selecting multiples to clear. I liked using slicing views in NumPy because to clear all multiples of 3 we just need:

p[9::3] = 0  # clear 9,12,15...

I was using numpy because np.sum() and slicing is several times faster than vanilla Python.
I had used uint8 because a normal python bool takes up 4 bytes instead of 1. For a sieve of 100 it doesn't matter. For 2 billion it adds up. I should've used a bit per prime. At Mark Dickerson's suggestion I changed the i*3 and h*3 to i*i and h*h Thanks very much. He also pointed out that my int8 array was the same size as a numpy dtype=bool array. So I switched to a bitarray() It works just like a numpy array at 1/8 of the space.

Now instead of 5 seconds for a billion wide sieve it takes about 4 seconds.

All prime numbers are in the 6k±1 range. 6(2)+1 is 13 6(2)-1 is 11. Not all numbers in that pattern are prime. All primes are in that pattern. So when clearing 1's I skip from 5 by 6 until I get to the sqrt(n) limit.

This uses Bitarray() and skip slicing:

from datetime import timedelta
import time,sys
from bitarray import bitarray

def get_primes(a_prime, start=0, end=100):
    """Return list of prime numbers in the range [start, end] using a_prime bitarray."""
    return [i for i in range(start, end + 1) if a_prime[i]]

def pretty_primes(a_prime, start=0, end=100):
    """Return list of prime numbers in the range [start, end] using a_prime bitarray."""
    return [f'{i:,}' for i in range(start, end + 1) if a_prime[i]]


def make_sieve(size):
    """Create a sieve of Eratosthenes up to the given size."""
    sieve_time = time.time()
    print(f"Creating new sieve of {size:,} primes.")
    limit = int(1 + size**0.5) + 2
    p = bitarray(size)  # One bit per value
    p.setall(True)
    p[0] = p[1] = False
    p[4::2] = False  # Clear multiples of 2
    p[9::3] = False  # Clear multiples of 3

    for i in range(5, limit, 6):  # Process only numbers of the form 6k-1 and 6k+1
        h = i + 2  # 6k+1
        if p[i]:  # If 6k-1 is prime
            p[i*i::2 * i] = False  # Clear multiples of 6k-1
        if p[h]:  # If 6k+1 is prime
            p[h*h::2 * h] = False  # Clear multiples of 6k+1
            
    print(f"Make sieve for {len(p):,} took {str(timedelta(seconds=time.time() - sieve_time))}")
    return p

sieve_size = 10**8
p = make_sieve(sieve_size)

# Input for counting primes up to n
n = int(input("Count primes to: >"))
start_time = time.time()

if n <= sieve_size:
    prime_count = p[1:n + 1].count()
else:
    p = make_sieve(n)
    prime_count = p[1:n + 1].count()
    sieve_size = len(p)
print(f"Sieve has a memory size of {sys.getsizeof(p):,} bytes") 
print(f"From 1 to {n:,} there are {prime_count:,} primes")
print(f"Search took {str(timedelta(seconds=time.time() - start_time))}")

# Get 200 primes from the sieve starting at 1,000,000
print(f"\nPrimes from 1,000,000 to 1,000,200 are:")
print(*pretty_primes(p, 10**6, 10**6 + 200), sep=', ')

# Test if a specific number is prime
test_p = int(input(f"\nEnter a number < {sieve_size:,} to test for prime> "))
print(f"{test_p:,} {'is prime' if p[test_p] else 'is not prime'}")

The output looks like:

 $ python prime_count_test.py 
Creating new sieve of 100,000,000 primes.
Make sieve for 100,000,000 took 0:00:00.284127
Count primes to: >1000000000    
Creating new sieve of 1,000,000,000 primes.
Make sieve for 1,000,000,000 took 0:00:04.124551
Sieve has a memory size of 125,000,080 bytes
From 1 to 1,000,000,000 there are 50,847,534 primes
Search took 0:00:04.246971

Primes from 1,000,000 to 1,000,200 are:
1,000,003, 1,000,033, 1,000,037, 1,000,039, 1,000,081, 1,000,099, 1,000,117, 1,000,121, 1,000,133, 1,000,151, 1,000,159, 1,000,171, 1,000,183, 1,000,187, 1,000,193, 1,000,199

Enter a number < 1,000,000,000 to test for prime> 123456789
123,456,789 is not prime

5 Comments

Why not i*i and h*h instead of 3*i and 3*h for the slice start indices?
"I used uint8 because a bool takes up 4 bytes instead of 1." This doesn't seem right. A NumPy array of dtype np.bool_ takes 1 byte per item. What does np.ones(100, dtype=bool).nbytes give for you?
Mark, Thanks so much I did the i times i changed it. I had been playing with i times 2, i times 3 and I forgot. I see you were right about the np.ones() array so I switched to a bitarray() 1/8 the size.
I reverted the latest changes. Maybe the code was a bit faster, but it was also broken, e.g. for primes < 100 it produced 2, 3, 5, 6, 7, 10, 11, 13, 14, 17, 18, 19, 22, 23, 26, 29, 30, 31, 34, 37, 38, 41, 42, 43, 46, 47, 50, 53, 54, 58, 59, 61, 62, 66, 67, 70, 71, 73, 74, 78, 79, 82, 83, 86, 89, 90, 94, 97, 98
stackoverflow.com/users/131160/jcsahnwaldt-reinstate-monica I just tried it and I get: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97
0

It is possible to reduce the execution time by using a wheel sieve. Respecting the constraints you imposed, it is possible to use a wheel size 6, where the results obtained are:

Primes count <  1000000000 :  50847534

Time:  2.2436816692352295 s

Code:

from math import isqrt
import numpy as np
import time
def wheel_sieve_6(n):
    count=0
    if n==3:
        count=1
    elif n>3:
        count=2
        Primes5mod6 =np.ones((n//6+1), dtype=bool)
        Primes1mod6 =np.ones((n//6+1), dtype=bool)
        Primes5mod6[0]=False
        Primes1mod6[0]=False
        imax=isqrt(n)//6+1
        for i in range(1,imax+1):
            if Primes5mod6[i]:
                Primes5mod6[6*i*i::6*i-1]=False
                Primes1mod6[6*i*i-2*i::6*i-1]=False
            if Primes1mod6[i]:
                Primes5mod6[6*i*i::6*i+1]=False
                Primes1mod6[6*i*i+2*i::6*i+1]=False
        count+=len(np.nonzero(Primes5mod6)[0][::])+len(np.nonzero(Primes1mod6)[0][::])
        if(n%6<=1 and Primes1mod6[n//6]):
            count -= 1
    return count
N=10**9
start = time.time()
count=wheel_sieve_6(N)
end = time.time()
print("\nPrimes count < ",N,": ", count)
print("\nTime: ", end-start, "s")

Using larger wheel sizes and segmented sieve instead the result is this if you just want to count the prime numbers

Primes count <  1000000000 :  50847534
    
Time:  1.8452579975128174 s

or this if you want to generate an array that contains the prime numbers

Primes count <  1000000000 :  50847534
    
Time:  3.634509325027466 s

Code:

# segmented wheel sieve python

import time
import numpy as np
from math import isqrt

def segmented_wheel_sieve(n , max_bW, primesValue = False):
    dim_seg = 2**19
    if N > 2* 10**8:
        dim_seg = 2**21
    bW = 30
    n_PB = 3
    n_PB_max = 5
    Primes_Base = np.array([2, 3, 5, 7, 11, 13])
    if max_bW > bW:
        while n_PB < n_PB_max and (bW * Primes_Base[n_PB] <= max_bW) and n > bW**2:
            bW *= Primes_Base[n_PB]
            n_PB += 1
        Remainder_t = np.ones(bW, dtype=bool)
        for i in range(n_PB):
            Remainder_t[Primes_Base[i]::Primes_Base[i]] = False
        RW = np.empty(0, int)
        for i in range(2, bW):
            if Remainder_t[i]:
                RW = np.append(RW, int(i))
        RW = np.append(RW, bW + 1)
    else:
        RW = np.array([7, 11, 13, 17, 19, 23, 29, 31])
    primes =  np.empty(0, int)
    count_p = 0
    if N > 1:
        for i in range(n_PB):
            if N >= Primes_Base[i]:
                if primesValue:
                    primes =  np.append(primes, Primes_Base[i])
                else:
                    count_p += 1
    if N > Primes_Base[n_PB - 1]:
        nR = len(RW)
        k_end = N // bW + 1
        if N % bW <= 1 and k_end > 0:
            k_end -= 1
        k_sqrt = isqrt(k_end // bW) + 2
        dim_seg_sqrt = max(k_sqrt, 2**8)
        k_sqrt_sqrt = isqrt(dim_seg_sqrt // bW) + 2
        C_t = np.ones((nR, nR), dtype = int)
        RW_i = np.zeros((nR, nR), dtype = int)
        C_t = np.multiply(np.dot(np.diag(RW), C_t), np.dot(C_t, np.diag(RW)))
        C_t = C_t % bW
        C_t[C_t == 1] = C_t[C_t == 1] + bW
        for i in range(nR):
            RW_i[i,:] = RW[np.argwhere(C_t == RW[i])[:,1]]
        C_t = np.dot(RW_i, np.diag(RW))
        C_t = C_t // bW
        C_t[-1,:] = C_t[-1,:] - 1 
        primeSqrt = np.ones((nR, dim_seg_sqrt), dtype = bool)
        for k in range(k_sqrt_sqrt):
            for i in range(nR):
                if primeSqrt[i, k]:
                    for i1 in range(nR):
                        m = bW * k * k + (RW[i] + RW_i[i1,i]) * k + C_t[i1,i]
                        primeSqrt[i1, m :: (bW * k + RW [i])] = False
        primes =  np.append(primes, RW[np.nonzero(primeSqrt.T)[1][::]] + bW * np.nonzero(primeSqrt.T)[0][::])
        if (k_end < dim_seg_sqrt):
            if primesValue:
                primes = primes[primes <= N]
            else:
                count_p = count_p + len(primes[primes <= N])
        else:
            if primesValue == False:
                count_p = count_p + len(primes)
            k_low = dim_seg_sqrt
            primeSeg = np.ones((nR, dim_seg), dtype = bool)
            while(k_low < k_end):
                k_seg_sqrt = min(dim_seg_sqrt, isqrt((k_low + dim_seg) // bW) + 2)
                for k in range(k_seg_sqrt):
                    for i in range(nR):
                        p = bW * k + RW [i]
                        if primeSqrt[i, k]:
                            for i1 in range(nR):
                                m = RW_i[i1,i] * k + C_t[i1,i]
                                if (m < k_low):
                                    m = p - ((k_low - m) % p);
                                else:
                                    m -= k_low
                                if (m == p):
                                    m = 0
                                if (m < dim_seg):
                                    primeSeg[i1, m :: p] = False
                if(k_low + dim_seg < k_end):
                    if primesValue:
                        primes = np.append(primes, RW[np.nonzero(primeSeg.T)[1][::]] + bW * (np.nonzero(primeSeg.T)[0][::] + k_low))
                    else:
                        count_p = count_p + len(np.nonzero(primeSeg.T)[0][::])
                else:
                    primes_t = np.array(RW[np.nonzero(primeSeg.T)[1][::]] + bW * (np.nonzero(primeSeg.T)[0][::] + k_low))
                    if primesValue:
                        primes = np.append(primes, primes_t[primes_t <= N])
                    else:
                        count_p = count_p + len(primes_t[primes_t <= N])
                primeSeg.fill(True)        
                k_low += dim_seg
    return primes if primesValue else count_p


N=10**9
genPrime = False #true setting if you want to generate an array with prime numbers
printPrime = False #if true and genPrime=true prints the prime numbers

start = time.time()

v_primes = segmented_wheel_sieve(N, 30, genPrime)
    
end = time.time()

time_s = end - start
if genPrime:
    if printPrime:
        for i in range(len(v_primes)):
            print(v_primes[i], ", ")
    print("\nPrimes count < ",N,": ", len(v_primes))
else:
    print("\nPrimes count < ",N,": ", v_primes)

print("\nTime: ", time_s, "s")

Comments

0

This is an implementation for primes in 1,000,000,000 numbers in under 2 seconds and possible numba enhancements which are commented out:


import numpy
from numba import njit, prange

@njit(parallel=True)
def parallel_sieve(limit):
    sieve_size = (limit // 2) + 1
    sieve_array = np.ones(sieve_size, dtype=np.bool_)
    sieve_array[0] = False

    for i in prange(1, int(limit ** 0.5) // 2 + 1):
        if sieve_array[i]:
            p = 2 * i + 1
            start_index = (p * p) // 2
            sieve_array[start_index::p] = False

    primes = [2]
    for i in range(1, sieve_size):
        if sieve_array[i]:
            primes.append(2 * i + 1)
    
    return np.array(primes)


limit = 1_000_000_000 #1 billion
a = parallel_sieve(limit)
%timeit parallel_sieve(limit)

# 3.77 s ± 543 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [13]: len(a)
Out[13]: 50847534

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.