Revisions to Project Euler 50 in Python

replaced http://codereview.stackexchange.com/ with https://codereview.stackexchange.com/

Source Link

edited Apr 13, 2017 at 12:40

Community Bot

1

The runtime is now dominated by the call to pyprimes.primes: this takes more than 90% of the time, so that's the place to put effort. Using sieve5 from this answer this answer instead of pyprimes speeds things up by another four times, and using sieve8 (which requires NumPy) gives a 5× speedup on top of that, solving the Project Euler problem in 27 milliseconds:
```
 >>> timeit(lambda:longest_prime_sum(1000000), number=1)
 0.026997262961231172
```

The runtime is now dominated by the call to pyprimes.primes: this takes more than 90% of the time, so that's the place to put effort. Using sieve5 from this answer instead of pyprimes speeds things up by another four times, and using sieve8 (which requires NumPy) gives a 5× speedup on top of that, solving the Project Euler problem in 27 milliseconds:
```
 >>> timeit(lambda:longest_prime_sum(1000000), number=1)
 0.026997262961231172
```

The runtime is now dominated by the call to pyprimes.primes: this takes more than 90% of the time, so that's the place to put effort. Using sieve5 from this answer instead of pyprimes speeds things up by another four times, and using sieve8 (which requires NumPy) gives a 5× speedup on top of that, solving the Project Euler problem in 27 milliseconds:
```
 >>> timeit(lambda:longest_prime_sum(1000000), number=1)
 0.026997262961231172
```

further optimizations

Source Link

edited May 3, 2015 at 13:00

Gareth Rees

50.1k
3
130
211

The runtime is now dominated by the call to pyprimes.primes: this takes more than 90% of the time, so that's the place to put effort. Using sieve5 from this answer instead of pyprimes speeds things up by another four times, and using sieve8 (which requires NumPy) gives a 5× speedup on top of that, solving the Project Euler problem in 27 milliseconds:
```
 >>> timeit(lambda:longest_prime_sum(1000000), number=1)
 0.026997262961231172
```

The runtime is now dominated by the call to pyprimes.primes: this takes more than 90% of the time, so that's the place to put effort. Using sieve5 from this answer instead of pyprimes speeds things up by another four times, and using sieve8 (which requires NumPy) gives a 5× speedup on top of that, solving the Project Euler problem in 27 milliseconds:
```
 >>> timeit(lambda:longest_prime_sum(1000000), number=1)
 0.026997262961231172
```

off by one errors

Source Link

edited May 3, 2015 at 12:00

Gareth Rees

50.1k
3
130
211

Starting the search with sequences of length 6 means that the code can't be tested on very small instances. For example, if we reduce the limit from 10000 to 10, main still prints 6, even though the longest consecutive prime sum in this case is 2 + 3 = 5.

I understand why you've done this: it's because you know from the example in the Project Euler problem statement (2 + 3 + 5 + 7 + 11 + 13 = 41) that when the limit is greater than 41, the longest consecutive prime sum must have length at least 6, so you can save time by not trying sums of length 1 to 5.

But by implementing this optimization, you've made your code give incorrect results for small limits, and that makes it hard to check for correctness. It would be better to make it correct in all circumstances, and then figure out a general way to avoid doing unnecessary work (see §2.4 below).

Python ranges are exclusive at the top, so this loop will never consider summing all the primes in the list:
```
 for x in range(6,len(prime_list)):
```

You might say that we never need to sum all the primes in the list, because they can't sum to another prime in the list, but that's false if the list only has one prime! Correctness on small examples is important for testing, so the loop needs to be:

    for x in range(1, len(prime_list) + 1):

Similarly, this loop never considers summing the last l primes in the list:
```
 for x in range(0,len(ls)-l):
```

It needs to be:

    for x in range(0, len(ls) - l + 1):

Applying all the changes in §1 above results in this code:

 from pyprimes import is_prime, primes

 def longest_prime_sum(limit):
     """Return the prime below limit that can be written as as the sum of
     the most consecutive primes.

     >>> longest_prime_sum(100)
     41

     """
     prime_list = list(primes(end=limit))
     result = None
     for l in range(61, len(prime_list) + 1):
         for x in range(len(prime_list) - l + 1):
             if prime_list[x] > limit / l:
                 break
             test_sum = sum(prime_list[x:x+l])
             if test_sum < limit and is_prime(test_sum):
                 result = test_sum
     return result

    >>> timeit(lambda:longest_prime_sum(40000), number=1)
    1.698296285001561983643549028784

    >>> timeit(lambda:longest_prime_sum(40000), number=1)
    1.45965695206541575118930450407788

Instead of exiting the loop when prime_list[x] > limit / l, it would be better to exit when test_sum >= limit. This is a tighter condition, so exits earlier. The code is now about 150more than 100 times faster than the original:
```
 >>> timeit(lambda:longest_prime_sum(40000), number=1)
 0.2559533910825848635101655102334917
```
Since the goal is to produce the longest sum, why do we start with small sums and work our way up? If we started with big sums and worked our way down, then as soon as we found a sum we could stop, knowing immediately that it was the longest.

    def longest_prime_sum(limit):
        """Return the prime below limit that can be written as as the sum of
        the most consecutive primes.

        >>> longest_prime_sum(100)
        41

        """
        prime_list = list(primes(end=limit))
        prime_set = set(prime_list)

        # Find max number of consecutive primes whose sum is below limit.
        total = 0
        for max_length, p in enumerate(prime_list):
            total += p
            if total >= limit:
                break

        for length in range(max_length + 1, 0, -1):
            for x in range(len(prime_list) - length + 1):
                total = sum(prime_list[x:x+length])
                if total >= limit:
                    break
                elif total in prime_set:
                    return total

Applying all the changes in §1 above results in this code:

 from pyprimes import is_prime, primes

 def longest_prime_sum(limit):
     """Return the prime below limit that can be written as as the sum of
     the most consecutive primes.

     >>> longest_prime_sum(100)
     41

     """
     prime_list = list(primes(end=limit))
     result = None
     for l in range(6, len(prime_list)):
         for x in range(len(prime_list) - l):
             if prime_list[x] > limit / l:
                 break
             test_sum = sum(prime_list[x:x+l])
             if test_sum < limit and is_prime(test_sum):
                 result = test_sum
     return result

    >>> timeit(lambda:longest_prime_sum(40000), number=1)
    1.698296285001561

    >>> timeit(lambda:longest_prime_sum(40000), number=1)
    1.4596569520654157

Instead of exiting the loop when prime_list[x] > limit / l, it would be better to exit when test_sum >= limit. This is a tighter condition, so exits earlier. The code is now about 150 times faster than the original:
```
 >>> timeit(lambda:longest_prime_sum(40000), number=1)
 0.25595339108258486
```
Since the goal is to produce the longest sum, why do we start with small sums and work our way up? If we started with big sums and worked our way down, then as soon as we found a sum we could stop, knowing immediately that it was the longest.

    def longest_prime_sum(limit):
        """Return the prime below limit that can be written as as the sum of
        the most consecutive primes.

        >>> longest_prime_sum(100)
        41

        """
        prime_list = list(primes(end=limit))
        prime_set = set(prime_list)

        # Find max number of consecutive primes whose sum is below limit.
        total = 0
        for max_length, p in enumerate(prime_list):
            total += p
            if total >= limit:
                break

        for length in range(max_length + 1, 0, -1):
            for x in range(len(prime_list) - length):
                total = sum(prime_list[x:x+length])
                if total >= limit:
                    break
                elif total in prime_set:
                    return total

Starting the search with sequences of length 6 means that the code can't be tested on very small instances. For example, if we reduce the limit from 10000 to 10, main still prints 6, even though the longest consecutive prime sum in this case is 2 + 3 = 5.

I understand why you've done this: it's because you know from the example in the Project Euler problem statement (2 + 3 + 5 + 7 + 11 + 13 = 41) that when the limit is greater than 41, the longest consecutive prime sum must have length at least 6, so you can save time by not trying sums of length 1 to 5.

But by implementing this optimization, you've made your code give incorrect results for small limits, and that makes it hard to check for correctness. It would be better to make it correct in all circumstances, and then figure out a general way to avoid doing unnecessary work (see §2.4 below).

Python ranges are exclusive at the top, so this loop will never consider summing all the primes in the list:
```
 for x in range(6,len(prime_list)):
```

You might say that we never need to sum all the primes in the list, because they can't sum to another prime in the list, but that's false if the list only has one prime! Correctness on small examples is important for testing, so the loop needs to be:

    for x in range(1, len(prime_list) + 1):

Similarly, this loop never considers summing the last l primes in the list:
```
 for x in range(0,len(ls)-l):
```

It needs to be:

    for x in range(0, len(ls) - l + 1):

Applying all the changes in §1 above results in this code:

 from pyprimes import is_prime, primes

 def longest_prime_sum(limit):
     """Return the prime below limit that can be written as as the sum of
     the most consecutive primes.

     >>> longest_prime_sum(100)
     41

     """
     prime_list = list(primes(end=limit))
     result = None
     for l in range(1, len(prime_list) + 1):
         for x in range(len(prime_list) - l + 1):
             if prime_list[x] > limit / l:
                 break
             test_sum = sum(prime_list[x:x+l])
             if test_sum < limit and is_prime(test_sum):
                 result = test_sum
     return result

    >>> timeit(lambda:longest_prime_sum(40000), number=1)
    1.983643549028784

    >>> timeit(lambda:longest_prime_sum(40000), number=1)
    1.5118930450407788

Instead of exiting the loop when prime_list[x] > limit / l, it would be better to exit when test_sum >= limit. This is a tighter condition, so exits earlier. The code is now more than 100 times faster than the original:
```
 >>> timeit(lambda:longest_prime_sum(40000), number=1)
 0.35101655102334917
```
Since the goal is to produce the longest sum, why do we start with small sums and work our way up? If we started with big sums and worked our way down, then as soon as we found a sum we could stop, knowing immediately that it was the longest.

    def longest_prime_sum(limit):
        """Return the prime below limit that can be written as as the sum of
        the most consecutive primes.

        >>> longest_prime_sum(100)
        41

        """
        prime_list = list(primes(end=limit))
        prime_set = set(prime_list)

        # Find max number of consecutive primes whose sum is below limit.
        total = 0
        for max_length, p in enumerate(prime_list):
            total += p
            if total >= limit:
                break

        for length in range(max_length + 1, 0, -1):
            for x in range(len(prime_list) - length + 1):
                total = sum(prime_list[x:x+length])
                if total >= limit:
                    break
                elif total in prime_set:
                    return total

Source Link

answered May 3, 2015 at 10:09

Gareth Rees

50.1k
3
130
211

Loading

Stack Exchange Network

Return to Answer