
I am writing some code in Python to find the first natural number that does not appear in the first billion digits of pi. Here's what I've written:

    import datetime
    from tqdm import tqdm

    def findpi(end):
        notin = []   # numbers whose digits never appear in pi
        stop = 0     # first such number
        s1 = datetime.datetime.now()
        with open(r"C:\Users\shamm\Desktop\Text Documents\1 BILLION Digits of pi.txt", 'rt') as p:
            pi = p.read()  # read() already returns a str
            for i in tqdm(range(1, end + 1)):
                if str(i) not in pi:
                    if notin == []:
                        stop = i
                    notin.append(i)
        s2 = datetime.datetime.now()
        tdelta = s2 - s1
        ts = tdelta.total_seconds()
        return [notin, stop, ts]

    pi = findpi(1000000)
    print("Not in:", pi[0])
    print("First missing:", pi[1])
    print("Time taken:", pi[2])

It works fine for small numbers, but when I tried it with the first million natural numbers, the code hits a sudden slowdown. For the first 10 seconds it runs at about 10k iterations/s, then it suddenly drops to 1k iterations/s. I tried a larger input of 10 million and the same thing happens: it runs at 10k it/s for only the first 10 seconds, and after 30+ minutes it is down to 100 it/s.

Is there some bottleneck that keeps it from using more memory or processing power? Or is something wrong in my code?

Edit: so it seems like the cause is the ever-increasing length of the number that needs to be searched for. How can I optimize the search so that it doesn't slow down with every increase in the number of digits?

4 Comments
  • Could you try to find at which value it starts slowing down? I wonder whether it could be related to the conversion of bigger numbers (more than 32 bits) or to the search of strings larger than 8 bytes... Commented Jan 11 at 11:10
  • @SergeBallesta the slowdown starts exactly at the 100 thousandth iteration. Commented Jan 11 at 12:04
  • Is there any way to optimize the process? To ensure that this exponential slowdown doesn't happen? Commented Jan 11 at 12:04
  • Can you tell how fast my solution is for you with your billion digits file? (I can't do that myself currently.) Commented Jan 11 at 12:47

1 Answer


That's expected: you slow down by a factor of 10 every time the numbers gain a digit. Numbers with k+1 digits take about ten times longer to find than numbers with k digits, because their first k digits are just as hard to find, but then there's only a 10% chance that the extra digit also matches.
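
You can see that scaling directly with a quick demo of mine (not from the question; it uses random digits as a statistical stand-in for pi, which behaves the same for this purpose):

    import random
    import time

    # 10 million random digits; first occurrences behave like pi's for this test
    digits = ''.join(random.choices('0123456789', k=10**7))
    for k in range(3, 8):
        sample = random.sample(range(10**(k - 1), 10**k), 100)
        t0 = time.perf_counter()
        hits = sum(str(n) in digits for n in sample)
        dt = time.perf_counter() - t0
        print(f'{k}-digit numbers: {hits}/100 found in {dt:.3f}s')

Each step takes roughly ten times as long as the previous one, and at k=7 some numbers start to go missing entirely, because the expected first occurrence (around 10^7 digits in) is as long as the whole string.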

You could instead go through pi's digits once, building every number that does appear and recording it. A second pass can then quickly look up whether each number was seen. The snippet below replaces the with block inside findpi, so notin, stop and end are the same as in the question:

    with open(r"...", 'rb') as p:
        pi = bytearray(p.read())
        pi = pi.translate(pi.maketrans(b'0123456789', bytes(range(10))))
        seen = bytearray(end+1)
        while pi:
            i = 0
            for d in pi:
                i = i * 10 + d
                if i > end:
                    break
                seen[i] = 1
            del pi[0]
        for i in tqdm(range(1,end+1)):
            if not seen[i]:
                if notin == []:
                    stop = i
                notin.append(i)

With 10 million digits, findpi(1000000) takes me about 6 seconds, so I estimate with a billion digits it will take about 10 minutes.
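
For reference, here is the snippet slotted back into the question's findpi, so it is directly runnable (a sketch: I've only wrapped the scan above in the question's own bookkeeping, and the path default is the questioner's):

    import datetime
    from tqdm import tqdm

    def findpi(end, path=r"C:\Users\shamm\Desktop\Text Documents\1 BILLION Digits of pi.txt"):
        notin = []
        stop = 0
        s1 = datetime.datetime.now()
        with open(path, 'rb') as p:
            pi = bytearray(p.read())
            pi = pi.translate(pi.maketrans(b'0123456789', bytes(range(10))))
            seen = bytearray(end + 1)
            while pi:
                i = 0
                for d in pi:
                    i = i * 10 + d
                    if i > end:
                        break
                    seen[i] = 1
                del pi[0]
            for i in tqdm(range(1, end + 1)):
                if not seen[i]:
                    if notin == []:
                        stop = i
                    notin.append(i)
        ts = (datetime.datetime.now() - s1).total_seconds()
        return [notin, stop, ts]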


13 Comments

This solution sits at a bit of an odd position where it fundamentally relies on end being sufficiently low (which is fine, that’s necessarily true) but also doesn’t take advantage of that to avoid buffering 2GB in memory.
@Ry- The 2GB for my pi? I considered optimizing that, but chose to keep it simple. They already use 1GB and probably have several more.
Yeah, but it could use just ~one page at a time, which I expect to be much faster (fits in cache, allows readahead during CPU work) but admittedly haven’t tried (and Python performance sometimes surprises me).
@Ry- Well, it's not like I'm jumping around. I always just look at the first few digits. That means I pretty much am using just ~one page at a time.
Right, exactly – given that the core of the solution works like that, I just thought it was odd to read the entire file into memory and then do a translate pass over the whole thing into a new copy beforehand.
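
For what it's worth, the chunked approach Ry- describes might look roughly like this (an untested sketch of mine, not the answer's code; findpi_streamed, the chunk size, and the digits-only-file assumption are all mine):

    def findpi_streamed(path, end, chunk_size=1 << 20):
        # Sketch of the one-chunk-at-a-time idea from the comments.
        # Assumes the file contains only ASCII digits.
        width = len(str(end))            # most digits any candidate can have
        table = bytes.maketrans(b'0123456789', bytes(range(10)))
        seen = bytearray(end + 1)
        tail = b''                       # raw overlap from the previous chunk
        with open(path, 'rb') as f:
            while chunk := f.read(chunk_size):
                raw = tail + chunk
                buf = raw.translate(table)   # digit characters -> values 0..9
                for start in range(len(buf)):
                    i = 0
                    for d in buf[start:start + width]:
                        i = i * 10 + d
                        if i > end:
                            break
                        seen[i] = 1
                # keep width-1 trailing bytes so numbers spanning a chunk
                # boundary are scanned again with their continuation
                tail = raw[-(width - 1):] if width > 1 else b''
        return [i for i in range(1, end + 1) if not seen[i]]

Peak memory then stays around the chunk size plus the seen table, instead of two whole copies of the file, at the cost of a Python-level slice per position.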
