
I am writing some code in Python to find the first natural number that does not appear in the first billion digits of pi. Here's what I've written:

    import datetime
    from tqdm import tqdm

    def findpi(end):
        notin = []   # numbers whose digits never appear in pi
        stop = 0     # first such number
        s1 = datetime.datetime.now()
        with open(r"C:\Users\shamm\Desktop\Text Documents\1 BILLION Digits of pi.txt", 'rt') as p:
            pi = p.read()  # read() already returns a str
            for i in tqdm(range(1, end + 1)):
                if str(i) not in pi:
                    if notin == []:
                        stop = i
                    notin.append(i)
        s2 = datetime.datetime.now()
        tdelta = s2 - s1
        ts = tdelta.total_seconds()
        return [notin, stop, ts]

    pi = findpi(1000000)
    print("Not in:", pi[0])
    print("First missing:", pi[1])
    print("Time taken:", pi[2])

It works fine for small numbers, but when I tried it with the first million natural numbers, the code hits a sudden slowdown. For the first 10 seconds it runs at about 10k iterations/s, then it suddenly drops to 1k iterations/s. I tried a larger input of 10 million and the same thing happens: it runs at 10k it/s for only the first 10 seconds, and after 30+ minutes it is down to 100 it/s.

Is there some bottleneck that keeps it from using more memory or processing power? Or is something wrong in my code?

Edit: so it seems like the cause is the ever-increasing length of the number that needs to be searched for. How can I optimize the search so that it doesn't slow down with every increase in the number of digits?

4 Comments
  • Could you try to find at which value it starts slowing down? I wonder whether it could be related to the conversion of bigger numbers (more than 32 bits) or to the search of strings larger than 8 bytes... Commented Jan 11 at 11:10
  • @SergeBallesta the slowdown starts exactly at the 100 thousandth iteration. Commented Jan 11 at 12:04
  • Is there any way to optimize the process? To ensure that this exponential slowdown doesn't happen? Commented Jan 11 at 12:04
  • Can you tell how fast my solution is for you with your billion digits file? (I can't do that myself currently.) Commented Jan 11 at 12:47

1 Answer


That's expected: you slow down by a factor of 10 every time the numbers gain a digit. Numbers with k+1 digits take about ten times longer to find than numbers with k digits, because their first k digits are just as hard to find, but then there's only a 10% chance that the extra digit also matches.
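
You can see that scaling directly with a quick demo of mine (not from the question; it uses random digits as a statistical stand-in for pi, which behaves the same for this purpose):

    import random
    import time

    # 10 million random digits; first occurrences behave like pi's for this test
    digits = ''.join(random.choices('0123456789', k=10**7))
    for k in range(3, 8):
        sample = random.sample(range(10**(k - 1), 10**k), 100)
        t0 = time.perf_counter()
        hits = sum(str(n) in digits for n in sample)
        dt = time.perf_counter() - t0
        print(f'{k}-digit numbers: {hits}/100 found in {dt:.3f}s')

Each step takes roughly ten times as long as the previous one, and at k=7 some numbers start to go missing entirely, because the expected first occurrence (around 10^7 digits in) is as long as the whole string.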

You could instead go through pi's digits once, building every number that does appear and recording it. A second pass can then quickly look up whether each number was seen. The snippet below replaces the with block inside findpi, so notin, stop and end are the same as in the question:

    with open(r"...", 'rb') as p:
        pi = bytearray(p.read())
        pi = pi.translate(pi.maketrans(b'0123456789', bytes(range(10))))
        seen = bytearray(end+1)
        while pi:
            i = 0
            for d in pi:
                i = i * 10 + d
                if i > end:
                    break
                seen[i] = 1
            del pi[0]
        for i in tqdm(range(1,end+1)):
            if not seen[i]:
                if notin == []:
                    stop = i
                notin.append(i)

With 10 million digits, findpi(1000000) takes me about 6 seconds, so I estimate with a billion digits it will take about 10 minutes.
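
For reference, here is the snippet slotted back into the question's findpi, so it is directly runnable (a sketch: I've only wrapped the scan above in the question's own bookkeeping, and the path default is the questioner's):

    import datetime
    from tqdm import tqdm

    def findpi(end, path=r"C:\Users\shamm\Desktop\Text Documents\1 BILLION Digits of pi.txt"):
        notin = []
        stop = 0
        s1 = datetime.datetime.now()
        with open(path, 'rb') as p:
            pi = bytearray(p.read())
            pi = pi.translate(pi.maketrans(b'0123456789', bytes(range(10))))
            seen = bytearray(end + 1)
            while pi:
                i = 0
                for d in pi:
                    i = i * 10 + d
                    if i > end:
                        break
                    seen[i] = 1
                del pi[0]
            for i in tqdm(range(1, end + 1)):
                if not seen[i]:
                    if notin == []:
                        stop = i
                    notin.append(i)
        ts = (datetime.datetime.now() - s1).total_seconds()
        return [notin, stop, ts]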


13 Comments

This solution sits at a bit of an odd position where it fundamentally relies on end being sufficiently low (which is fine, that’s necessarily true) but also doesn’t take advantage of that to avoid buffering 2GB in memory.
@Ry- The 2GB for my pi? I considered optimizing that, but chose to keep it simple. They already use 1GB and probably have several more.
Yeah, but it could use just ~one page at a time, which I expect to be much faster (fits in cache, allows readahead during CPU work) but admittedly haven’t tried (and Python performance sometimes surprises me).
@Ry- Well, it's not like I'm jumping around. I always just look at the first few digits. That means I pretty much am using just ~one page at a time.
Right, exactly – given that the core of the solution works like that, I just thought it was odd to read the entire file into memory and then do a translate pass over the whole thing into a new copy beforehand.
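
For what it's worth, the chunked approach Ry- describes might look roughly like this (an untested sketch of mine, not the answer's code; findpi_streamed, the chunk size, and the digits-only-file assumption are all mine):

    def findpi_streamed(path, end, chunk_size=1 << 20):
        # Sketch of the one-chunk-at-a-time idea from the comments.
        # Assumes the file contains only ASCII digits.
        width = len(str(end))            # most digits any candidate can have
        table = bytes.maketrans(b'0123456789', bytes(range(10)))
        seen = bytearray(end + 1)
        tail = b''                       # raw overlap from the previous chunk
        with open(path, 'rb') as f:
            while chunk := f.read(chunk_size):
                raw = tail + chunk
                buf = raw.translate(table)   # digit characters -> values 0..9
                for start in range(len(buf)):
                    i = 0
                    for d in buf[start:start + width]:
                        i = i * 10 + d
                        if i > end:
                            break
                        seen[i] = 1
                # keep width-1 trailing bytes so numbers spanning a chunk
                # boundary are scanned again with their continuation
                tail = raw[-(width - 1):] if width > 1 else b''
        return [i for i in range(1, end + 1) if not seen[i]]

Peak memory then stays around the chunk size plus the seen table, instead of two whole copies of the file, at the cost of a Python-level slice per position.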
