4

I want to calculate collatz sequence for some very big number but i guess python is failing to handle this much big numbers and i don't know how to make it handle so.

Here's my program:

def collatz(n):
    if (n == -1 or n == 1 or n == -17 or n == -17 -2**4096):
        print('break found',n)
        return
    if str(n)[-1] in ['1','3','5','7','9']:
        #print(n)
        return collatz(3*n + 1)
    else:
        return collatz(n//2)

I want to use n = 2**4096 ranged numbers. I increased recursion limit by sys.setrecursionlimit function. But now i'm facing Segmentation fault error.

>>> sys.setrecursionlimit(10**9)
>>> collatz(2**1000 + 1)
break found: 1
>>> collatz(2**4000 + 1)
Segmentation fault (core dumped)

Please give me some suggestions regarding whatever i need to modify in order to achieve big input support..

2
  • don't use recursion (you don't have a stack that's going to be large enough). Also, you have one branch that doesn't return a value... Commented May 29, 2021 at 18:32
  • Also, instead of str(n)[-1] in ['1','3','5','7','9'] use n % 2 == 1 - converting large integers to string can be slow (although 2**4096 isn't especially large. Commented May 29, 2021 at 18:34

2 Answers 2

2

Make it non-recursive, too deep recursion overflows stack and stack is usually just few megabytes. After stack overflow your program crashes with segmentation fault.

Your code modified to become non-recursive (which doesn't crash):

Try it online!

def collatz(n):
    while True:
        if (n == -1 or n == 1 or n == -17 or n == -17 -2**4096):
            print('break found', n)
            return
        if str(n)[-1] in ['1','3','5','7','9']:
            #print(n)
            n = 3 * n + 1
        else:
            n = n // 2


collatz(2**4000 + 1)

Output:

break found 1

BTW, classical Collatz problem can be solved with much shorter and faster code, for example like this:

Try it online!

def collatz(n):
    for i in range(1 << 50):
        if n == 1:
            return i
        n = 3 * n + 1 if n & 1 else n >> 1

print('Collatz chain length:', collatz(2**4000 + 1))

Output:

Collatz chain length: 29400

Also just for a side note I want to mention Python library GMPY2 based on famous C-based GMP. It has very optimized long integer arithmetics code and can be used to boost your code if you realy need speed.

On Windows gmpy2 can be installed by downloading it from here and installing through pip install gmpy2‑2.0.8‑cp39‑cp39‑win_amd64.whl. On Linux it can be installed through sudo apt install python3-gmpy2.

After installation you can use gmpy2 in a very simple manner, like in function collatz_gmpy() below:

Try it online!

def collatz_py(n):
    for i in range(1 << 50):
        if n == 1:
            return i
        n = 3 * n + 1 if n & 1 else n >> 1

def collatz_gmpy(n):
    from gmpy2 import mpz
    n = mpz(n)
    for i in range(1 << 50):
        if n == 1:
            return i
        n = 3 * n + 1 if n & 1 else n >> 1

def test():
    import timeit
    n = 2 ** 100000 + 1
    for i, f in enumerate([collatz_py, collatz_gmpy]):
        print(f.__name__, round(timeit.timeit(lambda: f(n), number = 1), 3), 'secs')

test()

Output:

collatz_py 7.477 secs
collatz_gmpy 2.916 secs

As one can see GMPY2 variant gives 2.56x times speedup compared to regular Python's variant.

Sign up to request clarification or add additional context in comments.

17 Comments

I don't know anything about the math here... why are the two answers different (is the OP algorithm incorrect)?
FWIW, the second version runs in 1.518 secs on my pc, when calculating collatz(2**40000-1) -- although the result is different.
@thebjorn Answers are different because I output a different thing in my second code. In my second code I solve a classical Collatz problem just for example, as it is usually solved by many people. In my second code I return number of loop iterations, this number is equal to length of Collatz chain, because algorithm produces always a chain, with length different for different inputs. OP's algorithm just outputs 1 always,it outputs not the length of chain but very last element of chain, and very last is always 1 (all other negative values like -17 never happen)
@Arty what does for i in range(1 << 50): do? Is it for getting a very big limit?
@Vicrobot Yes, 1<<50 is just a big limit, it is a limit for chain length, such long chains never appear in practice. Of cause you can use infinity here in this place, but having infinity range needs import itertools, I just wanted to simplify things by random big limit.
|
1

I would write it as:

def collatz(n):
    tmp = -17 - 2**4096            # precompute constant value outside loop
    while 1:                       # in general, iteration is better than recursion in Python for these kinds of functions
        if n in (-1, 1, -17, tmp): # less typing than lots of or clauses
            return n               # return values rather than printing them
        elif n % 2 == 1:           # faster than converting to string and checking last digit
            n = 3*n + 1
        else:
            n //= 2

call it with e.g.:

print(collatz(2**4000 + 1))

performance: on my PC, collatz(2**5000-1) with the above code takes 0.069 secs. Changing the code to elif str(n)[-1] in '13579': makes it take 1.606 secs, i.e. 23x slower.

On collatz(2**40000-1), the code above used 3.822 secs. Changing

elif n % 2 == 1:

to either

elif n % 2:

or

elif n & 1:

reduced the time to 1.988 secs (i.e. no difference).

Changing n //= 2 to n >>= 1 reduced the time further to 1.506 secs.

5 Comments

BTW, not sure if Python does division optimizations, so it is possible to write n % 2 as n & 1 and n //= 2 as n >>= 1.
Python does not :-)
So then to make code really optimal better to write n % 2 as n & 1 and n //= 2 as n >>= 1. Just a note, no need to modify your code.
@Arty I've added performance comparisons. The str removal had the most effect, but all the other suggestions had measurable performance benefits, at least on the single number I tested with ;-)
Another tiny optimization: 3n+1 is even => next step will be //2, you can combine those two steps into one (3n+1)//2 i.e. n+(n+1)//2. Or simply omit the else: to do the division every time.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.