0

Is there a better way to do this?

I have a working function that generates a 5-character alphanumeric keycode from a given index.

The problem is that it is far too slow. I'm expecting at least 30 million records, and a test run with just one million takes forever.

Can anyone suggest how to make this code cleaner and faster? Thanks in advance.

import time


def generate_unique(index_id):
    BASE = 35;                                          # zero-based
    base36 = ['0','1','2','3','4','5','6','7','8','9',
              'a','b','c','d','e','f','g','h','i','j','k',
              'm','n','o','p','q','r','s','t','u','v','w','x','y','z']

    idx = [0, 0, 0, 0, 0]

    for i in range(0, index_id - 1):
        idx[4] = idx[4] + 1
        if idx[4] == BASE:
            idx[4] = 0
            idx[3] = idx[3] + 1
            if idx[3] == BASE:
                idx[3] = 0
                idx[2] = idx[2] + 1
                if idx[2] == BASE:
                    idx[2] = 0
                    idx[1] = idx[1]+1
                    if idx[1] == BASE:
                        idx[1] = 0
                        idx[0] = idx[0] + 1

    return base36[idx[0]] + base36[idx[1]] + base36[idx[2]] + base36[idx[3]] + base36[idx[4]]


t1 = time.process_time()
for i in range(1, 1000000):
    generate_unique(i)
t2 = time.process_time()
print(f"Process completed successfully in {t2 - t1} seconds.")
  • Does it need to be 5 digits? Does this help? stackoverflow.com/questions/1210458/… Commented Jun 15, 2020 at 16:59
  • Hi @Axe319. Yes, all I want is a unique key that can accommodate at least 100 million records, so I think a 5-character alphanumeric base-36 code is sufficient for that. Commented Jun 15, 2020 at 17:04
  • The reason I ask whether it needs to be 5 digits is that something like [str(uuid.uuid4()) for _ in range(1000000)] takes around 4 seconds to run on my machine and gives you virtually infinite room for growth. The only caveat is that it's a 36-character string. Commented Jun 15, 2020 at 17:12
  • I think as long as it produces a unique 5-character code, it is all right. Commented Jun 15, 2020 at 17:17
  • What's the use case? Are you using them as a unique key for your data? Commented Jun 15, 2020 at 17:18

2 Answers

1

You can use numpy's base_repr:

import numpy as np   

f'{np.base_repr(index_id-1, 36).lower():0>5}'

Note that you skipped the letter 'l' in your implementation. If you add it to base36 and set BASE = 36, your function returns the same result as the expression above.

Timings:

%timeit generate_unique(1_000_000)
#169 ms ± 705 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit f'{np.base_repr(1_000_000-1, 36).lower():0>5}'
#2.67 µs ± 51.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

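So for an index of 1,000,000, base_repr is over 63,000 times faster than the loop solution. The gap widens for larger indexes, because the loop does work proportional to the index on every call.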
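If you would rather not depend on numpy, the same conversion can be sketched in plain Python with repeated divmod. The names `encode_fixed` and `ALPHABET` are mine, not from the question, and the range check is an assumption about how you want out-of-range indexes handled.

```python
import string

# Full 36-symbol alphabet: digits followed by lowercase letters (includes 'l').
ALPHABET = string.digits + string.ascii_lowercase


def encode_fixed(number, width=5, alphabet=ALPHABET):
    """Return `number` written in base len(alphabet), left-padded to `width`."""
    base = len(alphabet)
    if not 0 <= number < base ** width:
        raise ValueError(f"number must be in [0, {base ** width})")
    digits = []
    for _ in range(width):
        # Peel off the least-significant digit on each pass.
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


print(encode_fixed(0))        # 00000
print(encode_fixed(35))       # 0000z
print(encode_fixed(999_999))  # 0lflr
```

To reproduce the original 35-symbol output exactly, pass the question's alphabet and the same offset: `encode_fixed(index_id - 1, alphabet='0123456789abcdefghijkmnopqrstuvwxyz')`.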


0

You can try something like this, which builds every code once up front instead of recomputing each one from its index:

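# The question's 35-symbol alphabet (no 'l'); a string indexes like the original list.
base36 = '0123456789abcdefghijkmnopqrstuvwxyz'

# Build every 5-character code once: 35**5 = 52,521,875 strings.
codes = []
for m in range(35):
    for l in range(35):
        for k in range(35):
            for j in range(35):
                for i in range(35):
                    codes.append(base36[m]+base36[l]+base36[k]+base36[j]+base36[i])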

It took less than 2 minutes on my computer.
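The nested loops can also be written with itertools.product, which yields the codes lazily, so you do not have to hold all 35**5 strings in memory at once. This is only a sketch: `iter_codes` and `ALPHABET_35` are names I made up, and the alphabet is the 35-symbol one from the question.

```python
from itertools import islice, product

# The question's 35-symbol alphabet (it has no 'l'); carried over as an assumption.
ALPHABET_35 = "0123456789abcdefghijkmnopqrstuvwxyz"


def iter_codes(alphabet=ALPHABET_35, width=5):
    """Yield every fixed-width code in the same order as the nested loops."""
    # product varies the last position fastest, like the innermost loop.
    for combo in product(alphabet, repeat=width):
        yield "".join(combo)


# Take only as many codes as there are records instead of building them all.
first_three = list(islice(iter_codes(), 3))
print(first_three)  # ['00000', '00001', '00002']
```

Pairing it with the records is then a matter of `zip(records, iter_codes())`, which stops as soon as the records run out.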
