Python: How to generate a random number not part of dataframe?

Question

I have a dataframe containing 15K+ strings in the format xxxx-yyyyy-zzz, where yyyyy is a randomly generated 5-digit number. Given that xxxx is 1000 and zzz is 200, how can I generate a random yyyyy and add the resulting string to the dataframe so that it is unique?

           number
0  1000-12345-100
1  1000-82045-200
2  1000-93035-200

import pandas as pd

data = {"number": ["1000-12345-100", "1000-82045-200", "1000-93035-200"]}
df = pd.DataFrame(data)
print(df)

I'd generate a list of the values from 0 to 99999 and zfill them so they are always 5 characters long. Then build the strings (f"1000-{random.choice(list_with_numbers)}-200") and remove each chosen number from the list.
– sandertjuh, Commented Jun 7, 2021 at 21:04
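
The pool idea from this comment could be sketched roughly as follows. This is only an illustration under some assumptions: the names `make_pool` and `draw` are made up here, and instead of calling `random.choice` followed by `list.remove` (which scans the list each time), the pool is shuffled once and values are popped from the end, which yields the same kind of unique random draws.

```python
import random

existing = ["1000-12345-100", "1000-82045-200", "1000-93035-200"]

def make_pool(existing_strings, digits=5):
    """Return a shuffled list of all zero-padded middle values not yet in use."""
    used = {s.split("-")[1] for s in existing_strings}
    candidates = (str(n).zfill(digits) for n in range(10 ** digits))
    pool = [c for c in candidates if c not in used]
    random.shuffle(pool)  # shuffle once so that pop() returns a random value
    return pool

def draw(pool, prefix="1000", suffix="200"):
    """Remove one value from the pool and wrap it in the prefix and suffix."""
    return f"{prefix}-{pool.pop()}-{suffix}"

pool = make_pool(existing)
new_strings = [draw(pool) for _ in range(5)]
print(new_strings)
```

Each value leaves the pool when it is drawn, so repeated calls to `draw` cannot return duplicates. With a dataframe, `df["number"]` could be passed in place of the `existing` list.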

fsimonjetz · Accepted Answer · 2021-06-07 21:06:21Z

1

I'd create a new column with just the middle values and keep generating random numbers until I find one that's not in the column.

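from random import randint

# Extract the middle part of every existing string as an integer.
df["excl"] = df.number.apply(lambda x: int(x.split("-")[1]))

# randint includes both endpoints, so this draws from 10000..99999
# (middle values with a leading zero are never produced).
num = randint(10000, 99999)

# Redraw until the number is not already in the column.
while num in df.excl.values:
    num = randint(10000, 99999)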


1 Comment

Shantanu Over a year ago

This approach is simple, but traversing the list to check uniqueness will be inefficient.
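
One way to address the lookup cost raised in this comment is to keep the used middle values in a Python set, where membership tests take constant time on average. The sketch below is an assumption-laden illustration, not the answerer's code: `new_unique_string` is a hypothetical helper, and it draws from 0..99999 with zero padding rather than from 10000..99999.

```python
import random

existing = ["1000-12345-100", "1000-82045-200", "1000-93035-200"]

# Middle values already taken, stored in a set for fast lookups.
used = {int(s.split("-")[1]) for s in existing}

def new_unique_string(used, prefix="1000", suffix="200", digits=5):
    """Draw random middle values until one is unused, then record it."""
    if len(used) >= 10 ** digits:
        raise ValueError("no unused middle values left")
    while True:
        num = random.randint(0, 10 ** digits - 1)
        if num not in used:
            used.add(num)  # remember it so later calls cannot repeat it
            return f"{prefix}-{num:0{digits}d}-{suffix}"

new_string = new_unique_string(used)
print(new_string)
```

With 15K values taken out of 100,000, most draws succeed on the first attempt, so the retry loop stays short.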

Andreas · Accepted Answer · 2021-06-07 21:24:42Z

I tried to come up with a generic approach. You can use it with lists:

import random

number_series = ["1000-12345-100", "1000-82045-200", "1000-93035-200"]

def rnd_nums(n_numbers: int, number_series: list, max_length: int=5, prefix: int=1000, suffix: int=100):
    # ignore following numbers (a set keeps the "not in" test below fast)
    blacklist = {int(x.split('-')[1]) for x in number_series}
    # define space with allowed numbers
    rng = range(0, 10**max_length)
    # get unique sample of length "n_numbers"
    lst = random.sample([i for i in rng if i not in blacklist], n_numbers)
    # return sample as string with pre- and suffix, zero-padded to max_length digits
    return ['{}-{}-{}'.format(prefix, str(mid).zfill(max_length), suffix) for mid in lst]

rnd_nums(5, number_series)

Out[69]: 
['1000-79396-100',
 '1000-30032-100',
 '1000-09188-100',
 '1000-18726-100',
 '1000-12139-100']

Or use it to generate new rows in a dataframe:

import pandas as pd
data = {"number": ["1000-12345-100", "1000-82045-200", "1000-93035-200"]}
df = pd.DataFrame(data)
print(df)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([df, pd.DataFrame({'number': rnd_nums(5, number_series)})], ignore_index=True)

Out[72]:
           number
0  1000-12345-100
1  1000-82045-200
2  1000-93035-200
3  1000-00439-100
4  1000-36284-100
5  1000-64592-100
6  1000-50471-100
7  1000-02005-100

Georgy Kopshteyn · Accepted Answer · 2021-06-07 21:58:15Z

In addition to the other suggestions, you could write a function that takes your df and the number of new values to add as arguments, appends the new strings, and returns the updated df. The function could look like this:

import pandas as pd
import random

def add_number(df, num):
    # Collect the middle values already in use; a set gives fast lookups.
    used = {int(n.split("-")[1]) for n in df["number"]}

    new_rows = []
    while len(new_rows) < num:
        new_number = random.randint(10000, 99999)
        # Keep the candidate only if it has not been used before.
        if new_number not in used:
            used.add(new_number)
            new_rows.append("1000-%i-200" % new_number)

    # Append all new strings at once and renumber the index.
    return pd.concat([df, pd.DataFrame({"number": new_rows})], ignore_index=True)

This would have the advantage that you could use the function every time you want to add new numbers.

99_m4n · Accepted Answer · 2021-06-07 21:24:06Z

0

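Try:

import random
# Note: this replaces every value in the column with a fresh unique one.
# range(10000, 100000) covers all five-digit numbers without a leading zero.
df['number'] = [f"1000-{x}-200" for x in random.sample(range(10000, 100000), len(df))]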

output:

           number
0  1000-24744-200
1  1000-28991-200
2  1000-98322-200
...


Acccumulation · Accepted Answer · 2021-06-07 21:26:04Z

0

One option is to use sample from the random module:

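import random
num_digits = 5
col_length = 15000
# sample() draws without replacement, so every number is unique
rand_nums = random.sample(range(10**num_digits), col_length)
# str.join takes a single iterable, so the parts go in a list
data["number"] = ['-'.join(
        ['1000', str(num).zfill(num_digits), '200'])
    for num in rand_nums]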

It took my computer about 30 ms to generate the numbers. For numbers with more digits, it may become infeasible.

Another option is to just take sequential integers, then encrypt them. This will result in a sequence in which each element is unique. They will be pseudo-random, rather than truly random, but then Python's random module is producing pseudo-random numbers as well.
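
The "encrypt sequential integers" idea could look something like the sketch below. It is not a vetted cipher and not necessarily what the answer had in mind: it assumes a small keyed Feistel permutation over 18 bits, with cycle-walking to stay inside the 0..99999 range, and the names `scramble`, `_feistel`, `_round` and the key `b"demo-key"` are all invented for illustration.

```python
import hashlib

DIGITS = 5
DOMAIN = 10 ** DIGITS        # 100000 possible middle values
HALF_BITS = 9                # two 9-bit halves give 18 bits, enough to cover DOMAIN
MASK = (1 << HALF_BITS) - 1

def _round(value: int, key: bytes, round_no: int) -> int:
    """Keyed round function: hash the key, round number, and half-block."""
    data = key + bytes([round_no]) + value.to_bytes(2, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:2], "big") & MASK

def _feistel(x: int, key: bytes, rounds: int = 4) -> int:
    """A Feistel network is a permutation of the 18-bit integers for any round function."""
    left, right = x >> HALF_BITS, x & MASK
    for r in range(rounds):
        left, right = right, left ^ _round(right, key, r)
    return (left << HALF_BITS) | right

def scramble(i: int, key: bytes = b"demo-key") -> int:
    """Map i in [0, DOMAIN) to a unique value in [0, DOMAIN)."""
    if not 0 <= i < DOMAIN:
        raise ValueError("i must be in [0, DOMAIN)")
    x = _feistel(i, key)
    # Cycle-walking: re-apply the permutation until the result is in range.
    while x >= DOMAIN:
        x = _feistel(x, key)
    return x

# Sequential counters 0, 1, 2 become scattered but distinct middle values.
new_strings = [f"1000-{scramble(i):0{DIGITS}d}-200" for i in range(3)]
print(new_strings)
```

Because the mapping is one-to-one, distinct counters always give distinct strings, so only the last counter needs to be stored. This guarantee covers only values produced by the same key and scheme; middle values that already exist in the dataframe from another source would still need to be checked.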

edited Jun 7, 2021 at 21:26

answered Jun 7, 2021 at 21:21
