Assign random values equally to pandas dataframe

Question

I have a pandas dataframe, say df, which looks like this:

Region  ID
A       111
A       222
A       333
A       444
B       555
B       666
B       777
C       888
C       999

Each region has a weight. In this case, A's weight is 2, B's weight is 2 and C's weight is 1.

A weight is never greater than the number of rows for its region in the "Region" column, so A's weight can never exceed 4 because there are 4 records for A.

I want to create a new column and fill it with random integer values according to these weights, BUT the random values have to be equally distributed. For clarity, I expect the new dataframe to look like this:

Region  ID   Random_Value
A       111      1
A       222      2 
A       333      1
A       444      2
B       555      2
B       666      2
B       777      1
C       888      1
C       999      1

When the number of rows for a region is odd, as with "B", I want to assign the random values equally, but the remainder can take any random integer value.

When the number of rows for a region is even, as with "A", whose weight is 2, I need to assign random integer values from 1 to 2 inclusive, and each value should appear the same number of times.

I tried many approaches without success. Is there a way to solve this problem?

My code is the following:

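import numpy as np

df['Random_Value'] = np.nan

# Boolean mask for region A (the column is named 'Region', with a capital R)
A = df['Region'] == 'A'

# Draws 1 or 2 independently per row, so equal counts are not guaranteed
df.loc[A, 'Random_Value'] = np.random.randint(1, 3, size=A.sum())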

Could you edit your post to include the code you have tried?
– DrBwts, Commented Apr 9, 2019 at 8:22
Currently I do not have it, but I'll be able to add a weight column to the main dataframe for each region.
– Okroshiashvili, Commented Apr 9, 2019 at 8:36
If A's weight is 2, it means the random values have to be only [1, 2], right?
– ResidentSleeper, Commented Apr 9, 2019 at 9:19

ResidentSleeper · Accepted Answer · 2019-04-09 09:37:36Z

Suppose you have a dictionary that stores each region's weight.

weight_dict = {'A':2, 'B':2, 'C':1}

I used:

groupby, then looped over it to get each group from the dataframe.
np.arange to generate the possible values from weight_dict.
np.repeat to build the pool of values to draw from.
np.random.choice with replace=False to draw values without replacement.

Then create the new column, using np.concatenate to combine the list of arrays.

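import numpy as np

ls = []

# groupby iterates over regions in sorted order, so df must already be sorted by Region
for idx, d in df.groupby('Region'):
    group_size = d.shape[0]
    # Allowed values for this region: 1..weight
    weight_range = np.arange(1, weight_dict[idx]+1)
    # Cast to int, since np.repeat expects an integer repeat count
    repeats = int(np.ceil(group_size/len(weight_range)))
    combination = np.repeat(weight_range, repeats)
    ls.append(np.random.choice(combination, group_size, replace=False))

df['Random_Value'] = np.concatenate(ls)

df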

  Region   ID  Random_Value
0      A  111             2
1      A  222             1
2      A  333             1
3      A  444             2
4      B  555             1
5      B  666             2
6      B  777             2
7      C  888             1
8      C  999             1

You can print each variable to see what happens inside the loop.
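The loop above writes its values by position through np.concatenate, so the values match the rows only if the dataframe is already sorted by Region. A minimal sketch of an index-aligned variant of the same idea follows. It assumes the same weight_dict, uses np.random.default_rng in place of np.random.choice, and deliberately interleaves the sample rows. None of these choices come from the original answer.

```python
import numpy as np
import pandas as pd

# Rows are deliberately interleaved to show that row order does not matter here.
df = pd.DataFrame({
    "Region": ["B", "A", "C", "A", "B", "A", "C", "B", "A"],
    "ID": [555, 111, 888, 222, 666, 333, 999, 777, 444],
})
weight_dict = {"A": 2, "B": 2, "C": 1}

rng = np.random.default_rng()  # pass a seed here for reproducible output

# Start with an integer placeholder so the column keeps an integer dtype.
df["Random_Value"] = 0

for region, group in df.groupby("Region"):
    group_size = len(group)
    weight_range = np.arange(1, weight_dict[region] + 1)
    # Integer ceiling division: enough copies of each value to cover the group.
    repeats = -(-group_size // len(weight_range))
    pool = np.repeat(weight_range, repeats)
    # Write back through the group's own index labels, not by position.
    df.loc[group.index, "Random_Value"] = rng.choice(
        pool, size=group_size, replace=False
    )

print(df.sort_values(["Region", "ID"]))
```

Drawing without replacement from this pool guarantees that no value appears more than ceil(group_size / weight) times in a group. For the weights in the question, that gives two 1s and two 2s for A, and a 2/1 split for B.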

Shehan Ishanka · Accepted Answer · 2019-04-09 08:24:51Z

Rather than generating random numbers directly, you can create the list of required values and then pick its indices in random order.

e.g.:

>>> a = [1, 1, 2, 2]
>>> np.random.choice(4, 4, replace=False)
array([0, 3, 2, 1])

You can then assign the values according to the generated random indices.

For an odd number of rows, you can generate the random list as follows.

>>> np.random.randint(1, 3, size=3)
array([1, 1, 2])
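The two ideas above can be combined into a single helper, sketched below under some assumptions. The name balanced_random_values is hypothetical, np.random.default_rng is used for the random draws, and the leftover rows (when the group size is not a multiple of the weight) each receive a distinct randomly chosen value, which is one possible reading of "the remainder can have any random integer value".

```python
import numpy as np

rng = np.random.default_rng()  # pass a seed here for reproducible output

def balanced_random_values(group_size, weight, rng):
    """Hypothetical helper: values 1..weight whose counts differ by at most one."""
    values = np.arange(1, weight + 1)
    # Every value gets the same guaranteed share of the rows.
    base = np.repeat(values, group_size // weight)
    # Leftover rows get distinct randomly chosen values (an assumption).
    extra = rng.choice(values, size=group_size % weight, replace=False)
    pool = np.concatenate([base, extra])
    # A random permutation of positions decides which row gets which value.
    order = rng.permutation(group_size)
    return pool[order]

print(balanced_random_values(4, 2, rng))  # two 1s and two 2s in random order
print(balanced_random_values(3, 2, rng))  # one 1, one 2, plus one extra 1 or 2
```

The returned array can be assigned to the rows of one region, for example through df.loc with that region's index.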

edited Apr 9, 2019 at 8:24

answered Apr 9, 2019 at 7:57

Shehan Ishanka
