1

I have a pandas DataFrame, say df, which looks like this:

Region  ID
A       111
A       222
A       333
A       444
B       555
B       666
B       777
C       888
C       999

Each region has a weight. In this case, A's weight is 2, B's weight is 2, and C's weight is 1.

A weight is never larger than the number of rows for its region; for example, A's weight can never be more than 4, because there are 4 records for A.

I want to create a new column and fill it with random integer values according to these weights, BUT the random values have to be equally distributed. For more clarity, the new DataFrame should look like this:

Region  ID   Random_Value
A       111      1
A       222      2 
A       333      1
A       444      2
B       555      2
B       666      2
B       777      1
C       888      1
C       999      1

When the number of rows for a region is odd, as for "B", I want to assign the random values as equally as possible, and the remainder can take any random integer value.

When the number of rows for a region is even, as for "A", and its weight is 2, I need to assign random integers from 1 to 2 inclusive, and each integer should appear the same number of times.

I tried many ways without success. Is there a way to solve this problem?

My code is the following:

df['Random_Value'] = np.nan

A = df['Region'] == 'A'

df.loc[A, 'Random_Value'] = np.random.randint(1,3, size=A.sum())
  • Could you edit your post to include the code you have tried? Commented Apr 9, 2019 at 8:22
  • Do you have a weight column for each region? Commented Apr 9, 2019 at 8:34
  • Currently I do not, but I'll be able to add a weight column to the main dataframe for each region. Commented Apr 9, 2019 at 8:36
  • If A's weight is 2, it means random values have to be only [1, 2] right? Commented Apr 9, 2019 at 9:19
  • Yes right. If weight is 3 random values should be [1,2,3] Commented Apr 9, 2019 at 9:21

2 Answers

1

Suppose you have a dictionary that stores each region's weight.

weight_dict = {'A':2, 'B':2, 'C':1}

I used:

  1. groupby, then a loop over it to get each group from the dataframe.
  2. np.arange to generate the possible values from weight_dict.
  3. np.repeat to build the pool of values to draw from.
  4. np.random.choice with replace=False to draw values without replacement.

Then create the new column with np.concatenate to combine the list.

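ls = []

# groupby iterates over the regions in sorted order
for idx, d in df.groupby('Region'):
    group_size = d.shape[0]
    # possible values for this region: 1..weight
    weight_range = np.arange(1, weight_dict[idx]+1)
    # repeat each value enough times to cover the group; np.repeat needs an integer count
    combination = np.repeat(weight_range, int(np.ceil(group_size/len(weight_range))))
    # draw group_size values from the pool without replacement
    ls.append(np.random.choice(combination, group_size, replace=False))

# assumes the rows of df are already ordered by Region, as in the example
df['Random_Value'] = np.concatenate(ls)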

df

  Region   ID  Random_Value
0      A  111             2
1      A  222             1
2      A  333             1
3      A  444             2
4      B  555             1
5      B  666             2
6      B  777             2
7      C  888             1
8      C  999             1
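
The loop above assigns the concatenated result by position, so it relies on the rows of df already being ordered by Region. If that is not guaranteed, one option is to write through each group's index instead. This is only a sketch: the unsorted sample frame, the `rng` generator and its seed are assumptions added for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

# Same records as in the question, deliberately shuffled (illustrative assumption).
df = pd.DataFrame({
    "Region": ["B", "A", "C", "A", "B", "A", "C", "B", "A"],
    "ID": [555, 111, 888, 222, 666, 333, 999, 777, 444],
})
weight_dict = {"A": 2, "B": 2, "C": 1}

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

df["Random_Value"] = 0
for region, d in df.groupby("Region"):
    group_size = len(d)
    weight_range = np.arange(1, weight_dict[region] + 1)
    repeats = int(np.ceil(group_size / len(weight_range)))
    combination = np.repeat(weight_range, repeats)
    # Assign through the group's index labels so the row order does not matter.
    df.loc[d.index, "Random_Value"] = rng.choice(combination, group_size, replace=False)

print(df.sort_values("Region"))
```

Because each group is written back via `d.index`, the result lines up with the right rows even when the regions are interleaved.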

You can print each variable to see what happens in the loop.
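
For instance, here is roughly what the intermediate values look like for region B (three rows, weight 2). This is a standalone sketch; the `rng` generator and its seed are my own additions, and the final sample will differ from run to run without a fixed seed.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

group_size = 3  # region B has three rows
weight = 2      # B's weight

weight_range = np.arange(1, weight + 1)
repeats = int(np.ceil(group_size / len(weight_range)))
combination = np.repeat(weight_range, repeats)
sample = rng.choice(combination, group_size, replace=False)

print(weight_range)  # [1 2]
print(repeats)       # 2
print(combination)   # [1 1 2 2]
print(sample)        # three values drawn from the pool above
```

Since the pool holds each value only twice, a draw of three can never contain the same value three times.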

0

Rather than generating the random numbers directly, you can build the list of values you need and then select its indices at random.

For example:

>>> a=[1,1,2,2]
>>> np.random.choice(4, 4, replace=False)
array([0, 3, 2, 1])

According to the generated random indices, you can assign the values.

For odd numbers you can generate the random list as follows.
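
A minimal sketch of that assignment step, assuming the balanced list is held in a NumPy array so it can be indexed with the random positions (the `rng` generator and its seed are illustrative additions):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

a = np.array([1, 1, 2, 2])  # balanced values for a group of four rows
idx = rng.choice(len(a), len(a), replace=False)  # random order of positions 0..3
shuffled = a[idx]  # same values, reordered by the random positions

print(idx)
print(shuffled)
```

The reordered array still contains two 1s and two 2s, and could then be written to the rows of the group, for example with `df.loc[mask, 'Random_Value'] = shuffled`, where `mask` selects that region.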

>>> np.random.randint(1,3,size=3)
array([1, 1, 2])
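
Note that a plain randint call like this does not guarantee balance on its own; it can return three identical values. One possible way to keep the balanced part fixed and randomise only the remainder is sketched below. The variable names and the `rng` generator are assumptions for illustration, not something taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

group_size, weight = 3, 2
full_copies, remainder = divmod(group_size, weight)

# One full copy of 1..weight for every complete block of rows.
balanced = np.tile(np.arange(1, weight + 1), full_copies)
# The leftover rows may take any value in 1..weight (upper bound is exclusive).
extra = rng.integers(1, weight + 1, size=remainder)
# Shuffle so the free value does not always land on the last row.
values = rng.permutation(np.concatenate([balanced, extra]))

print(values)
```

With three rows and weight 2 this always yields one 1, one 2, and one extra value that is either 1 or 2.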
