I have a data frame with n rows. I want to randomly assign one of m classes to every row such that every class gets the same share of the rows.

Example:

>>> classes = ['c1','c2','c3','c4']
>>> df = pd.DataFrame(np.random.randn(100, 5), columns = list("abcde"))
>>> df
           a         b         c         d         e
0  -0.341559  1.499159  0.269614 -0.198663 -1.081290
1  -1.966477  1.902292 -0.092296 -1.730710 -1.342866
2   1.188634 -2.851902  1.130480 -0.495677 -0.569557
3  -0.816190  1.205463  1.157507 -0.217025 -0.160752
4  -2.001114 -0.818852 -0.696057 -0.874615 -0.577101
..       ...       ...       ...       ...       ...
95  0.502192  0.434275  0.358244 -0.763562 -0.787102
96 -1.071011  0.045387  0.297905 -0.120974  0.185418
97  2.458274 -1.852953 -0.049336 -0.150604 -0.292824
98  1.992513 -0.431639  0.566920 -1.289439  0.626914
99  0.685915 -0.723009 -0.168497  1.630057  1.587378

[100 rows x 5 columns]

Expected output:

>>> df
           a         b         c         d         e class
0  -0.341559  1.499159  0.269614 -0.198663 -1.081290    c3
1  -1.966477  1.902292 -0.092296 -1.730710 -1.342866    c4
2   1.188634 -2.851902  1.130480 -0.495677 -0.569557    c2
3  -0.816190  1.205463  1.157507 -0.217025 -0.160752    c3
4  -2.001114 -0.818852 -0.696057 -0.874615 -0.577101    c1
..       ...       ...       ...       ...       ...   ...
95  0.502192  0.434275  0.358244 -0.763562 -0.787102    c1
96 -1.071011  0.045387  0.297905 -0.120974  0.185418    c3
97  2.458274 -1.852953 -0.049336 -0.150604 -0.292824    c2
98  1.992513 -0.431639  0.566920 -1.289439  0.626914    c1
99  0.685915 -0.723009 -0.168497  1.630057  1.587378    c2

[100 rows x 6 columns]

The class proportions should all be equal, i.e. each class should appear in the same number of rows.

  • Did you try stackoverflow.com/q/65982695/7631183? Commented Jun 22, 2021 at 11:26
  • @Wanderer yes, that is how the above output was created but this does not ensure equal class proportions. Commented Jun 22, 2021 at 11:29
  • @dathbaba actually it does: if you set 'weights' equally, then the probability of selecting each class will be the same, which means equal class proportions. Commented Jun 22, 2021 at 11:31
  • Nope, the class distribution is not the same; df.groupby('class').size() confirms it. Equal class proportions means there are 25 rows of each class. Commented Jun 22, 2021 at 14:42

1 Answer

This should do the job: repeat each class n/m times, shuffle the result, and assign it as a new column.

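import numpy as np
import pandas as pd

classes = ['c1', 'c2', 'c3', 'c4']
df = pd.DataFrame(np.random.randn(100, 5), columns=list("abcde"))

# Repeat each class n // m times. np.repeat needs an integer count, hence //.
# This assumes the row count is divisible by the number of classes.
labels = np.repeat(classes, len(df) // len(classes))
np.random.shuffle(labels)  # shuffle in place so the assignment is random
df['class'] = labels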
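
The approach above relies on the row count being an exact multiple of the number of classes (100 rows, 4 classes). When it is not, the repeated array is shorter than the data frame and the column assignment fails. A minimal sketch of one way to handle that case follows. It is an assumption about what "equal proportions" should mean with a remainder, namely that class counts differ by at most one. The helper name `assign_balanced_classes` and its `seed` parameter are hypothetical, introduced here only for illustration.

```python
import numpy as np
import pandas as pd


def assign_balanced_classes(df, classes, seed=None):
    """Return a copy of df with a 'class' column whose counts differ by at most 1.

    Hypothetical helper, not part of pandas or NumPy.
    """
    rng = np.random.default_rng(seed)
    n = len(df)
    # Randomize the class order so the leftover rows (n % m of them)
    # do not always go to the first classes in the list.
    order = rng.permutation(classes)
    # np.resize cycles through `order` until it has exactly n labels.
    labels = np.resize(order, n)
    out = df.copy()
    # Shuffle the labels so the assignment to rows is random.
    out["class"] = rng.permutation(labels)
    return out


classes = ["c1", "c2", "c3", "c4"]
df = pd.DataFrame(np.random.randn(102, 5), columns=list("abcde"))
result = assign_balanced_classes(df, classes, seed=0)
counts = result["class"].value_counts()
print(counts)
```

With 102 rows and 4 classes, two classes end up with 26 rows and two with 25. With 100 rows, every class gets exactly 25, matching the original answer.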