Unable to create pandas dataframe with particular number of class label

Question

Is it possible to create a random pandas DataFrame in which 1500 rows have class label 0 and 500 rows have class label 1?

It should look like this:

feature_1   class_label

sdfdsfsdfd    0
kjdkfkjdsf    0
jkkjhjknn     1
dfsfgdsfd     0
gfdgdfsdd     1

The values of the feature_1 column can be anything, but 1500 of them should have label 0 and 500 should have label 1.

yatu · Accepted Answer · 2020-10-02 13:08:12Z

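We can use numpy here: start from an array of zeros and set 500 randomly chosen positions to 1 using np.random.choice. Passing replace=False keeps the drawn positions distinct, so exactly 500 labels end up as 1:

import numpy as np
import pandas as pd

a = np.zeros(2000, dtype='int')
# replace=False: no position is drawn twice, so exactly 500 entries become 1
a[np.random.choice(len(a), 500, replace=False)] = 1
pd.Series(a).rename_axis('feature_1').reset_index(name='label')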
      feature_1  label
0             0      0
1             1      0
2             2      0
3             3      0
4             4      0
...         ...    ...
1995       1995      1
1996       1996      1
1997       1997      0
1998       1998      1
1999       1999      0

[2000 rows x 2 columns]
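
As a quick check that is not part of the original answer, the sketch below compares drawing positions with and without replacement. It assumes NumPy's Generator API (np.random.default_rng); the seed and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

n_rows, n_ones = 2000, 500

# Sampling WITH replacement can repeat positions, so some 1s overwrite each other.
with_repl = np.zeros(n_rows, dtype=int)
with_repl[rng.choice(n_rows, size=n_ones, replace=True)] = 1

# Sampling WITHOUT replacement yields 500 distinct positions.
without_repl = np.zeros(n_rows, dtype=int)
without_repl[rng.choice(n_rows, size=n_ones, replace=False)] = 1

print(with_repl.sum())     # at most 500, usually fewer
print(without_repl.sum())  # exactly 500
```

Only the draw without replacement guarantees the 1500/500 split asked for in the question.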

Another option is to build the labels in order and shuffle them with sample(frac=1):

(pd.Series(np.r_[[0]*1500, [1]*500], name='label')
   .sample(frac=1)
   .rename_axis('feature_1')
   .reset_index(name='label'))

      feature_1  label
0           311      0
1           217      0
2          1940      1
3          1538      1
4          1904      1
...         ...    ...
1995        550      0
1996        836      0
1997       1065      0
1998       1343      0
1999       1070      0

[2000 rows x 2 columns]
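
Both snippets above use an integer index as feature_1. If string values like those in the question are wanted, a minimal sketch could look as follows. The string length of 10, the seed, and all variable names are assumptions made for illustration, not something the answer specifies.

```python
import string

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

n_zero, n_one, str_len = 1500, 500, 10  # str_len is an assumed value

# Exactly 1500 zeros and 500 ones, then shuffled in place.
labels = np.repeat([0, 1], [n_zero, n_one])
rng.shuffle(labels)

# Draw a (2000, 10) grid of lowercase letters and join each row into one string.
letters = np.array(list(string.ascii_lowercase))
chars = rng.choice(letters, size=(n_zero + n_one, str_len))
feature = ["".join(row) for row in chars]

df = pd.DataFrame({"feature_1": feature, "class_label": labels})
print(df["class_label"].value_counts())
```

Because the labels are built with fixed counts and only then shuffled, the split stays exact regardless of the seed.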

Vaziri-Mahmoud · Accepted Answer · 2020-10-02 13:09:26Z

Try this:

import random
import string
import pandas as pd

def get_random_string(length):
    # Build a random lowercase string of the given length
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(length))

# The first 1500 rows get label 0, the remaining 500 get label 1
label = [0] * 1500 + [1] * 500
arr = [get_random_string(8) for _ in label]
# Build from a dict so that 'label' keeps an integer dtype
df = pd.DataFrame({'f1': arr, 'label': label})
df.head()

Output:

         f1 label
0  twfzvgpp     0
1  fvndhbaq     0
2  sawoflua     0
3  yqdgqtmx     0
4  glfsdyix     0
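
Note that this frame keeps all 0 labels first and all 1 labels last. If a random row order is wanted, one option is to shuffle the rows afterwards. The sketch below uses a small stand-in frame with placeholder feature values; the name shuffled and the random_state value are assumptions for illustration.

```python
import pandas as pd

# Stand-in for the frame built above: labels are in order, 0s first.
df = pd.DataFrame({
    "f1": [f"row{i}" for i in range(2000)],  # placeholder feature values
    "label": [0] * 1500 + [1] * 500,
})

# sample(frac=1) returns all rows in random order; random_state is an arbitrary seed.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

# The label counts are unchanged by shuffling: 1500 zeros and 500 ones.
print(shuffled["label"].value_counts().to_dict())
```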

adir abargil · Accepted Answer · 2020-10-02 13:11:31Z

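import random
import pandas as pd

# random.sample requires the sample size k; k=2000 returns a shuffled copy of all labels
class_label = random.sample([0] * 1500 + [1] * 500, k=2000)
df = pd.DataFrame(dict(
        class_label=class_label,
        feature_1=list(range(2000))))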
