4

Is it possible to create a random pandas DataFrame in which 1500 rows have class label 0 and 500 rows have class label 1?

It should look like this:

feature_1   class_label
sdfdsfsdfd    0
kjdkfkjdsf    0
jkkjhjknn     1
dfsfgdsfd     0
gfdgdfsdd     1

The values of the feature_1 column can be anything, but 1500 values should have label 0 and 500 values should have label 1.

3 Answers

2

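We can use numpy here and draw 500 random positions from the range of the column's length using np.random.choice. Pass replace=False, because the default samples with replacement and could pick the same position twice, leaving fewer than 500 ones:

import numpy as np
import pandas as pd

a = np.zeros(2000, dtype='int')
# replace=False guarantees 500 distinct positions are set to 1
a[np.random.choice(len(a), 500, replace=False)] = 1
pd.Series(a).rename_axis('feature_1').reset_index(name='label')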
      feature_1  label
0             0      0
1             1      0
2             2      0
3             3      0
4             4      0
...         ...    ...
1995       1995      1
1996       1996      1
1997       1997      0
1998       1998      1
1999       1999      0

[2000 rows x 2 columns]
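
As a rough check on the label counts, the sketch below compares index sampling with and without replacement. It uses NumPy's `default_rng` generator rather than the `np.random.choice` call from the answer, and the names `n_rows`, `n_ones`, `with_repl`, and `without_repl` as well as the seed are illustrative assumptions, not part of the original answer.

```python
import numpy as np

# Seed chosen only so the sketch is reproducible (an assumption)
rng = np.random.default_rng(0)

n_rows, n_ones = 2000, 500

# Default sampling is with replacement, so the same index can be drawn
# more than once and fewer than 500 positions may end up set to 1
with_repl = np.zeros(n_rows, dtype=int)
with_repl[rng.choice(n_rows, n_ones)] = 1

# replace=False draws 500 distinct indices, so exactly 500 ones are set
without_repl = np.zeros(n_rows, dtype=int)
without_repl[rng.choice(n_rows, n_ones, replace=False)] = 1

print(with_repl.sum(), without_repl.sum())
```

The second count is always 500, while the first can be anything up to 500 depending on how many indices repeat.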

Or another idea would be:

(pd.Series(np.r_[[0]*1500, [1]*500], name='label')
   .sample(frac=1)
   .rename_axis('feature_1')
   .reset_index(name='label'))

      feature_1  label
0           311      0
1           217      0
2          1940      1
3          1538      1
4          1904      1
...         ...    ...
1995        550      0
1996        836      0
1997       1065      0
1998       1343      0
1999       1070      0

[2000 rows x 2 columns]

1

Try this:

import random
import string
import pandas as pd

def get_random_string(length):
    # Build a string of random lowercase letters
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(length))

# The first 1500 rows get label 0, the remaining 500 get label 1
arr = [get_random_string(8) for _ in range(2000)]
label = [0] * 1500 + [1] * 500
df = pd.DataFrame({'f1': arr, 'label': label})
df.head()

Output:

         f1 label
0  twfzvgpp     0
1  fvndhbaq     0
2  sawoflua     0
3  yqdgqtmx     0
4  glfsdyix     0
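
The output above has all the 0 labels first, followed by the 1 labels. If the rows should also appear in random order, one option is to shuffle the finished frame with `DataFrame.sample(frac=1)`. This is a minimal sketch under that assumption; the helper `random_string`, the column names taken from the question, and the `random_state` value are illustrative choices rather than part of the answer.

```python
import random
import string

import pandas as pd


def random_string(length=8):
    # Hypothetical helper: a string of random lowercase letters
    return "".join(random.choices(string.ascii_lowercase, k=length))


# 1500 rows labelled 0 followed by 500 rows labelled 1
labels = [0] * 1500 + [1] * 500
df = pd.DataFrame({
    "feature_1": [random_string() for _ in labels],
    "class_label": labels,
})

# sample(frac=1) returns every row in random order;
# reset_index(drop=True) replaces the shuffled index with 0..1999
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(shuffled["class_label"].value_counts())
```

Shuffling whole rows keeps each string paired with its label, so the 1500/500 split is unchanged.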

0

import random
import pandas as pd

# Build 1500 zeros and 500 ones, then shuffle them
labels = [0] * 1500 + [1] * 500
class_label = random.sample(labels, k=len(labels))
df = pd.DataFrame(dict(
        class_label=class_label,
        feature_1=list(range(2000))))
