I had a DataFrame

     A B C
   0 1 2 3  
   1 2 3 3  
   2 3 2 1  

I needed to create a new column in a pandas DataFrame with 'yes' or 'no' randomly filling this column.

     A B C  NEW
   0 1 2 3  yes
   1 2 3 3  no
   2 3 2 1  no

Using random.choice gave the same value in every row:

     A B C  NEW
   0 1 2 3  no
   1 2 3 3  no
   2 3 2 1  no
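
The call that produced this is not shown, so the following is only a guess at the cause: if random.choice is used in a plain assignment, it runs once and pandas broadcasts the single result to every row. A minimal sketch with the example frame:

```python
import random

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 2], 'C': [3, 3, 1]})

# random.choice is evaluated once; pandas broadcasts the scalar to all rows
df['NEW'] = random.choice(['yes', 'no'])

# Whichever value was drawn, the column contains only that one value
print(df['NEW'].nunique())  # 1
```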

I tried map, apply and applymap, but there is an easier way to do it.

np.random.choice(['yes','no'],len(df)) ? Commented Jan 31, 2021 at 18:23
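
Spelled out, the suggestion in this comment might look like the sketch below. It assumes NumPy is imported as np and reuses the example frame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 2], 'C': [3, 3, 1]})

# size=len(df) draws one value per row; the result is a NumPy array,
# which pandas assigns by position rather than by index label
df['NEW'] = np.random.choice(['yes', 'no'], size=len(df))

print(df)
```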

1 Answer

One way is to build a pd.Series with random.choices and assign it to the new column:

import random

import pandas as pd

# Draw one independent 'yes'/'no' value per row of df
df['NEW'] = pd.Series(
    random.choices(['yes', 'no'], weights=[1, 1], k=len(df)),
    index=df.index
)

random.choices picks one of these values independently for every row.

weights sets the relative probabilities of picking 'yes' and 'no', respectively. If you want a higher chance of 'yes', for example, increase the first number.
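
To make the effect of weights concrete, here is a small sketch. The larger frame, the seed and the 3:1 ratio are made up for the illustration; they are not part of the question:

```python
import random

import pandas as pd

random.seed(0)  # arbitrary seed, only to make the sketch repeatable

# Hypothetical larger frame so the proportions are visible
df = pd.DataFrame({'A': range(10_000)})

# weights=[3, 1] makes 'yes' three times as likely as 'no' (about 75% vs 25%)
df['NEW'] = pd.Series(
    random.choices(['yes', 'no'], weights=[3, 1], k=len(df)),
    index=df.index
)

# Share of each value in the new column
print(df['NEW'].value_counts(normalize=True))
```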

k sets the length of the Series. It must equal the number of rows in the DataFrame.

index should be set to df.index. Otherwise, if df was sliced from a bigger DataFrame, the new column can be filled with NaN, because the Series gets a default 0..n-1 index that does not match the row labels of df.
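
The alignment problem can be reproduced with a small sketch. The frame big and the column name MISALIGNED are hypothetical, used only for this illustration:

```python
import random

import pandas as pd

big = pd.DataFrame({'A': range(10)})
df = big[big['A'] >= 7].copy()  # row labels are 7, 8, 9

values = random.choices(['yes', 'no'], k=len(df))

# Default index 0, 1, 2 shares no labels with 7, 8, 9, so every row becomes NaN
df['MISALIGNED'] = pd.Series(values)

# Passing index=df.index lines the values up with the rows
df['NEW'] = pd.Series(values, index=df.index)

print(df)
```

Assigning the plain list without wrapping it in a Series is positional, so it should avoid the mismatch as well.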
