Insert a new column in pandas with random string values

Question

I have a pandas DataFrame and need to create a new column filled randomly with 'yes' or 'no':

     A B C  NEW
   0 1 2 3  yes
   1 2 3 3  no
   2 3 2 1  no

Using random.choice gives the same value in every row:

     A B C  NEW
   0 1 2 3  no
   1 2 3 3  no
   2 3 2 1  no

I tried map, apply and applymap, but there must be an easier way to do it.

Henry Ecker · Accepted Answer · 2022-10-12 16:11:32Z

One way is to build a pd.Series from random.choices and assign it to the new column:

import random

import pandas as pd

# One independent draw per row, aligned to the DataFrame's own index
df['NEW'] = pd.Series(
    random.choices(['yes', 'no'], weights=[1, 1], k=len(df)),
    index=df.index
)

random.choices picks one of these values for every row.
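
To see the difference from random.choice, here is a minimal sketch. The sample frame and the column name SAME are made up for illustration, and the seed is only there to make the run repeatable:

```python
import random

import pandas as pd

# Hypothetical sample frame mirroring the layout in the question.
df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 2], "C": [3, 3, 1]})

random.seed(0)  # arbitrary seed, only for repeatability

# random.choice returns ONE string; pandas broadcasts that scalar to every row.
df["SAME"] = random.choice(["yes", "no"])

# random.choices returns a list with one independent draw per row.
df["NEW"] = random.choices(["yes", "no"], k=len(df))

print(df)
```

The SAME column always holds a single repeated value, which is the behaviour described in the question, while NEW is drawn row by row.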

weights sets the relative probabilities of picking 'yes' and 'no', respectively. For a higher chance of 'yes', for example, increase the first number.
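
As a rough illustration of how weights behaves (the values [3, 1], the seed and the sample size are arbitrary choices for this sketch), the weights are relative, so 3 to 1 should give 'yes' in roughly three out of four draws:

```python
import random
from collections import Counter

random.seed(42)  # arbitrary seed, only for repeatability

# weights are relative: [3, 1] favours 'yes' three to one over 'no'.
draws = random.choices(["yes", "no"], weights=[3, 1], k=10_000)

counts = Counter(draws)
share_yes = counts["yes"] / len(draws)
print(f"share of 'yes': {share_yes:.3f}")  # expected to land near 0.75
```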

k sets the length of the Series. It must match the number of rows in the DataFrame.

Setting index to df.index is important; otherwise the column can be filled with NaN if df was sliced from a bigger DataFrame.
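
A small sketch of the index issue, using a made-up frame named big and made-up column names. It also shows that assigning the plain list directly works positionally, which may be enough if you don't need a Series:

```python
import random

import pandas as pd

# Hypothetical bigger frame; the slice keeps the original labels 3, 4, 5.
big = pd.DataFrame({"A": range(6)})
df = big[big["A"] >= 3].copy()

values = random.choices(["yes", "no"], k=len(df))

# Without index=..., the Series is labelled 0, 1, 2. None of those labels
# exist in df, so alignment leaves every row as NaN.
df["NO_INDEX"] = pd.Series(values)

# With index=df.index the labels match and every row gets a value.
df["WITH_INDEX"] = pd.Series(values, index=df.index)

# A plain list is assigned by position, so no alignment takes place.
df["FROM_LIST"] = values

print(df)
```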

answered Jan 31, 2021 at 18:21

A Neto
