Say I have a table like so:

| Name   | Age |
|--------|-----|
| Bob    | 2   |
| John   | 3   |
| Tim    | 4   |
| Ben    | 5   |
| Ella   | 4   |
| Sophie | 5   |
| Grace  | 6   |
| Bill   | 34  |
| Ron    | 23  |
| Harry  | 2   |

How can I add a new column that is True for a randomly selected 10% of the rows and False for the rest, like so?

| Name   | Age |       |
|--------|-----|-------|
| Bob    | 2   | False |
| John   | 3   | False |
| Tim    | 4   | False |
| Ben    | 5   | True  |
| Ella   | 4   | False |
| Sophie | 5   | False |
| Grace  | 6   | False |
| Bill   | 34  | False |
| Ron    | 23  | False |
| Harry  | 2   | False |

2 Answers

You can use pandas' sample function:

df.loc[df.sample(frac=0.1).index, "sample_column"] = True
df["sample_column"] = df["sample_column"].fillna(False)
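
One thing to watch with the two lines above: the new column is created with NaN in every unselected row, so it generally does not start out with a boolean dtype. Here is a minimal sketch of a variant, assuming a strictly boolean column is wanted. The DataFrame is rebuilt from the question's table, and swapping `fillna(False)` for `notna()` is my substitution, not part of the original answer.

```python
import pandas as pd

# Table from the question.
df = pd.DataFrame({
    "Name": ["Bob", "John", "Tim", "Ben", "Ella",
             "Sophie", "Grace", "Bill", "Ron", "Harry"],
    "Age": [2, 3, 4, 5, 4, 5, 6, 34, 23, 2],
})

# Mark a random 10% of the rows; every other row is left as NaN.
df.loc[df.sample(frac=0.1).index, "sample_column"] = True

# Only the sampled rows are non-missing, so notna() yields a boolean mask.
df["sample_column"] = df["sample_column"].notna()

print(df)
```

Because `notna()` returns a boolean Series, the column ends up with dtype `bool` without a separate fill step.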

Use pandas.DataFrame.sample. Passing random_state makes the selection reproducible:

df['flag'] = df.index.isin(df.sample(frac=0.1, random_state=1).index)

OR

df['flag'] = False
df.loc[df.sample(frac=0.1, random_state=1).index, 'flag'] = True

Sample Output

>>> df
      Name   Age   flag
1      Bob   2.0  False
2     John   3.0  False
3      Tim   4.0   True
4      Ben   5.0  False
5     Ella   4.0  False
6   Sophie   5.0  False
7    Grace   6.0  False
8     Bill  34.0  False
9      Ron  23.0  False
10   Harry   2.0  False
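
For reuse, the `isin` approach can be wrapped in a small helper. This is only a sketch: the function name `add_random_flag` and its parameters are hypothetical, and it assumes the index labels are unique (with duplicate labels, `isin` would flag every row sharing a sampled label).

```python
import pandas as pd


def add_random_flag(df, frac=0.1, column="flag", seed=None):
    """Return a copy of df with a boolean column marking a random sample.

    Hypothetical helper; assumes df has a unique index.
    """
    out = df.copy()
    # Index labels of the randomly chosen rows.
    chosen = out.sample(frac=frac, random_state=seed).index
    # True where the row's label was sampled, False elsewhere.
    out[column] = out.index.isin(chosen)
    return out


# Table from the question.
people = pd.DataFrame({
    "Name": ["Bob", "John", "Tim", "Ben", "Ella",
             "Sophie", "Grace", "Bill", "Ron", "Harry"],
    "Age": [2, 3, 4, 5, 4, 5, 6, 34, 23, 2],
})

flagged = add_random_flag(people, frac=0.1, seed=1)
print(flagged)
```

With 10 rows and `frac=0.1`, exactly one row is flagged. Using the same `seed` repeats the same selection, and the input DataFrame is left unmodified.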
