Subsetting a Boolean variable in Python

Question

When a DataFrame ("Given_DF") has a Boolean variable (such as B below), how can one subset the DataFrame to keep only the rows where B is True?

Given_DF

ID    A     B
0    123   True
1    456   False
2    789   False
3    132   True
4    465   False

The 'Desired' subset is the DataFrame with only two rows (with ID 0 and 3).

I tried subsetting B as a column:
```
Desired = Given_DF["B"].isin(True)  
```
I also tried indexing on the variable B and using loc to select the rows where B is True:
```
prep.sort_index(level=["B"])
Desired = prep.loc["True"]
```

Neither attempt worked. Help would be appreciated.

Barmar · Accepted Answer · 2023-10-06 05:56:35Z

The same way you subset with any other type. Put an expression that matches your condition inside the subscript of the df.

```
Desired = Given_DF[Given_DF["B"] == True]
```

or more simply

```
Desired = Given_DF[Given_DF["B"]]
```

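.isin() is used when you have a collection of values you want to match, but True is not a collection, so .isin(True) raises a TypeError. You'd have to write .isin([True]) for this to work, and even then the result is only a Boolean mask that still has to go inside the subscript, as in Given_DF[Given_DF["B"].isin([True])].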
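A minimal, self-contained sketch of both approaches may help here. It rebuilds the example data under the assumption that ID is an ordinary column (the question's table could also be read with ID as the index); the names mask, isin_mask and Desired_via_isin are illustrative and not from the original post.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
# Assumption: ID is a regular column rather than the index.
Given_DF = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3, 4],
        "A": [123, 456, 789, 132, 465],
        "B": [True, False, False, True, False],
    }
)

# A Boolean column is already a valid row mask.
mask = Given_DF["B"]
Desired = Given_DF[mask]

# .isin() needs a list-like argument and returns a mask, not the subset,
# so the mask still has to be used to index the DataFrame.
isin_mask = Given_DF["B"].isin([True])
Desired_via_isin = Given_DF[isin_mask]

print(Desired)
```

Both Desired and Desired_via_isin should contain only the rows with ID 0 and 3.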


4 Comments

Barmar Over a year ago

It worked when I tested it.

Barmar Over a year ago

Online demo: ideone.com/TUP3Jk

Barmar Over a year ago

What changed since your earlier comment?

sonicpoem Over a year ago

I refreshed my Jupyter notebook and then the new code worked fine. Sorry. I should have checked that first before commenting. Thanks.

Ingwersen_erik · Accepted Answer · 2023-10-06 06:05:36Z

There are multiple ways to achieve your desired output. Here are a few of them:

```
# Option 1
filtered_df = Given_DF[Given_DF["B"] == True]

# Option 2: Using `.loc`
filtered_df = Given_DF.loc[Given_DF["B"] == True, :]

# Option 3: Using pd.DataFrame.query
filtered_df = Given_DF.query("B == True")

print(filtered_df)
# Prints:
#
#    ID    A     B
# 0   0  123  True
# 3   3  132  True
```
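One way to confirm that the three options are interchangeable is to compare their results directly. The sketch below assumes the same example data as the question, with ID as a regular column; the variable names by_mask, by_loc, by_query and same are illustrative only.

```python
import pandas as pd

# Example data from the question (assumption: ID is a regular column).
Given_DF = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3, 4],
        "A": [123, 456, 789, 132, 465],
        "B": [True, False, False, True, False],
    }
)

by_mask = Given_DF[Given_DF["B"] == True]          # Option 1
by_loc = Given_DF.loc[Given_DF["B"] == True, :]    # Option 2
by_query = Given_DF.query("B == True")             # Option 3

# DataFrame.equals compares values, dtypes, and axis labels.
same = by_mask.equals(by_loc) and by_loc.equals(by_query)
print(same)
```

If same is True, the choice between the options comes down to readability rather than results.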

To select specific columns after filtering for rows where column "B" is True, you can use the following:

```
# Filter on column B with option 1, then select column "A"
Given_DF[Given_DF["B"] == True]["A"]

# Filter on column B with option 2 and select columns "ID" and "A" in the same call
Given_DF.loc[Given_DF["B"] == True, ["ID", "A"]]

# Filter on column B with option 3, then keep the index values of the matching rows
remaining_indexes = Given_DF.query("B == True").index

# You can then use these indexes to filter the `Given_DF` dataframe, or apply them
# to many other use cases:
Given_DF.loc[Given_DF.index.isin(remaining_indexes), :]
```
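The column-selection variants above return different kinds of objects, which a short runnable sketch can make concrete. It assumes the question's example data with ID as a regular column. The names col_a, id_and_a and round_trip are illustrative, and passing the index straight to .loc is a simpler variant of the index.isin step, not part of the original answer.

```python
import pandas as pd

# Example data from the question (assumption: ID is a regular column).
Given_DF = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3, 4],
        "A": [123, 456, 789, 132, 465],
        "B": [True, False, False, True, False],
    }
)

# Filter rows first, then pick one column: the result is a Series.
col_a = Given_DF[Given_DF["B"]]["A"]

# Select rows and columns in a single .loc call: the result is a DataFrame.
id_and_a = Given_DF.loc[Given_DF["B"], ["ID", "A"]]

# Index labels of the matching rows, reusable for later lookups.
remaining_indexes = Given_DF.query("B == True").index

# Passing the labels directly to .loc selects the same rows again.
round_trip = Given_DF.loc[remaining_indexes]

print(type(col_a).__name__, type(id_and_a).__name__)
```

The single .loc call is usually the safer habit when the result will be modified later, since it avoids chained indexing.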
