When a DataFrame ("Given_DF") has a Boolean variable (such as B below), how can one subset the DataFrame to keep only the rows where B is True?

Given_DF

ID    A     B
0    123   True
1    456   False
2    789   False
3    132   True
4    465   False

The 'Desired' subset is the DataFrame with only two rows (with ID 0 and 3).

  1. Tried subsetting B as a column,

    Desired = Given_DF["B"].isin(True)  
    
  2. Tried indexing on the variable B and using loc to select the "True" rows of B.

    prep.sort_index(level=["B"])
    Desired = prep.loc["True"]
    

Neither attempt worked. Help would be appreciated.

2 Answers

The same way you subset on any other type: put an expression that evaluates your condition inside the subscript of the DataFrame.

Desired = Given_DF[Given_DF["B"] == True]

or more simply

Desired = Given_DF[Given_DF["B"]]

.isin() is used when you have a collection of values you want to match, but True is not a collection. You'd have to write .isin([True]) for this to work.
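To make the point about .isin() concrete, here is a minimal sketch. It assumes the frame from the question is rebuilt by hand with ID as an ordinary column, and the name scalar_accepted is only illustrative. Note that even the corrected .isin([True]) call returns a Boolean mask rather than the subset, so it still has to go inside the subscript.

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
Given_DF = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4],
    "A": [123, 456, 789, 132, 465],
    "B": [True, False, False, True, False],
})

# A bare scalar is rejected: isin() only accepts list-like arguments.
try:
    Given_DF["B"].isin(True)
    scalar_accepted = True
except TypeError:
    scalar_accepted = False

# Wrapping the value in a list works, but the result is a Boolean mask ...
mask = Given_DF["B"].isin([True])

# ... so it still has to be used as the subscript to get the rows.
Desired = Given_DF[mask]
print(Desired)
```

Since column B is already Boolean, the mask here is identical to Given_DF["B"] itself, which is why the shorter form above is enough.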

4 Comments

It worked when I tested it.
Online demo: ideone.com/TUP3Jk
What changed since your earlier comment?
I refreshed my Jupyter notebook and then the new code worked fine. Sorry, I should have checked that first before commenting. Thanks.

There are multiple ways to achieve your desired output. Here are a few of them:

# Option 1
filtered_df = Given_DF[Given_DF["B"] == True]

# Option 2: Using `.loc`
filtered_df = Given_DF.loc[Given_DF["B"] == True, :]

# Option 3: Using pd.DataFrame.query
filtered_df = Given_DF.query("B == True")

print(filtered_df)
# Prints:
#
#    ID    A     B
# 0   0  123  True
# 3   3  132  True
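As a quick sanity check, the three options can be run side by side on a hand-built copy of the question's data. This is only a sketch: the frame construction and the names opt1, opt2 and opt3 are assumptions for illustration, not part of the original post.

```python
import pandas as pd

# Hand-built copy of the question's data, with ID as a regular column.
Given_DF = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4],
    "A": [123, 456, 789, 132, 465],
    "B": [True, False, False, True, False],
})

opt1 = Given_DF[Given_DF["B"] == True]          # plain Boolean mask
opt2 = Given_DF.loc[Given_DF["B"] == True, :]   # same mask through .loc
opt3 = Given_DF.query("B == True")              # expression string

# All three select the same rows and keep the original index labels.
same_result = opt1.equals(opt2) and opt1.equals(opt3)
print(same_result)
print(opt1.index.tolist())
```

Which one to use is largely a matter of taste; .loc is convenient when columns are selected in the same step, and query can read more naturally for longer conditions.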

If you want to select one or more specific columns after filtering for rows where column "B" equals True, you can use the following:

# Filter on column B with option 1 (shown above), then select column "A"
Given_DF[Given_DF["B"] == True]["A"]

# Filter on column B with option 2 (shown above) and select columns "ID" and "A" in the same call
Given_DF.loc[Given_DF["B"] == True, ["ID", "A"]]

# Filter on column B with option 3 (shown above), then take the index values of the matching rows
remaining_indexes = Given_DF.query("B == True").index

# You can then use these index values to filter the `Given_DF` dataframe, or apply them
# in many other use cases:
Given_DF.loc[Given_DF.index.isin(remaining_indexes), :]
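The index round trip above can be sketched end to end as follows. The frame is rebuilt by hand from the question's data, and via_isin and via_labels are illustrative names. The sketch also shows that, assuming the index labels are unique as they are here, passing the saved labels straight to .loc gives the same rows.

```python
import pandas as pd

# Hand-built copy of the question's data, with ID as a regular column.
Given_DF = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4],
    "A": [123, 456, 789, 132, 465],
    "B": [True, False, False, True, False],
})

# Index labels of the rows where B is True.
remaining_indexes = Given_DF.query("B == True").index

# Membership test on the index, as in the snippet above.
via_isin = Given_DF.loc[Given_DF.index.isin(remaining_indexes), :]

# With unique index labels, the labels can be passed to .loc directly.
via_labels = Given_DF.loc[remaining_indexes]

print(via_isin.equals(via_labels))
```

The isin form is the more defensive of the two, since it does not raise if some saved labels are missing from the frame being filtered.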
