How to return dataframe with multiple conditions in pandas

Question

Here is the sample dataframe.

import numpy as np
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'], ['Russia', 'Upper middle income'], ['Afganistan','Low income'], ['USA','High income']]
df = pd.DataFrame(data, columns = ['Country', 'Income Group'])

Here I was trying to return only countries with high income and upper middle income:

df = np.where(df['Income Group'] == 'High income' & df['Income Group'] == 'Upper middle income')

Here is the output:

TypeError: tuple indices must be integers or slices, not str

But the same approach works fine with another column:

df = np.where(df['Country'] == 'USA')

What is the problem with column 'Income Group'?

I would appreciate any help.

What is your expected output? It seems that you are converting the dataframe to a NumPy array, and you also forgot the () around each condition.
– adir abargil, Commented Jan 1, 2021 at 13:29
@adirabargil I was just trying to return a dataframe with the rows where 'Income Group' equals 'High income' and 'Upper middle income'. If I put () around each condition, the output is the same.
– Aleksandr Pay, Commented Jan 1, 2021 at 13:34

perl · Accepted Answer · 2021-01-01 13:48:52Z

3

isin can also be useful here:

df[df['Income Group'].isin(['High income', 'Upper middle income'])]

Output:

          Country         Income Group
0  United Kingdom          High income
1         Albania  Upper middle income
2          Russia  Upper middle income
4             USA          High income
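
As a rough check, assuming the sample frame from the question, the isin mask can be compared against the explicit | mask; both should select the same rows. The names wanted, mask_isin and mask_or are illustrative only and do not come from the original answer.

```python
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'],
        ['USA', 'High income']]
df = pd.DataFrame(data, columns=['Country', 'Income Group'])

# Hypothetical helper names, chosen for this sketch.
wanted = ['High income', 'Upper middle income']
mask_isin = df['Income Group'].isin(wanted)
mask_or = (df['Income Group'] == 'High income') | (df['Income Group'] == 'Upper middle income')

# Both masks are boolean Series aligned on the same index.
print((mask_isin == mask_or).all())

result = df[mask_isin]
print(result['Country'].tolist())
```

isin tends to scale better than chained comparisons when the list of accepted values grows, since only the list changes.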

answered Jan 1, 2021 at 13:48

perl


Snehal Nair · Accepted Answer · 2021-01-01 13:36:54Z

2

Another technique, besides np.where, is to slice the dataframe with a boolean mask built from multiple conditions:

df[ (condition_1) & (condition_2) | (condition_3) ]

For your question, if you want to return countries with high income or upper middle income, you should use the or (|) operator, not and (&):

df_high = df[(df['Income Group'] == 'High income') | (df['Income Group'] == 'Upper middle income')]

df_high

          Country         Income Group
0  United Kingdom          High income
1         Albania  Upper middle income
2          Russia  Upper middle income
4             USA          High income
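
To see why & cannot work here, the following sketch (assuming the question's sample data; is_high and is_upper_middle are illustrative names) builds both masks separately. A single cell cannot equal two different strings at once, so the & combination selects nothing, while | selects every row that matches either value.

```python
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'],
        ['USA', 'High income']]
df = pd.DataFrame(data, columns=['Country', 'Income Group'])

is_high = df['Income Group'] == 'High income'
is_upper_middle = df['Income Group'] == 'Upper middle income'

both = df[is_high & is_upper_middle]    # rows matching both values at once
either = df[is_high | is_upper_middle]  # rows matching at least one value

print(len(both))
print(len(either))
```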

answered Jan 1, 2021 at 13:36

Snehal Nair


1 Comment

adir abargil Over a year ago

Nice answer! Voted up!

adir abargil · Accepted Answer · 2021-01-01 13:41:13Z

2

It seems from the sample you posted that you are missing the () around each condition, and that you are confusing or with and, so you need | instead of &:

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'], ['Russia', 'Upper middle income'], ['Afganistan','Low income'], ['USA','High income']]
df = pd.DataFrame(data, columns = ['Country', 'Income Group']) 
df = np.where((df['Income Group'] == 'High income') |(df['Income Group'] == 'Upper middle income'))
df
>>>(array([0, 1, 2, 4], dtype=int64),)

This returns the row indexes instead of the dataframe. If you want the dataframe itself, index it with the boolean mask directly:

df = df[(df['Income Group'] == 'High income') |(df['Income Group'] == 'Upper middle income')]
df
>>>
          Country         Income Group
0  United Kingdom          High income
1         Albania  Upper middle income
2          Russia  Upper middle income
4             USA          High income
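
The TypeError from the question can be reproduced in isolation. This is a minimal sketch, assuming the question's sample data, with positions and error_message as illustrative names: np.where called with only a condition returns a tuple of index arrays, so once df has been overwritten with that result, df['Income Group'] is a string lookup on a tuple.

```python
import numpy as np
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'],
        ['USA', 'High income']]
df = pd.DataFrame(data, columns=['Country', 'Income Group'])

mask = (df['Income Group'] == 'High income') | (df['Income Group'] == 'Upper middle income')

# With a single argument, np.where returns a tuple holding one array of row positions.
positions = np.where(mask)
print(type(positions))

# Indexing that tuple with a column name raises the error seen in the question.
try:
    positions['Income Group']
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Keep df intact and select rows by position or by the mask itself.
by_position = df.iloc[positions[0]]
by_mask = df[mask]
print(by_position.equals(by_mask))
```

Storing the result under a different name than df avoids the problem even when the cell or script is run repeatedly.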

edited Jan 1, 2021 at 13:41

answered Jan 1, 2021 at 13:31

adir abargil


3 Comments

Aleksandr Pay Over a year ago

I tried | and &, the problem remains the same: TypeError: tuple indices must be integers or slices, not str

adir abargil Over a year ago

Did you copy-paste my answer and example, or did you test it in your environment with real data? Note that when you do df=np.where... you replace the dataframe with a tuple of NumPy index arrays. Could it be that you ran this several times?

Aleksandr Pay Over a year ago

Thank you for your help! Yea the problem was with np.where...

forgetso · Accepted Answer · 2021-01-01 13:30:41Z

0

You need OR (|) instead of AND (&), and you need parentheses around the individual conditions:

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'], ['Russia', 'Upper middle income'], ['Afganistan','Low income'], ['USA','High income']]
df = pd.DataFrame(data, columns = ['Country', 'Income Group']) 
df = np.where((df['Income Group'] == 'High income') | (df['Income Group'] == 'Upper middle income')) # changed to OR
df

answered Jan 1, 2021 at 13:30

forgetso


Lior Cohen · Accepted Answer · 2021-01-01 13:32:24Z

0

This is just a case of missing parentheses: in Python, & has higher precedence than ==.

You need something like:

df = np.where((df['Income Group'] == 'High income') & (df['Income Group'] == 'Upper middle income'))

Update: and yes, as others have stated too, you need the or (|) operator.
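
The precedence point can be illustrated without pandas. In this sketch (the variable names are illustrative), plain integers show that an unparenthesized expression is parsed as a chained comparison around the & operation, which gives a different result from comparing first and combining afterwards. The parenthesized pandas mask then uses | as noted in the update.

```python
import pandas as pd

# Without parentheses this is parsed as 1 == (1 & 0) == 0, a chained comparison.
unparenthesized = 1 == 1 & 0 == 0
# With parentheses each comparison is evaluated first, then combined.
parenthesized = (1 == 1) & (0 == 0)
print(unparenthesized, parenthesized)

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'],
        ['USA', 'High income']]
df = pd.DataFrame(data, columns=['Country', 'Income Group'])

# Parentheses around each comparison, combined with | (or).
mask = (df['Income Group'] == 'High income') | (df['Income Group'] == 'Upper middle income')
print(df[mask])
```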

answered Jan 1, 2021 at 13:32

Lior Cohen


1 Comment

Aleksandr Pay Over a year ago

I tried that, but the problem remains the same: TypeError: tuple indices must be integers or slices, not str

Sachin Rawat · Accepted Answer · 2021-01-01 14:21:35Z

0

import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'], ['USA', 'High income']]
df1 = pd.DataFrame(data, columns = ['Country', 'Income Group'])
# Filter each income group separately, then combine the two results.
df2 = df1[df1['Income Group']=='High income']
df3 = df1[df1['Income Group']=='Upper middle income']
df4 = df2.merge(df3,how='outer')
print(df4)
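
As a hedged comparison, assuming the same sample data, the outer merge should select the same countries as a single boolean mask. One difference worth noting is that merge builds a fresh 0-based index, whereas mask-based selection keeps the original row labels. The names high, upper_middle, merged and masked are illustrative.

```python
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'],
        ['USA', 'High income']]
df = pd.DataFrame(data, columns=['Country', 'Income Group'])

high = df[df['Income Group'] == 'High income']
upper_middle = df[df['Income Group'] == 'Upper middle income']

# With no key given, merge joins on the columns the two frames share.
merged = high.merge(upper_middle, how='outer')
masked = df[df['Income Group'].isin(['High income', 'Upper middle income'])]

# Compare the selected countries without relying on row order.
print(sorted(merged['Country']) == sorted(masked['Country']))
print(merged.index.tolist())  # new default index
print(masked.index.tolist())  # original row labels
```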

answered Jan 1, 2021 at 14:21

Sachin Rawat
