
Here is the sample dataframe.

import numpy as np
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'], ['Russia', 'Upper middle income'], ['Afganistan','Low income'], ['USA','High income']]
df = pd.DataFrame(data, columns = ['Country', 'Income Group'])

Here I was trying to return only countries with high income and upper middle income:

df = np.where(df['Income Group'] == 'High income' & df['Income Group'] == 'Upper middle income')

Here is the output:

TypeError: tuple indices must be integers or slices, not str

But the same approach works fine with another column:

df = np.where(df['Country'] == 'USA')

What is the problem with column 'Income Group'?

Any help is very much appreciated.

  • What is your expected output? It seems that you are converting the dataframe to a numpy array, and you also forgot () around each condition. Commented Jan 1, 2021 at 13:29
  • @adirabargil I was just trying to return a dataframe with the rows where 'Income Group' equals 'High income' and 'Upper middle income'. If I put () around each condition, the output is the same. Commented Jan 1, 2021 at 13:34
  • Look at my answer. Commented Jan 1, 2021 at 13:35

6 Answers


isin can also be useful here:

df[df['Income Group'].isin(['High income', 'Upper middle income'])]

Output:

          Country         Income Group
0  United Kingdom          High income
1         Albania  Upper middle income
2          Russia  Upper middle income
4             USA          High income
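As a quick check, here is a minimal sketch using the sample data from the question. It shows that isin selects the same rows as chaining one == comparison per value with |. The variable names (wanted, by_isin, by_or) are only for illustration.

```python
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'],
        ['USA', 'High income']]
df = pd.DataFrame(data, columns=['Country', 'Income Group'])

wanted = ['High income', 'Upper middle income']

# isin builds a single boolean mask: True where the value is any element of `wanted`
by_isin = df[df['Income Group'].isin(wanted)]

# the explicit equivalent: one comparison per value, combined with |
by_or = df[(df['Income Group'] == wanted[0]) | (df['Income Group'] == wanted[1])]

print(by_isin.index.tolist())  # [0, 1, 2, 4]
print(by_isin.equals(by_or))   # True
```

isin scales better when the list of accepted values grows, because you do not have to add another parenthesised comparison for each value.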

Another technique besides np.where, for slicing a dataframe on multiple conditions:

df[ (condition_1) & (condition_2) | (condition_3) ]
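A note on this pattern: in Python, & binds more tightly than |. The expression above is therefore evaluated as ((condition_1) & (condition_2)) | (condition_3). The sketch below uses hypothetical boolean Series c1, c2 and c3 (not from the question) to show that moving the parentheses changes the result:

```python
import pandas as pd

# hypothetical conditions, one boolean per row
c1 = pd.Series([True, False, False])
c2 = pd.Series([False, False, True])
c3 = pd.Series([False, True, False])

default = c1 & c2 | c3     # grouped as (c1 & c2) | c3
explicit = c1 & (c2 | c3)  # forces | to be evaluated first

print(default.tolist())   # [False, True, False]
print(explicit.tolist())  # [False, False, False]
```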

For your question: if you want to return countries with high income or upper middle income, combine the conditions with or (|), not and (&). No single row can equal both values at once.

df_high = df[(df['Income Group'] == 'High income') | (df['Income Group'] == 'Upper middle income')]

df_high

          Country         Income Group
0  United Kingdom          High income
1         Albania  Upper middle income
2          Russia  Upper middle income
4             USA          High income
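The following minimal sketch, on the question's sample data, shows why & cannot work here. Each row holds only one income group, so combining the two masks with & yields an empty dataframe. Combining them with | keeps every row that matches either value.

```python
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'],
        ['USA', 'High income']]
df = pd.DataFrame(data, columns=['Country', 'Income Group'])

is_high = df['Income Group'] == 'High income'
is_upper_middle = df['Income Group'] == 'Upper middle income'

both = df[is_high & is_upper_middle]    # rows equal to both values: none exist
either = df[is_high | is_upper_middle]  # rows equal to either value

print(len(both))                   # 0
print(either['Country'].tolist())  # ['United Kingdom', 'Albania', 'Russia', 'USA']
```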

1 Comment

Nice answer! Voted up!

From the sample you posted, it seems you are missing () around each condition and confusing or with and, so use | instead of &:

import numpy as np
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'], ['Russia', 'Upper middle income'], ['Afganistan','Low income'], ['USA','High income']]
df = pd.DataFrame(data, columns = ['Country', 'Income Group'])
idx = np.where((df['Income Group'] == 'High income') | (df['Income Group'] == 'Upper middle income'))  # separate name keeps df a DataFrame
idx
# (array([0, 1, 2, 4], dtype=int64),)

This returns the row indexes rather than the dataframe. If you want the dataframe itself, use boolean indexing:

df = df[(df['Income Group'] == 'High income') |(df['Income Group'] == 'Upper middle income')]
df

          Country         Income Group
0  United Kingdom          High income
1         Albania  Upper middle income
2          Russia  Upper middle income
4             USA          High income
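When np.where is called with only a condition, it returns a tuple holding one array of positions per dimension; for a single column that is one array. If you already have those positions, you can turn them back into rows with iloc. A minimal sketch on the sample data follows. The names mask, positions and rows are illustrative.

```python
import numpy as np
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'],
        ['USA', 'High income']]
df = pd.DataFrame(data, columns=['Country', 'Income Group'])

mask = (df['Income Group'] == 'High income') | (df['Income Group'] == 'Upper middle income')

# store the result under its own name so df stays a DataFrame
positions = np.where(mask)    # a tuple holding one array of positions: [0, 1, 2, 4]
rows = df.iloc[positions[0]]  # select rows by integer position

print(type(positions).__name__)  # tuple
print(rows.equals(df[mask]))     # True
```

In most cases the boolean mask alone (df[mask]) is simpler; np.where is only needed when you actually want the positions.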

3 Comments

I tried both | and &; the problem remains the same: TypeError: tuple indices must be integers or slices, not str
Did you copy-paste my answer and example, or test it in your environment with your real data? Note that df = np.where(...) replaces the dataframe with the tuple of index arrays that np.where returns. Could it be that you ran this more than once?
Thank you for your help! Yes, the problem was with np.where.
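The error message itself points at this. After df = np.where(...) runs once, df is no longer a DataFrame but a tuple. Any later df['Income Group'] then indexes a tuple with a string. A minimal sketch reproducing that sequence on the sample data:

```python
import numpy as np
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'],
        ['USA', 'High income']]
df = pd.DataFrame(data, columns=['Country', 'Income Group'])

# first run: df is overwritten with np.where's result, which is a tuple
df = np.where(df['Country'] == 'USA')
print(type(df).__name__)  # tuple

# any later column lookup now indexes a tuple with a string
message = None
try:
    df['Income Group']
except TypeError as exc:
    message = str(exc)
print(message)  # tuple indices must be integers or slices, not str
```

Assigning the np.where result to a different name (or rebuilding df before each run) avoids the problem.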

You need OR (|) instead of AND (&), and you need parentheses around the individual conditions:

import numpy as np
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'], ['Russia', 'Upper middle income'], ['Afganistan','Low income'], ['USA','High income']]
df = pd.DataFrame(data, columns = ['Country', 'Income Group'])
df = np.where((df['Income Group'] == 'High income') | (df['Income Group'] == 'Upper middle income')) # changed to OR
df


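This is just missing parentheses. In Python, & has higher precedence than ==, so the original condition is grouped as df['Income Group'] == ('High income' & df['Income Group']) == 'Upper middle income' instead of as two separate comparisons.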

You need something like:

df = np.where((df['Income Group'] == 'High income') & (df['Income Group'] == 'Upper middle income'))

Update: as others have also stated, you need the or (|) operator.
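You can see the grouping this answer describes without running any pandas code. The sketch below uses Python's standard ast module to parse the question's unparenthesised condition and the corrected one. It only inspects the parse tree; nothing is evaluated.

```python
import ast

# the condition exactly as written in the question (parsed, not evaluated)
original = ast.parse(
    "df['Income Group'] == 'High income' & df['Income Group'] == 'Upper middle income'",
    mode="eval").body

# one chained comparison whose middle operand is ('High income' & df['Income Group'])
print(type(original).__name__)                   # Compare
print(type(original.comparators[0]).__name__)    # BinOp
print(type(original.comparators[0].op).__name__) # BitAnd

fixed = ast.parse(
    "(df['Income Group'] == 'High income') | (df['Income Group'] == 'Upper middle income')",
    mode="eval").body

# with parentheses, the top node is | applied to two separate comparisons
print(type(fixed).__name__)     # BinOp
print(type(fixed.op).__name__)  # BitOr
```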

1 Comment

I tried that; the problem remains the same: TypeError: tuple indices must be integers or slices, not str
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan','Low income'], ['USA','High income']]
df = pd.DataFrame(data, columns = ['Country', 'Income Group'])
df2 = df[df['Income Group']=='High income']
df3 = df[df['Income Group']=='Upper middle income']
# an outer merge on the shared columns keeps the rows from both frames
df4 = df2.merge(df3, how='outer')
print(df4)
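With no on= argument, merge joins on every shared column (here both Country and Income Group). The outer merge of the two filtered frames therefore acts as their union. The sketch below checks that it yields the same countries as the isin filter. It compares sorted country lists because merge returns a new default index and may reorder rows, so the original index labels are not kept.

```python
import pandas as pd

data = [['United Kingdom', 'High income'], ['Albania', 'Upper middle income'],
        ['Russia', 'Upper middle income'], ['Afganistan', 'Low income'],
        ['USA', 'High income']]
df = pd.DataFrame(data, columns=['Country', 'Income Group'])

high = df[df['Income Group'] == 'High income']
upper_middle = df[df['Income Group'] == 'Upper middle income']

# outer merge on all common columns keeps every row from either frame
merged = high.merge(upper_middle, how='outer')

expected = df[df['Income Group'].isin(['High income', 'Upper middle income'])]
print(len(merged))                                                 # 4
print(sorted(merged['Country']) == sorted(expected['Country']))  # True
```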
