1

I want to create a column in a dataframe that is conditionally filled with values. Basically my dataframe loks like this

  Origin     X
0 Guatemala  x
1 China      x
2 Kenya      x
3 Venezuela  x
4 Bangladesh x

What I want to do now is create an additional column 'Continent', which adds the continent dependent on the country. My result would look like this:

 Origin      X  Continent
0 Guatemala  x  South america
1 China      x  Asia
2 Kenya      x  Africa
3 Venezuela  x  South america
4 Bangladesh x  Asia

I have tried the following codes to accieve what i want:

def GetContinents(x):
    if x['Origin']== 'Thailand' or 'Indonesia' or 'China' or 'Japan' or 'Bangladesh':
        return 'Asia'
    elif x['Origin']== 'Boliva' or 'Guatemala' or 'Venezuela' or 'Mexico' or 'Argentinia':
        return 'South America'
    elif x['Origin']== 'Guinea Bissau' or 'Egypt' or 'Zaire' or 'Kenya':
        return 'Africa'
    else:
        return 'unknown'

df['Continent']= df.apply(GetContinents, axis=1)

This one fills all the columns in 'continent' with 'Asia' mysteriously.

df['Continent'] = np.where(df['Origin'] == 'Bangladesh', 'Asia', 'unknown')

This one works fine in terms that it fills 'Asia' into the right column and unknown into all others, but when I try to make something like df['Continent'] = np.where(df['Origin'] == 'Bangladesh' or 'China', 'Asia', 'unknown') I get an error.

So basically my question is: how can I fullfill my if condition with different values?

1 Answer 1

1

You can construct the lists for each continent and apply a func:

In [35]:
asia = ['Thailand','Indonesia','China','Japan','Bangladesh']
south_america = ['Boliva' , 'Guatemala' , 'Venezuela' , 'Mexico' , 'Argentinia']
africa = [ 'Guinea Bissau' , 'Egypt' , 'Zaire' , 'Kenya']
def find_continent(x):
    if x in asia:
        return 'Asia'
    elif x in south_america:
        return 'South America'
    elif x in africa:
        return 'Africa'
    else:
        return 'Unknown'
df['Continent'] = df['Origin'].apply(find_continent)
df

Out[35]:
       Origin  X      Continent
0   Guatemala  x  South America
1       China  x           Asia
2       Kenya  x         Africa
3   Venezuela  x  South America
4  Bangladesh  x           Asia

Or if you have a much larger df then you can just make successive calls using isin and mask the rows using loc:

In [38]:
df.loc[df['Origin'].isin(asia),'Continent'] = 'Asia'
df.loc[df['Origin'].isin(south_america),'Continent'] = 'South America'
df.loc[df['Origin'].isin(africa),'Continent'] = 'Africa'
df['Continent'] = df['Continent'].fillna('Unknown')
df

Out[38]:
       Origin  X      Continent
0   Guatemala  x  South America
1       China  x           Asia
2       Kenya  x         Africa
3   Venezuela  x  South America
4  Bangladesh  x           Asia

As to why your attempts didn't work:

if x['Origin']== 'Thailand' or 'Indonesia' or 'China' or 'Japan' or 'Bangladesh'

This returns True because or 'Indonesia' is always True so all rows get set to Asia.

You should change it to like this:

if x['Origin'] in ('Thailand' , 'Indonesia' , 'China' , 'Japan' , 'Bangladesh'):

See related: How do I test one variable against multiple values?

Using np.where would be fine but you're not masking the rows so you continuously overwrite the rows so only the last op persists.

Sign up to request clarification or add additional context in comments.

1 Comment

Great! Thanks also for the explanations why my code did not work!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.