I am trying to combine four rows into two based on the value of 'Country'. The dataframe is as follows (sorry for the formatting; if there is a better way to show it, please let me know):

(Index),Country,SPI_Score,WHR_Score
...
190,Congo Republic of,48.45,NaN
191,Congo Democratic Republic of,42.25,NaN
...
198,Congo (Brazzaville),NaN,5.194
199,Congo (Kinshasa),NaN,4.311

My problem is that after an outer join, the same countries appear under different names. I tried replacing the country names like this:

for i in range(len(df['Country'])):
    if df.iloc[i]['Country'] in ['Congo Republic of', 'Congo (Brazzaville)']:
        df.iloc[i]['Country'] = 'Republic of the Congo'
    elif df.iloc[i]['Country'] in ['Congo Democratic Republic of', 'Congo (Kinshasa)']:
        df.iloc[i]['Country'] = 'Democratic Republic of the Congo'
    else:
        continue

However, this did not work and gave me the original df. The output that I want is:

(Index),Country,SPI_Score,WHR_Score
...
190,Republic of the Congo,48.45,5.194
191,Democratic Republic of the Congo,42.25,4.311


1 Answer

You can put your name mappings into a dictionary and map each old name to the new one. Define

name_mapper = {
    'Congo Republic of': 'Republic of the Congo',
    'Congo (Brazzaville)': 'Republic of the Congo',
    'Congo Democratic Republic of': 'Democratic Republic of the Congo',
    'Congo (Kinshasa)': 'Democratic Republic of the Congo',
}

The easiest way to map a column is to use something like

df['Country'].map(name_mapper)

but that returns NaN for any value in 'Country' that is not a key of this dict. A more robust version falls back to the original value when there is no match:

df['C'] = df['Country'].apply(lambda v: name_mapper.get(v, v))
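As a side note, pandas also has a built-in that leaves unmatched values alone: `Series.replace` accepts a dict. The sketch below compares the three approaches on a tiny made-up Series; the 'France' entry is purely illustrative and not part of the original data.

```python
import pandas as pd

name_mapper = {
    'Congo Republic of': 'Republic of the Congo',
    'Congo (Brazzaville)': 'Republic of the Congo',
    'Congo Democratic Republic of': 'Democratic Republic of the Congo',
    'Congo (Kinshasa)': 'Democratic Republic of the Congo',
}

# Hypothetical sample: 'France' is deliberately not a key in name_mapper.
countries = pd.Series(['Congo (Kinshasa)', 'France'], name='Country')

# map: values without a matching key become NaN.
mapped = countries.map(name_mapper)

# map + fillna: fall back to the original value where map produced NaN.
with_fallback = countries.map(name_mapper).fillna(countries)

# replace: only values that match a key are changed.
replaced = countries.replace(name_mapper)

print(mapped.tolist())
print(with_fallback.tolist())
print(replaced.tolist())
```

For a handful of renames, `replace` is probably the shortest spelling; the `apply` version above does the same thing one value at a time.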

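Now we can group by 'C' and sum the two score columns (sum skips NaN, so each group keeps its single valid value per column):

df.groupby('C', as_index=False)[['SPI_Score', 'WHR_Score']].sum()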

to obtain


    C                                   SPI_Score   WHR_Score
0   Democratic Republic of the Congo    42.25       4.311
1   Republic of the Congo               48.45       5.194
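For completeness, here is a self-contained sketch of the whole flow, assuming the frame contains only the four rows shown in the question (the real frame has more). It uses `first()` instead of `sum()`, which is a swapped-in variation: `first()` takes the first non-null value per column in each group, so a group with no score at all stays NaN, whereas `sum()` with its default `min_count=0` would report 0 for such a group.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the four rows from the question.
df = pd.DataFrame(
    {
        'Country': ['Congo Republic of', 'Congo Democratic Republic of',
                    'Congo (Brazzaville)', 'Congo (Kinshasa)'],
        'SPI_Score': [48.45, 42.25, np.nan, np.nan],
        'WHR_Score': [np.nan, np.nan, 5.194, 4.311],
    },
    index=[190, 191, 198, 199],
)

name_mapper = {
    'Congo Republic of': 'Republic of the Congo',
    'Congo (Brazzaville)': 'Republic of the Congo',
    'Congo Democratic Republic of': 'Democratic Republic of the Congo',
    'Congo (Kinshasa)': 'Democratic Republic of the Congo',
}

# Normalise the names; values without a key are left unchanged.
df['C'] = df['Country'].replace(name_mapper)

# Collapse each group to one row, taking the first non-null score per column.
combined = df.groupby('C', as_index=False)[['SPI_Score', 'WHR_Score']].first()
print(combined)
```

If two sources ever report different non-null scores for the same country, `first()` silently keeps one of them, so it is worth checking for such conflicts before collapsing.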

1 Comment

Thank you for answering! I had no idea how to do it. This seems like it should be a built-in pandas method. It seems like a lot of work for one data point, any way you look at it.
