How to merge(join) two rows in pandas with different values in each column?

Question

I am trying to combine four rows into two based on the name in the 'Country' column. The DataFrame is as follows (sorry for the bad format; if there is a better way to show it, please let me know):

(Index),Country,SPI_Score,WHR_Score
...............................
190,Congo Republic of,48.45,NaN
191,Congo Democratic Republic of,42.25,NaN
................................
198,Congo (Brazzaville),NaN,5.194
199,Congo (Kinshasa),NaN,4.311

My problem is that the two sources spell the country names differently, so after an outer join the same country ends up on two separate rows. I tried replacing the country names like this:

for i in range(len(df['Country'])):
    if df.iloc[i]['Country'] in ['Congo Republic of', 'Congo (Brazzaville)']:
        df.iloc[i]['Country'] = 'Republic of the Congo'
    elif df[i]['Country'] in ['Congo Democratic Republic of', 'Congo (Kinshasa)']:
        df.iloc[i]['Country'] = 'Democratic Republic of the Congo'
    else:
        continue

However, this did not work and gave me back the original df. The output that I want is:

(Index),Country,SPI_Score,WHR_Score
...............................
190,Republic of the Congo,48.45,5.194
191,Democratic Republic of the Congo,42.25,4.311

piterbarg · Accepted Answer · 2021-03-12 08:37:54Z


You can put your name mappings into a dictionary and map to the new name. Set

name_mapper = {
    'Congo Republic of': 'Republic of the Congo',
    'Congo (Brazzaville)': 'Republic of the Congo',
    'Congo Democratic Republic of': 'Democratic Republic of the Congo',
    'Congo (Kinshasa)': 'Democratic Republic of the Congo',
}

The easiest way to map a column is to use something like

df['Country'].map(name_mapper)

but that returns NaN for every 'Country' value that is not a key of this dict. A more robust version falls back to the original name when there is no match:

df['C'] = df['Country'].apply(lambda v: name_mapper.get(v, v))
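To make the difference between the two lookups concrete, here is a minimal sketch. The sample Series and the unmapped name 'France' are made up for illustration and are not part of the question's data. It also shows Series.replace with a dict, which likewise leaves unmatched values untouched.

```python
import pandas as pd

name_mapper = {
    'Congo Republic of': 'Republic of the Congo',
    'Congo (Brazzaville)': 'Republic of the Congo',
}

# Hypothetical sample: 'France' has no entry in name_mapper.
countries = pd.Series(['Congo Republic of', 'France', 'Congo (Brazzaville)'])

# Plain map: names without a key in the dict become NaN.
mapped = countries.map(name_mapper)

# Dict lookup with a fallback: names without a key are kept as they are.
mapped_with_fallback = countries.apply(lambda v: name_mapper.get(v, v))

# Series.replace with a dict also keeps names that have no key.
replaced = countries.replace(name_mapper)

print(mapped.tolist())
print(mapped_with_fallback.tolist())
print(replaced.tolist())
```

Only the plain map loses 'France'; the other two variants pass it through unchanged.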

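Now we can group by 'C' and sum the two score columns (selecting them explicitly keeps the string column 'Country' out of the sum)

df.groupby('C', as_index=False)[['SPI_Score', 'WHR_Score']].sum()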

to obtain


    C                                 SPI_Score  WHR_Score
0   Democratic Republic of the Congo      42.25      4.311
1   Republic of the Congo                 48.45      5.194


1 Comment

Alejandro Over a year ago

Thank you for answering! I had no idea how to do it. This seems like it should be a built-in pandas method. It seems like a lot of work for one data point, any way you look at it.
