Add two data frames, but only on selected columns and only where the other column values match

Question

I have two data frames.

df1 has index: str, int, float1

Sun, 1, 0.121
Sun, 2, 0.123

df2 has index: str, int, float1

Sun, 1, 0.5
Sun, 2, 0.6

I need to create df3 (again str, int, float1) from df1 and df2 by adding their float1 columns together, making sure that each pair of rows being added has the same str and int values.

df3 should look like

Sun, 1, 0.621
Sun, 2, 0.723

Thank you!

@Wen-Ben Do you mind giving me a bit more detail? I am new to this. Do I do a native merge of the two dfs, and then the summation turns the two columns into one?
– Loading Zone, Commented Apr 30, 2019 at 18:59
Saying index is confusing, as index refers to the row labels in pandas land. Did you mean 3 columns?
– piRSquared, Commented Apr 30, 2019 at 19:02
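On the merge question in the comments: a merge-based version also works, as an alternative to the concat/groupby answers below. This is a minimal sketch that assumes hypothetical column names col_str, col_int and col_float1 (the question does not name its columns). An inner merge on the two key columns keeps only rows whose (str, int) pair appears in both frames, which matches the "only when the other column values are the same" requirement; the two float columns are then added into one.

```python
import pandas as pd

# Hypothetical column names: the question does not name its columns.
cols = ['col_str', 'col_int', 'col_float1']
df1 = pd.DataFrame([['Sun', 1, 0.121], ['Sun', 2, 0.123]], columns=cols)
df2 = pd.DataFrame([['Sun', 1, 0.5], ['Sun', 2, 0.6]], columns=cols)

# Inner merge (the default): keeps only (col_str, col_int) pairs present in both frames.
# The overlapping float column gets a suffix per side so the two can be told apart.
merged = df1.merge(df2, on=['col_str', 'col_int'], suffixes=('_1', '_2'))

# Add the two float columns into one and keep the original three-column layout.
merged['col_float1'] = merged['col_float1_1'] + merged['col_float1_2']
df3 = merged[['col_str', 'col_int', 'col_float1']]
print(df3)
```

Unlike stacking and grouping, the inner merge silently drops any pair that exists in only one frame, which may or may not be what you want.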

Cohan · Accepted Answer · 2019-04-30 19:14:12Z

4

Use concat to stack them together, then use groupby with sum() as the aggregation method.

import pandas as pd

df1 = pd.DataFrame([['Sun', 1, 0.121],['Sun', 2, 0.123]])
df2 = pd.DataFrame([['Sun', 1, 0.5],['Sun', 2, 0.6]])

df = pd.concat([df1, df2])
print(df)
#      0  1      2
# 0  Sun  1  0.121
# 1  Sun  2  0.123
# 0  Sun  1  0.500
# 1  Sun  2  0.600

print(df.groupby([0, 1], as_index=False).sum())
#      0  1      2
# 0  Sun  1  0.621
# 1  Sun  2  0.723

df.groupby() takes the columns you want to group by, in the order you want them used. These frames have no column names, so I passed integers: with the default column labels, 0 and 1 are both the labels and the positions of the first two columns. The as_index=False parameter tells groupby not to move the grouping columns into the index of the result; they stay as ordinary columns. df.groupby() returns a DataFrameGroupBy object, and calling .sum() on it returns a DataFrame with the results you are looking for.

gb = df.groupby([0, 1], as_index=False)
print(gb)
# <pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000000000109A4A8>

print(gb.sum())
#      0  1      2
# 0  Sun  1  0.621
# 1  Sun  2  0.723

print(gb.mean())
#      0  1       2
# 0  Sun  1  0.3105
# 1  Sun  2  0.3615
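To make the effect of as_index=False concrete, here is a small sketch that rebuilds the same df and compares the default groupby result with the as_index=False one. The names with_index and flat are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame([['Sun', 1, 0.121], ['Sun', 2, 0.123]])
df2 = pd.DataFrame([['Sun', 1, 0.5], ['Sun', 2, 0.6]])
df = pd.concat([df1, df2])

# Default (as_index=True): columns 0 and 1 move into a MultiIndex; only column 2 remains.
with_index = df.groupby([0, 1]).sum()
print(with_index)

# as_index=False: columns 0 and 1 stay as ordinary columns next to the summed column 2.
flat = df.groupby([0, 1], as_index=False).sum()
print(flat)

# reset_index() turns the MultiIndex back into columns, giving the same frame as as_index=False.
print(with_index.reset_index().equals(flat))
```

Either form works; as_index=False simply saves the reset_index() step when you want the original flat layout back.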

edited Apr 30, 2019 at 19:14

answered Apr 30, 2019 at 18:58

Cohan



2 Comments

Loading Zone Over a year ago

Thanks. I am going through each step in detail. Can you shed some light on what df.groupby([0,1]) returns? For example, can I see the contents of the newly grouped data frame?

Loading Zone Over a year ago

Would it be like Sun, 1, {0.121, 0.5}; Sun, 2, {0.123, 0.6}? And the method .sum() is applied to {0.121, 0.5} and {0.123, 0.6}?
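The grouped contents can be inspected directly. Iterating over a DataFrameGroupBy yields one (key, sub-DataFrame) pair per group, and .sum() reduces column 2 within each group to a single number, which is close to the picture described above. A minimal sketch, rebuilding the same df as in the answer; the names group_keys and values_per_group are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([['Sun', 1, 0.121], ['Sun', 2, 0.123]])
df2 = pd.DataFrame([['Sun', 1, 0.5], ['Sun', 2, 0.6]])
df = pd.concat([df1, df2])
gb = df.groupby([0, 1])

# Iterating a DataFrameGroupBy yields one (key, sub-DataFrame) pair per group.
group_keys = []
for key, group in gb:
    group_keys.append(key)
    print(key)    # the group key, a (column 0, column 1) tuple
    print(group)  # the rows of df whose columns 0 and 1 equal that key

# Gather column 2 of each group into a list, similar to the {0.121, 0.5} picture.
values_per_group = gb[2].apply(list)
print(values_per_group)

# .sum() then reduces each group's column 2 to one number.
print(gb[2].sum())
```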

Erfan · Accepted Answer · 2019-04-30 18:57:48Z

3

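Use (assuming the columns are named col_str, col_int and col_float1):

import pandas as pd
cols = ['col_str', 'col_int', 'col_float1']
df1 = pd.DataFrame([['Sun', 1, 0.121], ['Sun', 2, 0.123]], columns=cols)
df2 = pd.DataFrame([['Sun', 1, 0.5], ['Sun', 2, 0.6]], columns=cols)

df = pd.concat([df1,df2]).groupby(['col_str', 'col_int'], as_index=False).sum()

print(df)
  col_str  col_int  col_float1
0     Sun        1       0.621
1     Sun        2       0.723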
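One behavior to keep in mind with the concat/groupby approach: a (col_str, col_int) pair that appears in only one frame is not dropped; it is kept with its single value. The sketch below uses a hypothetical extra ('Mon', 3) row to show this, plus one way to keep only pairs present in both frames, assuming each pair occurs at most once per frame.

```python
import pandas as pd

cols = ['col_str', 'col_int', 'col_float1']
keys = ['col_str', 'col_int']
df1 = pd.DataFrame([['Sun', 1, 0.121], ['Sun', 2, 0.123]], columns=cols)
# Hypothetical extra row ('Mon', 3) with no partner in df1.
df2 = pd.DataFrame([['Sun', 1, 0.5], ['Sun', 2, 0.6], ['Mon', 3, 0.9]], columns=cols)

stacked = pd.concat([df1, df2])

# Plain concat + groupby: ('Mon', 3) survives with its single value 0.9.
summed = stacked.groupby(keys, as_index=False).sum()
print(summed)

# Keep only groups with exactly two rows (one from each frame), then sum.
# Assumes each (col_str, col_int) pair appears at most once per frame.
matched = (stacked.groupby(keys).filter(lambda g: len(g) == 2)
                  .groupby(keys, as_index=False).sum())
print(matched)
```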

answered Apr 30, 2019 at 18:57

Erfan

