0

I have two data frames.

df1 has index: str, int, float1

Sun, 1, 0.121
Sun, 2, 0.123

df2 has index: str, int, float1

Sun, 1, 0.5
Sun, 2, 0.6

I have to create df3, which has index: str, int, float1, from df1 and df2 by adding the float1 columns of df1 and df2 together, while making sure that the two rows I am adding have the same str and int values.

df3 should look like

Sun, 1, 0.621
Sun, 2, 0.723

Thank you!

4
  • merge then sum .. Commented Apr 30, 2019 at 18:55
  • @Wen-Ben Do you mind giving me a bit more detail? I am new to this. Do I do a naive merge of the two dfs, and then the summation turns the two columns into one? Commented Apr 30, 2019 at 18:59
  • @Wen-Ben. I see what you mean. Thanks! Commented Apr 30, 2019 at 19:01
  • Saying index is confusing as index refers to the row labels in Pandas land. Did you mean 3 columns? Commented Apr 30, 2019 at 19:02

2 Answers

4

Use concat to stack the two dataframes on top of each other, then use a groupby with sum() as the aggregation method:

import pandas as pd

df1 = pd.DataFrame([['Sun', 1, 0.121],['Sun', 2, 0.123]])
df2 = pd.DataFrame([['Sun', 1, 0.5],['Sun', 2, 0.6]])

df = pd.concat([df1, df2])
print(df)
#      0  1      2
# 0  Sun  1  0.121
# 1  Sun  2  0.123
# 0  Sun  1  0.500
# 1  Sun  2  0.600

print(df.groupby([0, 1], as_index=False).sum())
#      0  1      2
# 0  Sun  1  0.621
# 1  Sun  2  0.723

df.groupby() takes the columns you want to group by, in the order you want them grouped. In this case I don't have column names, so I passed the integer column labels 0 and 1. as_index=False tells it to keep the grouping columns as regular columns in the result instead of turning them into the index. df.groupby() returns a DataFrameGroupBy object; calling .sum() on that object returns a dataframe with the results you are looking for.

gb = df.groupby([0, 1], as_index=False)
print(gb)
# <pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000000000109A4A8>

print(gb.sum())
#      0  1      2
# 0  Sun  1  0.621
# 1  Sun  2  0.723

print(gb.mean())
#      0  1       2
# 0  Sun  1  0.3105
# 1  Sun  2  0.3615
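
To look inside the groups before any aggregation is applied, you can iterate over the groupby object, which yields one (key, sub-dataframe) pair per group. The snippet below is only a sketch of that idea; the name `summary` is made up for illustration, and the key's integer is converted to a plain `int` just so the dictionary is easy to read.

```python
import pandas as pd

df1 = pd.DataFrame([['Sun', 1, 0.121], ['Sun', 2, 0.123]])
df2 = pd.DataFrame([['Sun', 1, 0.5], ['Sun', 2, 0.6]])
df = pd.concat([df1, df2])

# Iterating over a groupby object yields (key, sub-DataFrame) pairs.
# With two grouping columns, each key is a (col 0 value, col 1 value) tuple.
summary = {
    (name, int(num)): group[2].tolist()  # column 2 holds the floats that .sum() adds up
    for (name, num), group in df.groupby([0, 1])
}

for key, values in summary.items():
    print(key, values)
```

So each group holds the rows that share the same values in columns 0 and 1, and .sum() or .mean() is applied to the remaining column within each group.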

2 Comments

Thanks. I am looking at each detailed step. Can you shed some light on whatever df.groupby([0,1]) returns? Like can I see the content of the newly grouped data frame?
Would it be like Sun, 1, {0.121,0.5}; Sun, 2, {0.123, 0.6} ? And the method .sum() is applied to {0.121,0.5} and {0.123, 0.6}?
3

Use (assuming the columns are named col_str, col_int and col_float1):

df = pd.concat([df1,df2]).groupby(['col_str', 'col_int'], as_index=False).sum()

print(df)
  col_str  col_int  col_float1
0     Sun        1       0.621
1     Sun        2       0.723
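
For comparison, the "merge then sum" route suggested in the comments on the question could look roughly like the sketch below. It assumes the same column names as above; the `_df1`/`_df2` suffixes and the name `merged` are chosen here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'col_str': ['Sun', 'Sun'], 'col_int': [1, 2],
                    'col_float1': [0.121, 0.123]})
df2 = pd.DataFrame({'col_str': ['Sun', 'Sun'], 'col_int': [1, 2],
                    'col_float1': [0.5, 0.6]})

# Align rows on the two key columns; the overlapping float column
# gets a suffix on each side so both copies survive the merge.
merged = df1.merge(df2, on=['col_str', 'col_int'], suffixes=('_df1', '_df2'))

# Add the two float columns and keep only the columns wanted in df3.
merged['col_float1'] = merged['col_float1_df1'] + merged['col_float1_df2']
df3 = merged[['col_str', 'col_int', 'col_float1']]
print(df3)
```

One difference to keep in mind: merge defaults to an inner join, so a (col_str, col_int) pair that appears in only one dataframe is dropped, whereas concat followed by groupby keeps it with its single value.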
