Merging Pandas DataFrames with the same column name

Question

I have a dataset, let's say:

Column with duplicates        value1       value2
        1                        5            0
        1                        0            9

And what I want

Column with duplicates        value1       value2
        1                        5            9

I cannot figure out how to get this to work. The closest I got was using merge, but that left me with different suffixes.

Any ideas?

My real data looks like:

trial      Time       1    2      3      4
1         '0-100'     0    100    0      0
1         '0-100'     32    0     0      0
1         '100-200'   0     0    100     0
.
.
.
2         '0-100'     0    100    0      0

I want to keep the trials separate and merge only the rows that share the same Time.

What would you want to happen if there was a third line, 1 2 3? Or can that never happen?
– DSM, Commented Dec 31, 2013 at 19:03
That wouldn't happen. In my actual data there are four value columns, so the rows look like 1 5 0 0 0 and 1 0 9 0 0. So the only thing that would happen is that the other two columns, which are filled with zeros, would need to get filled.
– Wesley Bowman, Commented Dec 31, 2013 at 19:07

DSM · Accepted Answer · 2013-12-31 19:21:10Z


IIUC, you can use groupby and then aggregate:

>>> df
   Column with duplicates  value1  value2
0                       1       5       0
1                       1       0       9

[2 rows x 3 columns]
>>> df.groupby("Column with duplicates", as_index=False).sum()
   Column with duplicates  value1  value2
0                       1       5       9

[1 rows x 3 columns]

On the OP's updated example:

>>> df
   trial       Time   1    2    3  4
0      1    '0-100'   0  100    0  0
1      1    '0-100'  32    0    0  0
2      1  '100-200'   0    0  100  0
3      2    '0-100'   0  100    0  0

[4 rows x 6 columns]
>>> df.groupby("trial", as_index=False).sum()
   trial   1    2    3  4
0      1  32  100  100  0
1      2   0  100    0  0

[2 rows x 5 columns]
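
The session above groups by trial alone, so rows with different Time values inside a trial are combined. As a minimal sketch of keeping Time separate as well (the approach the comments below arrive at), both columns can be passed as keys. The DataFrame here is rebuilt by hand from the question's sample rows; the string column labels "1" to "4" and the unquoted Time strings are assumptions about the real data.

```python
import pandas as pd

# Sample rows copied from the question. The column labels "1".."4" are
# assumed to be strings, and the Time values are assumed to be plain strings.
df = pd.DataFrame(
    {
        "trial": [1, 1, 1, 2],
        "Time": ["0-100", "0-100", "100-200", "0-100"],
        "1": [0, 32, 0, 0],
        "2": [100, 0, 0, 100],
        "3": [0, 0, 100, 0],
        "4": [0, 0, 0, 0],
    }
)

# Group on both keys so rows are merged only when trial AND Time match.
merged = df.groupby(["trial", "Time"], as_index=False).sum()
print(merged)
```

With this sample, the two '0-100' rows of trial 1 collapse into one row, while the '100-200' row and the trial 2 row stay separate.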

edited Dec 31, 2013 at 19:21

answered Dec 31, 2013 at 19:07


5 Comments

Wesley Bowman Over a year ago

This ended up summing my Trials as well.

DSM Over a year ago

@NightHallow: no, it didn't, as the example you gave didn't have any trials. And it shouldn't have summed it regardless (see my updated example).

Wesley Bowman Over a year ago

The thing there is that I also want the "Time" to be kept separate. I got it by doing df.groupby("trial","Time").sum()

DSM Over a year ago

@NightHallow: some general advice. We're happy to help, but you should give a minimal but complete example right from the start. It was only after a post, an edit, and three comments that you mentioned you wanted an axis that didn't even exist in your original example to be used as a key. [I think it would need to be df.groupby(["trial", "Time"]).sum(), BTW.]

Wesley Bowman Over a year ago

I know, sorry. It was because I was testing with a smaller example that turned out to be a bad one. The problem I have now is that since the time is a string such as '0-100', it won't sort properly.
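
One possible way around the string-sorting problem is to sort on the numeric lower bound of each range rather than on the string itself. The following is only a sketch: it assumes every Time value has the form 'start-end' with integer bounds, the sample data is made up, and `range_start` is a hypothetical helper name. It relies on the `key` argument of `DataFrame.sort_values`, which needs pandas 1.1 or later.

```python
import pandas as pd

# Made-up sample data; Time values are assumed to look like "start-end".
df = pd.DataFrame(
    {
        "trial": [1, 1, 1, 1],
        "Time": ["200-300", "1000-1100", "100-200", "0-100"],
        "1": [7, 5, 0, 32],
    }
)


def range_start(times: pd.Series) -> pd.Series:
    """Return the integer lower bound of each 'start-end' string."""
    # strip("'") also tolerates values that carry literal quote characters.
    return times.str.strip("'").str.split("-").str[0].astype(int)


# sort_values applies `key` to each sort column separately; only "Time"
# is transformed, and "trial" is passed through unchanged.
ordered = df.sort_values(
    ["trial", "Time"],
    key=lambda col: range_start(col) if col.name == "Time" else col,
)
print(ordered["Time"].tolist())
```

A plain string sort would place '1000-1100' before '200-300'; sorting on the parsed lower bound puts it last.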
