I have a dataset, let's say:

Column with duplicates        value1       value2
        1                        5            0
        1                        0            9

And what I want is:

Column with duplicates        value1       value2
        1                        5            9

I cannot figure out how to get this to work. The closest I got was using merge, but that left me with different suffixes.

Any ideas?

My real data looks like:

trial      Time       1    2      3      4
1         '0-100'     0    100    0      0
1         '0-100'     32    0     0      0
1         '100-200'   0     0    100     0
.
.
.
2         '0-100'     0    100    0      0

I want to keep the trials separate and just merge the rows that share the same Time.

  • What would you want to happen if there was a third line, 1 2 3? Or can that never happen? Commented Dec 31, 2013 at 19:03
  • That wouldn't happen. In my actual data there are four value columns, e.g. 1 5 0 0 0 and 1 0 9 0 0, so the only thing that would happen is that the other two columns, which are filled with zeros, would need to get filled. Commented Dec 31, 2013 at 19:07

1 Answer

IIUC, you can use groupby and then aggregate:

>>> df
   Column with duplicates  value1  value2
0                       1       5       0
1                       1       0       9

[2 rows x 3 columns]
>>> df.groupby("Column with duplicates", as_index=False).sum()
   Column with duplicates  value1  value2
0                       1       5       9

[1 rows x 3 columns]

On the OP's updated example:

>>> df
   trial       Time   1    2    3  4
0      1    '0-100'   0  100    0  0
1      1    '0-100'  32    0    0  0
2      1  '100-200'   0    0  100  0
3      2    '0-100'   0  100    0  0

[4 rows x 6 columns]
>>> df.groupby("trial", as_index=False).sum()
   trial   1    2    3  4
0      1  32  100  100  0
1      2   0  100    0  0

[2 rows x 5 columns]
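
If the rows should be merged only when both the trial and the Time match (the question says the trials should stay separate while the Times are merged), grouping on both columns is one way to do it. This is a minimal sketch, assuming the column names from the example above and Time labels stored as plain strings without the surrounding quotes:

```python
import pandas as pd

# Sample data modelled on the question's example (Time labels assumed to be plain strings)
df = pd.DataFrame({
    "trial": [1, 1, 1, 2],
    "Time": ["0-100", "0-100", "100-200", "0-100"],
    "1": [0, 32, 0, 0],
    "2": [100, 0, 0, 100],
    "3": [0, 0, 100, 0],
    "4": [0, 0, 0, 0],
})

# Group on both keys so rows are combined only when trial AND Time match;
# as_index=False keeps the keys as ordinary columns in the result
merged = df.groupby(["trial", "Time"], as_index=False).sum()
print(merged)
```

Note that the keys are passed as a single list; summing works here because, in each group, every value column has at most one non-zero entry.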

5 Comments

This ended up summing my Trials as well.
@NightHallow: no, it didn't, as the example you gave didn't have any trials. And it shouldn't have summed it regardless (see my updated example).
The thing there is that I also want the "Time" to be kept separate. I got it by doing df.groupby("trial","Time").sum()
@NightHallow: some general advice-- we're happy to help, but you should give a minimal but full example right from the start. It was only after a post, an edit, and three comments that you mentioned you wanted an axis that didn't even exist in your original example to be used as a key. [I think it would need to be df.groupby(["trial", "Time"]).sum(), BTW.]
I know, sorry. It was because I was testing with a smaller example that turned out to be a bad one. The problem I have now is that since I have the time as '0-100', which is a string, it won't sort properly.
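
One possible way around the string-sorting problem mentioned in the last comment is to sort on the numeric start of each range instead of on the label itself. This is only a sketch: it assumes every Time label has the form 'start-end' with integer bounds, and the `time_start` column is a hypothetical helper that is not part of the original data.

```python
import pandas as pd

# Hypothetical sample: as plain strings, "1000-1100" would sort before "200-300"
df = pd.DataFrame({
    "trial": [1, 1, 1, 1],
    "Time": ["200-300", "1000-1100", "0-100", "100-200"],
    "1": [7, 5, 32, 0],
})

# Helper column (an assumption for illustration): the integer lower bound of each
# "start-end" label; strip("'") also copes with labels that carry literal quotes
df["time_start"] = (
    df["Time"].str.strip("'").str.split("-").str[0].astype(int)
)

# Sort by trial, then by the numeric start, and drop the helper afterwards
ordered = df.sort_values(["trial", "time_start"]).drop(columns="time_start")
print(ordered)
```

The same helper column could be computed before or after the groupby, since it depends only on the Time label.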
