
I'm trying to combine all rows of a DataFrame that share the same timestamp into a single row. The DataFrame is 5,000 rows by 20 columns.

             A      B      ...
 timestamp
    11:00    NaN    10     ...
    11:00    5      NaN    ...
    12:00    15     20     ...
    ...      ...    ...

I want to merge the two 11:00 rows so the result looks like this:

             A      B        ...
timestamp
    11:00    5      10       ...
    12:00    15     20       ...
    ...      ...    ...

Any help would be appreciated. Thank you.

I have tried the following, but it left me with all NaNs:

df.groupby(df.index).sum()
  • What if they had both A and B columns filled? Commented May 28, 2015 at 20:30
  • You're asking if the NaNs in the above were values instead? In my case, for each unique timestamp (the two 11:00 rows in the example above) there will only be one value per column. I had initially tried a groupby on the index with sum, but this left me with all NaNs. Commented May 28, 2015 at 20:41
  • Post the code you have tried Commented May 28, 2015 at 20:41

4 Answers

2

You could melt ('unpivot') the DataFrame to convert it from wide form to long form, remove the null values, then aggregate via groupby.

import pandas as pd

df = pd.DataFrame({'timestamp': ['11:00', '11:00', '12:00'],
                   'A': [None, 5, 15],
                   'B': [10, None, 20]})

    A   B   timestamp
0   NaN 10  11:00
1   5   NaN 11:00
2   15  20  12:00

df2 = pd.melt(df, id_vars='timestamp')  # specify value_vars if needed

    timestamp   variable    value
0   11:00       A           NaN
1   11:00       A           5
2   12:00       A           15
3   11:00       B           10
4   11:00       B           NaN
5   12:00       B           20

df2.dropna(inplace=True)
df3 = df2.groupby(['timestamp', 'variable']).sum()

                        value
timestamp   variable    
11:00       A           5
            B           10
12:00       A           15
            B           20

df3.unstack()

            value
variable    A   B
timestamp       
11:00       5   10
12:00       15  20
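For reference, the melt / dropna / groupby / unstack steps can be combined into one self-contained script. This is a sketch reusing the answer's sample data; selecting the 'value' column before summing is a small variation on the answer that drops the extra 'value' column level seen in the unstacked output.

```python
import pandas as pd

# Sample data from the answer: the two 11:00 rows each have one value filled
df = pd.DataFrame({'timestamp': ['11:00', '11:00', '12:00'],
                   'A': [None, 5, 15],
                   'B': [10, None, 20]})

# Wide -> long: one row per (timestamp, column) pair
long_df = pd.melt(df, id_vars='timestamp')

# Drop the missing cells, then sum what is left per (timestamp, variable)
long_df = long_df.dropna()
summed = long_df.groupby(['timestamp', 'variable'])['value'].sum()

# Long -> wide again: each variable becomes a column
result = summed.unstack()
print(result)
```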


2

Use groupby after replacing the NaN values with 0s.

df.fillna(0, inplace=True)
df.groupby(df.index).sum()
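A self-contained sketch of this approach, assuming (as in the question) that 'timestamp' is the index and that each timestamp has at most one non-null value per column. groupby(level=0) groups on the index, the same as groupby(df.index); fillna(0) without inplace returns a new frame and leaves df unchanged.

```python
import pandas as pd

# Question-style data: 'timestamp' is the index, with a duplicated 11:00 label
df = pd.DataFrame({'A': [None, 5, 15], 'B': [10, None, 20]},
                  index=pd.Index(['11:00', '11:00', '12:00'], name='timestamp'))

# Replace missing cells with 0 so they do not change the per-timestamp sum
filled = df.fillna(0)

# Rows sharing an index label are summed column by column
result = filled.groupby(level=0).sum()
print(result)
```

One trade-off: a timestamp with no values at all in a column ends up as 0 rather than NaN, and if two rows did both have values (the case raised in the first comment on the question) they would be added together.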


1

Try using resample (this requires the index to be a DatetimeIndex, e.g. converted with pd.to_datetime):

>>> df.resample('60Min').sum()
                      A   B
2015-05-28 11:00:00   5  10
2015-05-28 12:00:00  15  20

More examples can be found in the pandas documentation.
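A runnable sketch, assuming the index has already been converted to datetimes; the date 2015-05-28 is made up to match the output above. In current pandas the aggregation is chained as .sum() rather than passed via how=.

```python
import pandas as pd

# resample needs a DatetimeIndex; the date here is assumed for illustration
idx = pd.to_datetime(['2015-05-28 11:00', '2015-05-28 11:00', '2015-05-28 12:00'])
df = pd.DataFrame({'A': [None, 5, 15], 'B': [10, None, 20]}, index=idx)

# Bin rows into 60-minute buckets and sum each column (NaN is skipped)
result = df.resample('60min').sum()
print(result)
```

Note that resample groups by time interval rather than by exact timestamp: rows at 11:00 and 11:30 would land in the same bucket, and hours with no rows in between would appear as extra rows.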


0

You cannot sum a number and a NaN in Python. You probably need to use .aggregate() :)
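The answer doesn't show an aggregate call, so here is one possible sketch, assuming at most one non-null value per column per timestamp (as the asker states in the comments). It uses the built-in 'first' aggregation, which takes the first non-null value in each column of each group.

```python
import pandas as pd

# Question-style data with 'timestamp' as the (non-unique) index
df = pd.DataFrame({'A': [None, 5, 15], 'B': [10, None, 20]},
                  index=pd.Index(['11:00', '11:00', '12:00'], name='timestamp'))

# 'first' skips NaN within each group, so the two partial 11:00 rows
# collapse into one complete row
result = df.groupby(level=0).aggregate('first')
print(result)
```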

1 Comment

Yeah, I've been messing around with aggregate too, but I can't seem to figure it out.
