Merging certain rows in pandas dataframe

Question

I have this dataframe, consisting of 73 rows:

Date    Col1    Col2   Col3
1975   float   float  float
1976   float   float  float
1976   float   float  float
1977   float   float  float
1978   float   float  float
....
....

Certain years appear twice because the values were taken twice that year. I want to merge the rows where the year is the same, taking the mean of each column over those two rows. I am still familiarizing myself with pandas and don't really understand how to use the loc and iloc selectors. This is what I have tried, but I am sure it is completely wrong and non-pythonic:

for i in range(72):
    if df.Date[i]==df.Date[i+1]:
        df.Very_satisfied[i]= (df.Very_satisfied[i]+df.Very_satisfied[i+1])/2
        df.Fairly_satisfied[i]= (df.Fairly_satisfied[i]+df.Fairly_satisfied[i+1])/2
        df.NV_satisfied[i]= (df.NV_satisfied[i]+ df.NV_satisfied[i+1])/2
        df.Not_satisfied[i]= (df.Not_satisfied[i]+ df.Not_satisfied[i+1])/2
        df.DK[i]= (df.DK[i]+ df.DK[i+1])/2
        a=i+1
        str(a)
        df.drop(a)

where "very satisfied", "fairly satisfied", etc. are the columns. The idea of my code is: if two consecutive rows have the same year, calculate the mean of each value, store it in the first row, and delete the second row. I really need something smarter and more elegant.

Unatiel · Accepted Answer · 2017-08-06 11:39:59Z

1

You can use groupby() and then mean() for this. Here is an example:

import pandas as pd
import numpy as np

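# Sample data: every date from 0 to 24 appears exactly twice, with random values
df = pd.DataFrame({'date': list(range(25)) * 2, 'col1': np.random.random(50) * 100, 'col2': np.random.random(50)})
# One row per date, each column averaged over the rows sharing that date
df.groupby('date').mean()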

This groups all the rows that share the same date and computes, for each column, the mean over the rows in each group.

On my sample, this outputs:

df.groupby('date').mean().head()
           col1      col2
date
0     42.881950  0.436073
1     32.114299  0.309742
2     96.819446  0.809071
3     30.606661  0.284257
4     40.690211  0.624972

For this input:

df[df['date'] < 5]

    date       col1      col2
0      0  67.268605  0.393560
1      1  55.864578  0.508636
2      2  97.735942  0.861162
3      3  58.014599  0.117055
4      4   7.429489  0.637101
25     0  18.495296  0.478585
26     1   8.364020  0.110848
27     2  95.902950  0.756980
28     3   3.198724  0.451460
29     4  73.950932  0.612843
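
Applied to the layout described in the question, the same idea might look like the sketch below. The column names Date, Col1, Col2 and Col3 come from the question's table; the numeric values are made up for illustration, and the name merged is just a placeholder. Passing as_index=False is an optional choice that keeps Date as a regular column instead of moving it into the index.

```python
import pandas as pd

# Made-up values standing in for the question's floats; 1976 appears twice.
df = pd.DataFrame({
    "Date": [1975, 1976, 1976, 1977, 1978],
    "Col1": [1.0, 2.0, 4.0, 5.0, 6.0],
    "Col2": [10.0, 20.0, 40.0, 50.0, 60.0],
    "Col3": [0.5, 1.0, 2.0, 2.5, 3.0],
})

# Group rows by year and average every other column within each group.
# as_index=False keeps Date as an ordinary column in the result.
merged = df.groupby("Date", as_index=False).mean()
print(merged)
```

Years that appear only once pass through unchanged, since the mean of a single row is that row, so no special handling of the duplicated years is needed.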


1 Comment

sato Over a year ago

I'll never learn... you usually never need more than two lines of code when using python. Thanks a lot for your help mate.
