0

I have this dataframe, consisting in 73 rows:

Date    Col1    Col2   Col3
1975   float   float  float
1976   float   float  float
1976   float   float  float
1977   float   float  float
1978   float   float  float
....
....

There are certain years appearing twice because the values were taken twice that year. What I want to do is to merge those rows where the year is the same, taking the mean value of each column for those specific two rows. The fact is that I am still familiarizing with pandas and I don't really understand the usage of the loc and iloc selectors. This is what I have tried, but I am sure this is completely wrong and non-pythonic:

for i in range(72):
    if df.Date[i]==df.Date[i+1]:
        df.Very_satisfied[i]= (df.Very_satisfied[i]+df.Very_satisfied[i+1])/2
        df.Fairly_satisfied[i]= (df.Fairly_satisfied[i]+df.Fairly_satisfied[i+1])/2
        df.NV_satisfied[i]= (df.NV_satisfied[i]+ df.NV_satisfied[i+1])/2
        df.Not_satisfied[i]= (df.Not_satisfied[i]+ df.Not_satisfied[i+1])/2
        df.DK[i]= (df.DK[i]+ df.DK[i+1])/2
        a=i+1
        str(a)
        df.drop(a)

where "very satisfied", "fairly satisfied" ecc. are the columns. The point in my code is: if two years are the same calculate the mean of each value, substitute it in the first row and delete the second row. I really need something smarter and more elegant.

1 Answer 1

1

You can use groupby() and then mean() for this. Here is an example :

import pandas as pd
import numpy as np

df = pd.DataFrame({'date': list(range(25)) * 2, 'col1': np.random.random(50) * 100, 'col2': np.random.random(50)})
df.groupby('date').mean()

This will take all the rows which the same date, calculate the mean value of all the rows in the group for each column.

This outputs on my sample :

df.groupby('date').mean().head()
           col1      col2
date
0     42.881950  0.436073
1     32.114299  0.309742
2     96.819446  0.809071
3     30.606661  0.284257
4     40.690211  0.624972

For this input :

df[df['date'] < 5]

    date       col1      col2
0      0  67.268605  0.393560
1      1  55.864578  0.508636
2      2  97.735942  0.861162
3      3  58.014599  0.117055
4      4   7.429489  0.637101
25     0  18.495296  0.478585
26     1   8.364020  0.110848
27     2  95.902950  0.756980
28     3   3.198724  0.451460
29     4  73.950932  0.612843
Sign up to request clarification or add additional context in comments.

1 Comment

I'll never learn... you usually never need more than two lines of code when using python. Thanks a lot for your help mate.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.