I have a large CSV file with 2 columns and many rows. A sample looks like this:

date    score
1/1/16  0
2/1/16  0
3/1/16  0.2732
3/1/16  -0.6486
4/1/16  0
5/1/16  0.4404
5/1/16  -0.2732
6/1/16  -0.5859
6/1/16  0.34

The sample contains multiple rows with the same date but different scores (the original file has hundreds of scores per date). I want to average the score by date and save the result as a CSV file. The expected result should have one average score per date:

date    Avg_Score
1/1/16  0
2/1/16  0
3/1/16  -0.1877
4/1/16  0
5/1/16  0.0836
6/1/16  -0.12295

How can I do this with pandas in Python? I checked Stack Overflow for suggestions, and loc, iloc and groupby were all I found. I could not get them to work, though: this is what I tried, and the output file is still identical to my original (nothing changes). I don't know why it is not working or how to fix it.

import pandas as pd
import csv
df = pd.read_csv('myfile.csv')

df.groupby('date').mean().reset_index()

df.to_csv('average.csv', encoding='utf-8', index=False)

I would appreciate any help, as I have been struggling with this for a while. Thank you.

1 Answer

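groupby does not modify df in place; it returns a new object. Assign the output of the groupby call to a variable (here df1) and write that variable to the CSV file: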

import pandas as pd

df = pd.read_csv('myfile.csv')
# solution with the new column renamed to Avg_Score
df1 = df.groupby('date')['score'].mean().reset_index(name='Avg_Score')
# your solution, with the result assigned to a variable
#df1 = df.groupby('date').mean().reset_index()
df1.to_csv('average.csv', encoding='utf-8', index=False)
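
To see why the original attempt wrote out an unchanged file, here is a small sketch using the sample rows from the question. It assumes the file is comma-separated with a header row of date,score (the question shows the columns separated by whitespace, so adjust sep if your file differs), and it reads the data from an inline string instead of myfile.csv so it runs on its own. The sort=False argument is optional; it is added here only to keep the dates in order of first appearance, since the date column is read as text and would otherwise be sorted as strings.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch runs without myfile.csv.
# Assumption: the real file is comma-separated with a "date,score" header.
csv_text = """date,score
1/1/16,0
2/1/16,0
3/1/16,0.2732
3/1/16,-0.6486
4/1/16,0
5/1/16,0.4404
5/1/16,-0.2732
6/1/16,-0.5859
6/1/16,0.34
"""
df = pd.read_csv(io.StringIO(csv_text))

# groupby(...).mean() returns a new object; the result is discarded here,
# so df still holds every original row.
df.groupby('date')['score'].mean()
rows_after_discarded_groupby = len(df)

# Assigning the result keeps it. sort=False preserves first-appearance order
# instead of sorting the date strings alphabetically.
df1 = (
    df.groupby('date', sort=False)['score']
      .mean()
      .reset_index(name='Avg_Score')
)

print(rows_after_discarded_groupby)
print(df1)
```

If the dates need to be in true chronological order, one option is to convert the column with pd.to_datetime before grouping, passing the format that matches your file.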