I am a noob and I have a large CSV file with data structured like this (with a lot more columns):
State daydiff
CT 5.5
CT 6.5
CT 6.25
NY 3.2
NY 3.225
PA 7.522
PA 4.25
I want to output a new CSV where the daydiff is averaged for each State like this:
State daydiff
CT 6.083
NY 3.2125
PA 5.886
I have tried numerous ways and the cleanest seemed to leverage pandas groupby but when i run the code below:
import pandas as pd
df = pd.read_csv('C:...input.csv')
df.groupby('State')['daydiff'].mean()
df.to_csv('C:...AverageOutput.csv')
I get a file that is identical to the original file but with a counter added in the first column with no header:
,State,daydiff
0,CT,5.5
1,CT,6.5
2,CT,6.25
3,NY,3.2
4,NY,3.225
5,PA,7.522
6,PA,4.25
I was also hoping to control the new average in datediff to a decimal going out only to the hundredths. Thanks
df.to_csv('C:...AverageOutput.csv', index=False)`