Python: Average values in a CSV file based on value of another column

Question

I am a noob and I have a large CSV file with data structured like this (with a lot more columns):

State  daydiff
CT     5.5
CT     6.5
CT     6.25
NY     3.2
NY     3.225
PA     7.522
PA     4.25

I want to output a new CSV where the daydiff is averaged for each State like this:

State  daydiff
CT     6.083
NY     3.2125
PA     5.886

I have tried numerous ways, and the cleanest seemed to be pandas groupby, but when I run the code below:

import pandas as pd

df = pd.read_csv('C:...input.csv')
df.groupby('State')['daydiff'].mean()

df.to_csv('C:...AverageOutput.csv')

I get a file that is identical to the original file but with a counter added in the first column with no header:

,State,daydiff
0,CT,5.5
1,CT,6.5
2,CT,6.25
3,NY,3.2
4,NY,3.225
5,PA,7.522
6,PA,4.25

I was also hoping to limit the new average in daydiff to two decimal places (hundredths). Thanks

@Zero Thanks. That solved the extra column but now the output is identical to the input with no averaging happening.
– John Minze, Commented Oct 10, 2017 at 14:37

Arthur Gouveia · Accepted Answer · 2017-10-10 14:51:26Z

The "problem" with the counter is that the default behaviour of to_csv is to write the index. To suppress it, use df.to_csv('C:...AverageOutput.csv', index=False).
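To see what index=False changes, here is a minimal sketch. It assumes a small in-memory frame built from a few of the question's rows, and it calls to_csv without a path so the CSV text comes back as a string instead of being written to the real files.

```python
import pandas as pd

# A few sample rows from the question, built in memory instead of read from disk.
df = pd.DataFrame({
    'State': ['CT', 'CT', 'NY'],
    'daydiff': [5.5, 6.5, 3.2],
})

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header begins with an empty field for the index
print(without_index.splitlines()[0])  # header holds only the column names
```

The unnamed first column in the question's output is that default integer index.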

You can control the output format of daydiff by converting it to a string: df.daydiff = df.daydiff.apply(lambda x: '{:.2f}'.format(x))
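Formatting with '{:.2f}' turns the values into strings. If the column should stay numeric, two alternatives that may fit are Series.round and the float_format argument of to_csv. The sketch below is only an illustration: the means Series is hypothetical sample data, not output from the real file.

```python
import pandas as pd

# Hypothetical per-state means, used only to compare the three options.
means = pd.Series([6.0833333, 3.2125, 5.886],
                  index=['CT', 'NY', 'PA'], name='daydiff')

# Option from the answer: each value becomes a two-decimal string.
as_text = means.apply(lambda x: '{:.2f}'.format(x))

# Alternative: round but keep the values as floats.
as_number = means.round(2)

# Alternative: leave the data untouched and format only while writing.
csv_text = means.to_csv(float_format='%.2f', header=True)

print(as_text['CT'], type(as_text['CT']).__name__)
print(as_number.dtype)
print(csv_text)
```

Rounding or float_format keeps the data usable for further arithmetic, while the string conversion guarantees trailing zeros such as 6.10.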

Your complete code should be:

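import pandas as pd

df = pd.read_csv('C:...input.csv')
# Average daydiff per State, then format each mean as a two-decimal string.
df2 = df.groupby('State')['daydiff'].mean().apply(lambda x: '{:.2f}'.format(x))
df2.to_csv('C:...AverageOutput.csv')  # df2 is a Series indexed by State, so keep the index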

edited Oct 10, 2017 at 14:51

answered Oct 10, 2017 at 14:31

3 Comments

John Minze Over a year ago

Thanks. That solved the extra column but now the output is identical to the input with no averaging happening.

Arthur Gouveia Over a year ago

Take care with the variable you call to_csv on. If you call df.to_csv, you are writing the original data to the new file. You have to make sure you're using df2.to_csv. I've updated my answer, since the groupby will output a Series, not a DataFrame. Since it's a Series, you need the index.
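The point about df versus df2 can be checked directly. This is a minimal sketch that assumes the question's seven sample rows held in memory; the name averages is illustrative and plays the role of df2.

```python
import pandas as pd

# The seven sample rows from the question.
df = pd.DataFrame({
    'State': ['CT', 'CT', 'CT', 'NY', 'NY', 'PA', 'PA'],
    'daydiff': [5.5, 6.5, 6.25, 3.2, 3.225, 7.522, 4.25],
})

# groupby(...).mean() returns a new Series; df itself is left unchanged.
averages = df.groupby('State')['daydiff'].mean()

print(type(averages).__name__)   # the result is a Series, not a DataFrame
print(len(df), len(averages))    # original row count vs. number of groups
print(list(averages.index))      # the State labels live in the index
```

Because the State labels are the index of the result, writing it with index=False would drop them, which is why the index has to be kept for the Series.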

John Minze Over a year ago

That worked great thanks, but it did not carry over the headers as I expected.
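One way to get both headers into the output, offered as a sketch and not as the answer author's method, is to keep State as a regular column with as_index=False so the result is a DataFrame, and then write it with index=False. The in-memory frame and the string output below are assumptions standing in for the real input and output files.

```python
import pandas as pd

# The seven sample rows from the question.
df = pd.DataFrame({
    'State': ['CT', 'CT', 'CT', 'NY', 'NY', 'PA', 'PA'],
    'daydiff': [5.5, 6.5, 6.25, 3.2, 3.225, 7.522, 4.25],
})

# as_index=False keeps State as a column, so the result is a DataFrame
# with both column names available as headers.
result = df.groupby('State', as_index=False)['daydiff'].mean()
result['daydiff'] = result['daydiff'].round(2)

# No path argument: return the CSV text so it can be inspected.
csv_text = result.to_csv(index=False)
print(csv_text)
```

With a real file, the same to_csv call would take the output path as its first argument.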
