How could I get a daily average in python?

Question

I have a file that is formatted like this:

(Year - Month - Day - Data)

1980 - 1 - 1 - 1.2
1980 - 1 - 2 - 1.3
1980 - 1 - 3 - 1.4
1980 - 1 - 4 - 1.5
1980 - 1 - 5 - 1.6
1980 - 1 - 6 - 1.7
1980 - 1 - 7 - 1.8

The data is in a numpy array and covers about 24 years. I want to take the average for each day of the year and put it into a separate 1D array of 366 averages (366 to allow for leap years), which I could then plot with matplotlib to see the trend over the years. Is there any way to use subsetting in a loop to accomplish this?

For me at least, it would be better to see a sample of the numpy array. – Bill Bell, Commented Oct 24, 2017 at 17:55
You should really use pandas for time-series stuff. All of this comes built in; there is no need to handle leap years, for example. – juanpa.arrivillaga, Commented Oct 24, 2017 at 17:57

daryl · Accepted Answer · 2017-10-24 18:16:28Z

Using pandas is definitely the way to go. There are at least two ways to group by 'day of the year': by the numeric day of the year as a string, or by the month-day string combination, like so:

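import pandas as pd
import numpy as np

# Sample frame: one row per day from 2000-01-01 through 2010-12-31
df = pd.DataFrame(index=pd.date_range('2000-01-01', '2010-12-31'))

# Random integers from 1 to 5 stand in for the real data
df['vals'] = np.random.randint(1, 6, df.shape[0])

# Group by numeric day of the year ("001" to "366")
print(df.groupby(df.index.strftime("%j")).mean())
# Group by month and day ("0101" to "1231")
print(df.groupby(df.index.strftime("%m%d")).mean())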
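To connect this back to the question's layout, here is a minimal sketch that assumes the numpy array has four columns (year, month, day, value). The array `arr` and the column names are made up for illustration, since the real array was never posted. It groups on the month-day string; with `%j`, every date after Feb 28 lands one slot later in leap years than in other years, so the month-day key may line up calendar dates better.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's array: columns are year, month, day, value.
dates = pd.date_range("1980-01-01", "1983-12-31")
arr = np.column_stack([dates.year, dates.month, dates.day,
                       np.arange(len(dates), dtype=float)])

df = pd.DataFrame(arr, columns=["year", "month", "day", "value"])
# pd.to_datetime can assemble dates from columns named year, month and day.
df.index = pd.to_datetime(df[["year", "month", "day"]].astype(int))

# Group on the month-day string so that Feb 29 gets its own bucket.
daily_mean = df["value"].groupby(df.index.strftime("%m%d")).mean()

# 1D array with one average per calendar day (366 when a leap year is present).
averages = daily_mean.to_numpy()
```

The resulting `averages` array is sorted from "0101" to "1231" and could be passed straight to matplotlib's `plot`.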


Bill Bell · Accepted Answer · 2017-10-25 20:23:22Z

For anyone coming to this question hoping to find an alternative way of processing unusual input, here is some code.

In essence, the code reads the input file one line at a time, picks out the date elements and the value, reassembles them into lines that pandas can readily parse, and writes them to a StringIO object.

Pandas then reads them from there, as if from a CSV file. I have cribbed the grouping code from PiRSquared.

import pandas as pd
import re
from io import StringIO

file_name = 'temp.txt'

# Rewrite each "Y - M - D - value" line as "YYYY-MM-DD,value" in an in-memory buffer
for_pd = StringIO()
with open(file_name) as f:
    for line in f:
        match = re.search(r'([0-9]{4}) - ([0-9]{1,2}) - ([0-9]{1,2}) - ([0-9.]+)', line)
        if match is None:
            continue  # skip blank or malformed lines
        pieces = match.groups()
        pieces = [int(_) for _ in pieces[:3]] + [pieces[3]]
        print('%.4i-%.2i-%.2i,%s' % tuple(pieces), file=for_pd)
for_pd.seek(0)

df = pd.read_csv(for_pd, header=None, names=['datetimes', 'values'], parse_dates=['datetimes'])

# pd.Grouper(freq=...) is the current spelling of the older pd.TimeGrouper
print(df.set_index('datetimes').groupby(pd.Grouper(freq='D')).mean().dropna())
print(df.set_index('datetimes').groupby(pd.Grouper(freq='W')).mean().dropna())

This is the output.

            values
datetimes         
1980-01-01     1.2
1980-01-02     1.3
1980-01-03     1.4
1980-01-04     1.5
1980-01-05     1.6
1980-01-06     1.7
1980-01-07     1.8
            values
datetimes         
1980-01-06    1.45
1980-01-13    1.80
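As a possible shortcut, and only a sketch: if every line really does follow the `Y - M - D - value` pattern, `pd.read_csv` can split on a regular-expression separator and skip the manual rewriting step. The sample rows are copied from the question; the column names are my own choice, and `resample` is used here in place of the grouper.

```python
from io import StringIO
import pandas as pd

# Sample rows copied from the question; a file path could be passed instead.
raw = StringIO(
    "1980 - 1 - 1 - 1.2\n"
    "1980 - 1 - 2 - 1.3\n"
    "1980 - 1 - 3 - 1.4\n"
    "1980 - 1 - 4 - 1.5\n"
    "1980 - 1 - 5 - 1.6\n"
    "1980 - 1 - 6 - 1.7\n"
    "1980 - 1 - 7 - 1.8\n"
)

# A regex separator needs the slower Python parsing engine.
df = pd.read_csv(raw, sep=r"\s+-\s+", engine="python", header=None,
                 names=["year", "month", "day", "value"])

# Assemble a DatetimeIndex from the year, month and day columns.
df.index = pd.to_datetime(df[["year", "month", "day"]])

# Weekly means, with weeks ending on Sunday (the default for "W").
weekly = df["value"].resample("W").mean()
print(weekly)
```

Because the separator requires whitespace on both sides of the hyphen, a negative value such as `-1.2` in the last field should survive the split intact.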
