I have a file that is formatted like this:

(Year - Month - Day - Data)

1980 - 1 - 1 - 1.2
1980 - 1 - 2 - 1.3
1980 - 1 - 3 - 1.4
1980 - 1 - 4 - 1.5
1980 - 1 - 5 - 1.6
1980 - 1 - 6 - 1.7
1980 - 1 - 7 - 1.8

The data is in a numpy array and covers about 24 years. I want to take the average for each day of the year and put the results into a separate 1D array of 366 averages (366 to allow for leap years), which I could then plot with matplotlib to see the trend over the course of the years. Is there any way to use subsetting in a loop to accomplish this?

  • For me at least, it would be better to see a sample of the numpy array. Commented Oct 24, 2017 at 17:55
  • You should really use pandas for time-series stuff. All of this comes built in, so there is no need to handle leap years, for example. Commented Oct 24, 2017 at 17:57
  • It's very difficult to disagree with Mr Arrivillaga. Commented Oct 24, 2017 at 18:05
  • Are the elements of the array of type string? Commented Oct 24, 2017 at 18:08

2 Answers

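Using pandas is definitely the way to go. There are at least two ways to group by 'day of the year': by the numeric day of the year as a string ("%j"), or by the month-day combination as a string ("%m%d"). Note that with "%j", dates from March 1 onward get a day number one higher in leap years than in other years, whereas "%m%d" keeps each calendar date in its own group: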

import pandas as pd
import numpy as np

df = pd.DataFrame(index=pd.date_range('2000-01-01', '2010-12-31'))

df['vals'] = np.random.randint(1, 6, df.shape[0])

print(df.groupby(df.index.strftime("%j")).mean())
print(df.groupby(df.index.strftime("%m%d")).mean())
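
To connect this to the array described in the question, here is a minimal sketch. It assumes the array is a 2D float array whose columns are year, month, day and value; the name arr, the column names and the random stand-in data are all made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's array: columns are year, month, day, value.
dates = pd.date_range('1980-01-01', '2003-12-31')
rng = np.random.default_rng(0)
arr = np.column_stack([dates.year, dates.month, dates.day,
                       rng.normal(size=len(dates))])

df = pd.DataFrame(arr, columns=['year', 'month', 'day', 'value'])

# to_datetime can assemble dates from columns named year, month and day.
date_parts = df[['year', 'month', 'day']].astype(int)
df.index = pd.DatetimeIndex(pd.to_datetime(date_parts))

# One mean per calendar date; Feb 29 is averaged over the leap years only.
daily_mean = df['value'].groupby(df.index.strftime('%m%d')).mean()

# Plain 1D numpy array of 366 averages, ordered from '0101' to '1231'.
averages = daily_mean.to_numpy()
```

The resulting averages array could then be handed to matplotlib's plot function; grouping on "%m%d" rather than "%j" is a choice made here so that each position refers to the same calendar date in every year.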

For anyone coming to this question hoping to find an alternative way of processing unusual input, here is some code.

In its essentials, the code reads the input file a line at a time, picks out the elements of dates and values, reassembles these into lines that pandas can readily parse and puts them into a StringIO object.

Pandas reads them from there, as if from a csv file. I have cribbed the grouping code from PiRSquared.

import pandas as pd
import re
from io import StringIO

file_name = 'temp.txt'

for_pd = StringIO()
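with open(file_name) as f:
    for line in f:
        match = re.search(r'([0-9]{4}) - ([0-9]{1,2}) - ([0-9]{1,2}) - ([0-9.]+)', line)
        if match is None:
            continue  # skip blank lines, the header, and anything else that is not data
        pieces = match.groups()
        # year, month and day become ints for zero-padding; the value stays a string
        pieces = [int(p) for p in pieces[:3]] + [pieces[3]]
        print('%.4i-%.2i-%.2i,%s' % tuple(pieces), file=for_pd)
for_pd.seek(0)  # rewind so pandas reads from the start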

df = pd.read_csv(for_pd, header=None, names=['datetimes', 'values'], parse_dates=['datetimes'])

# pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq=...) is its replacement.
print(df.set_index('datetimes').groupby(pd.Grouper(freq='D')).mean().dropna())
print(df.set_index('datetimes').groupby(pd.Grouper(freq='W')).mean().dropna())

This is the output.

            values
datetimes         
1980-01-01     1.2
1980-01-02     1.3
1980-01-03     1.4
1980-01-04     1.5
1980-01-05     1.6
1980-01-06     1.7
1980-01-07     1.8
            values
datetimes         
1980-01-06    1.45
1980-01-13    1.80
