I'm having problems computing the daily average in a pandas DataFrame. I've checked "Calculating daily average from irregular time series using pandas" and it doesn't help. The CSV files look like this:

Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666

and so on. My code looks like this:

# Import iButton temperatures
flistloc = '../data/iButtons/Readings/edit'
flist = os.listdir(flistloc)
# Create empty dictionary to store db for each file
pdib = {}
for file in flist:
    file = os.path.join(flistloc,file)
    # Calls function to return only name
    fname,_,_,_= namer(file)
    # Read each file to db
    pdib[fname] = pd.read_csv(file, parse_dates=0, dayfirst=True, index_col=0)
pdibkeys = sorted(pdib.keys())
#
# Calculate daily average for each iButton
for name in pdibkeys:
    pdib[name]['daily'] = pdib[name].resample('D', how = 'mean')

The DataFrames look fine, but the averaging doesn't work. Here is what one looks like in IPython:

'2B5DE4': <class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1601 entries, 2013-08-12 12:00:01 to 2013-09-14 20:00:01
Data columns (total 2 columns):
Value    1601  non-null values
daily    0  non-null values
dtypes: float64(2)}

Anyone know what's going on?

3 Answers

The question is somewhat old, but I want to contribute anyway, since I have had to deal with this over and over again (and I think it's not really Pythonic...).

The best solution I have come up with so far is to add the original index to the daily averages, which gives a DataFrame that is mostly NA, and fill it in at the end.

davg = df.resample('D').mean()
davg_NA = davg.reindex(davg.index.union(df.index))
davg_daily = davg_NA.ffill().loc[df.index]

One can even cram this into one line:

df.resample('D').mean().reindex(df.index, method='ffill')
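As a rough check of the forward-fill idea, here is a minimal sketch on a made-up two-day sample. The values and the `daily` column name are assumptions for illustration, not the asker's data. It relies on `reindex(..., method='ffill')`, which for each original timestamp picks the most recent daily label at or before it.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-day sample in the question's CSV layout.
csv = StringIO("""Date/Time,Value
12/08/13 12:00:01,5.0
12/08/13 12:30:01,3.0
13/08/13 12:00:01,1.0
13/08/13 12:30:01,2.0
""")
df = pd.read_csv(csv, parse_dates=[0], dayfirst=True, index_col=0)

# One row per calendar day, labelled at midnight.
davg = df.resample('D').mean()

# Each original timestamp receives the mean of the day it falls on.
df['daily'] = davg['Value'].reindex(df.index, method='ffill')
print(df)
```

Note that a calendar day with no readings gets a NaN mean from `resample`, but since the original index has no rows on such a day, that NaN never reaches `df`.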

2 Comments

FutureWarning: how in .resample() is deprecated // the new syntax is .resample(...).mean()
What good does the fillna() method do, when you select only the days you already have data for? E.g. (df.resample('D').mean().loc[df.index].fillna(method='ffill') == df.resample('D').mean().loc[df.index]).all() is True when I test it. And why would you be forward filling data into NA fields? Those fields represent days with no data. Their value should be zero (unless you assume your data source should produce a row for every day, rather than for every timestamp where data is produced).

You can't resample at a lower frequency and then assign the resampled DataFrame or Series back into the one you resampled from, because the indices don't match:

In [49]: df = pd.read_csv(StringIO("""Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666"""), parse_dates=[0], dayfirst=True, index_col=0)

In [50]: df.resample('D').mean()
Out[50]:
            Value
Date/Time
2013-08-12  3.022

[1 rows x 1 columns]

In [51]: df['daily'] = df.resample('D').mean()

In [52]: df
Out[52]:
                     Value  daily
Date/Time
2013-08-12 12:00:01  5.553    NaN
2013-08-12 12:30:01  2.604    NaN
2013-08-12 13:00:01  2.604    NaN
2013-08-12 13:30:01  2.604    NaN
2013-08-12 14:00:01  2.101    NaN
2013-08-12 14:30:01  2.666    NaN

[6 rows x 2 columns]

One option is to take advantage of partial time indexing on the rows:

davg = df['Value'].resample('D').mean()
df.loc[str(davg.index.date[0]), 'daily'] = davg.iloc[0]

which looks like this, when you expand the str(davg.index.date[0]) expression:

df.loc['2013-08-12', 'daily'] = davg.iloc[0]

This is a bit of a hack, and it fills in only the first day; there might be a better way to do it.
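The partial-string trick can be extended to every day with a loop. The following is only a sketch on a made-up two-day sample (the data and the `daily` column name are assumptions); it skips calendar days that have no readings so that `.loc` is never asked for a day that is missing from the index.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-day sample in the question's CSV layout.
csv = StringIO("""Date/Time,Value
12/08/13 12:00:01,5.0
12/08/13 12:30:01,3.0
13/08/13 12:00:01,1.0
13/08/13 12:30:01,2.0
""")
df = pd.read_csv(csv, parse_dates=[0], dayfirst=True, index_col=0)

# Create the target column up front so the loop only fills it in.
df['daily'] = float('nan')

# dropna() removes days without readings (their mean is NaN).
daily_means = df['Value'].resample('D').mean().dropna()
for day, mean in daily_means.items():
    # '2013-08-12' selects every row whose timestamp falls on that day.
    df.loc[str(day.date()), 'daily'] = mean
print(df)
```

A Python-level loop like this is fine for a few weeks of data; for long series a vectorised approach is likely to be faster.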

6 Comments

This is a much more thorough answer than mine, nicely done. I wonder whether @mercergeoinfo actually wants the daily average in the original dataframe, and if so, why.
Also, a less hackish way might be merging the two dataframes together. Something like (1) df['day'] = df.index.normalize() (2) davg.reset_index(inplace=True) and (3) pd.merge(df, davg, left_on='day', right_on='Date/Time')
Nice, you should put that in your answer!
Ah! I keep forgetting that they aren't just dictionaries. I was trying to keep everything together. Why? Who knows?
The answer seems reasonable and is easy enough to work around, but I do have a question about this "limitation": it seems that you can assign a single value to a pandas DataFrame column, and it simply gets repeated at every index. This is fine (and was the cause of my assumption that averaging in the same DataFrame would work) but seems inconsistent. Any ideas why pdib['elevation'] = 1140 should work? Not terribly important, but I'm curious.
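On the scalar question: a scalar carries no index, so pandas broadcasts it to every row, whereas a Series or DataFrame is aligned on its index labels before assignment. A small sketch of the difference, using made-up sample data (the `elevation` and `daily` column names follow the thread; the values are assumptions):

```python
from io import StringIO

import pandas as pd

# Hypothetical two-day sample in the question's CSV layout.
csv = StringIO("""Date/Time,Value
12/08/13 12:00:01,5.0
12/08/13 12:30:01,3.0
13/08/13 12:00:01,1.0
13/08/13 12:30:01,2.0
""")
df = pd.read_csv(csv, parse_dates=[0], dayfirst=True, index_col=0)

# A scalar has no index: it is repeated for every row.
df['elevation'] = 1140

# A Series has an index: values are matched label by label.
# The daily means are labelled at midnight, which matches none of the
# original timestamps, so every row of the new column is NaN.
daily = df['Value'].resample('D').mean()
df['daily'] = daily
print(df)
```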

When you call resample on your one-column dataframe, the output is a one-column dataframe with a different index, with each date as its own index entry. When you assign it to a column in your original dataframe, pandas aligns the two on their index labels; none of the original timestamps match the daily labels, so every value comes out as NaN.

Three possible approaches (where df is your original dataframe):

  1. Do you actually need the average values in your original dataframe? If not:

    davg = df.resample('D').mean()

  2. If you do, a different solution is to merge the two dataframes on the date, after making sure that both have a column (not the index) with the date:

    # daily means, with the date moved from the index into a 'day' column
    davg = df.resample('D').mean().rename(columns={'Value': 'daily'})
    davg = davg.rename_axis('day').reset_index()
    # matching 'day' column on the original; timestamps kept as a column
    df = df.assign(day=df.index.normalize()).reset_index()
    df = pd.merge(df, davg, on='day').set_index('Date/Time')

  3. An alternative to 2 (no intuition about whether it's faster) is to simply group by the date:

    df['day'] = df.index.normalize()
    # transform('mean') repeats each day's mean on every row of that day
    df['daily average'] = df.groupby('day')['Value'].transform('mean')
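To see that the merge and groupby approaches agree, here is a self-contained sketch on a made-up two-day sample. The data, the `daily` and `day` column names, and the use of `transform('mean')` in place of a hand-written helper are all assumptions for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-day sample in the question's CSV layout.
csv = StringIO("""Date/Time,Value
12/08/13 12:00:01,5.0
12/08/13 12:30:01,3.0
13/08/13 12:00:01,1.0
13/08/13 12:30:01,2.0
""")
df = pd.read_csv(csv, parse_dates=[0], dayfirst=True, index_col=0)

# Merge: give both frames a 'day' column and join on it.
davg = df.resample('D').mean().rename(columns={'Value': 'daily'})
davg = davg.rename_axis('day').reset_index()
left = df.assign(day=df.index.normalize()).reset_index()
merged = pd.merge(left, davg, on='day').set_index('Date/Time')

# Groupby: group rows by calendar day and broadcast each group's mean.
grouped = df.groupby(df.index.normalize())['Value'].transform('mean')

print(merged['daily'].tolist() == grouped.tolist())
```

The groupby version leaves the original index untouched, which is one reason to prefer it when the per-row timestamps matter.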
    
