Python Pandas daily average

Question

I'm having problems getting the daily average in a Pandas DataFrame. I've already checked the question "Calculating daily average from irregular time series using pandas", and it doesn't help. The CSV files look like this:

Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666

and so on. My code looks like this:

import os

import pandas as pd

# Import iButton temperatures
flistloc = '../data/iButtons/Readings/edit'
flist = os.listdir(flistloc)
# Create empty dictionary to store a DataFrame for each file
pdib = {}
for file in flist:
    file = os.path.join(flistloc, file)
    # Calls my own helper function namer() to return only the name
    fname, _, _, _ = namer(file)
    # Read each file into a DataFrame indexed by Date/Time
    pdib[fname] = pd.read_csv(file, parse_dates=[0], dayfirst=True, index_col=0)
pdibkeys = sorted(pdib.keys())
#
# Calculate daily average for each iButton
for name in pdibkeys:
    pdib[name]['daily'] = pdib[name].resample('D', how='mean')

The DataFrames seem OK, but the averaging doesn't work. Here is what one of them looks like in IPython:

'2B5DE4': <class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1601 entries, 2013-08-12 12:00:01 to 2013-09-14 20:00:01
Data columns (total 2 columns):
Value    1601  non-null values
daily    0  non-null values
dtypes: float64(2)}

Anyone know what's going on?

Sebastian · Accepted Answer · 2016-10-13 15:53:44Z

9

The question is somewhat old, but I want to contribute anyway, since I have had to deal with this over and over again (and I don't think it's really pythonic...).

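The best solution I have come up with so far is to reindex the daily averages onto the original index and fill forward, so that every reading gets the mean of its own day:

# One mean per calendar day, labelled at midnight
davg = df.resample('D').mean()
# Align to the original timestamps; method='ffill' takes, for each timestamp,
# the latest daily label at or before it, i.e. that day's mean
davg_daily = davg.reindex(df.index, method='ffill')

This can even be crammed into one line:

df.resample('D').mean().reindex(df.index, method='ffill')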
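A self-contained sketch of this forward-fill idea, assuming pandas 1.0 or later (where `how=` is gone and `.loc` raises a KeyError for missing labels, so `reindex(..., method='ffill')` is used as the equivalent). The question's six readings are used, plus two made-up readings on 13/08/13 so that the example spans two days.

```python
from io import StringIO

import pandas as pd

# The question's readings plus two hypothetical readings on 13/08/13
csv = """Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666
13/08/13 09:00:01,4.000
13/08/13 09:30:01,2.000
"""
df = pd.read_csv(StringIO(csv), parse_dates=[0], dayfirst=True, index_col=0)

# One mean per calendar day, labelled at midnight
davg = df.resample('D').mean()

# Each reading picks up the most recent daily label at or before it,
# i.e. its own day's mean. reindex(method=...) needs a sorted index.
df['daily'] = davg['Value'].reindex(df.index, method='ffill')

print(df)
```

Because `reindex` returns a Series on exactly `df.index`, the assignment no longer has the index mismatch that produced the all-NaN `daily` column in the question.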


2 Comments

Robert Pollak Over a year ago

FutureWarning: how in .resample() is deprecated // the new syntax is .resample(...).mean()

Olsgaard Over a year ago

What good does the fillna() method do, when you select only the days you already have data for? E.g. (df.resample('D').mean().loc[df.index].fillna(method='ffill') == df.resample('D').mean().loc[df.index]).all() is True when I test it. And why would you be forward filling data into NA fields? Those fields represent days with no data. Their value should be zero (unless you assume your data source should produce a row for every day, rather than for every timestamp where data is produced).

Phillip Cloud · Accepted Answer · 2014-04-18 22:25:28Z

2

You can't resample at a lower frequency and then assign the resampled DataFrame or Series back into the one you resampled from, because the indices don't match:

In [49]: df = pd.read_csv(StringIO("""Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666"""), parse_dates=0, dayfirst=True, index_col=0)

In [50]: df.resample('D')
Out[50]:
            Value
Date/Time
2013-08-12  3.022

[1 rows x 1 columns]

In [51]: df['daily'] = df.resample('D')

In [52]: df
Out[52]:
                     Value  daily
Date/Time
2013-08-12 12:00:01  5.553    NaN
2013-08-12 12:30:01  2.604    NaN
2013-08-12 13:00:01  2.604    NaN
2013-08-12 13:30:01  2.604    NaN
2013-08-12 14:00:01  2.101    NaN
2013-08-12 14:30:01  2.666    NaN

[6 rows x 2 columns]

One option is to take advantage of partial string indexing on the DatetimeIndex:

davg = df.resample('D').mean()
df.loc[str(davg.index.date[0]), 'daily'] = davg['Value'].iloc[0]

which, once str(davg.index.date[0]) is evaluated, becomes:

df.loc['2013-08-12', 'daily'] = davg['Value'].iloc[0]

This is a bit of a hack (it only fills in the first day), and there might be a better way to do it.
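The snippet above only fills in one day. A minimal sketch that applies the same partial string indexing to every day, assuming pandas 1.0 or later and a sorted DatetimeIndex; the two 13/08/13 readings are made up so there is more than one day to fill.

```python
from io import StringIO

import pandas as pd

# Two of the question's readings plus two hypothetical readings on 13/08/13
csv = """Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
13/08/13 09:00:01,4.000
13/08/13 09:30:01,2.000
"""
df = pd.read_csv(StringIO(csv), parse_dates=[0], dayfirst=True, index_col=0)

davg = df.resample('D').mean()

df['daily'] = float('nan')
# dropna() skips days with no readings (their resampled mean is NaN)
for day, mean in davg['Value'].dropna().items():
    # e.g. '2013-08-12': the partial string selects every row of that day
    df.loc[day.strftime('%Y-%m-%d'), 'daily'] = mean

print(df)
```

It is still a Python-level loop over days, so for many days a vectorised approach such as a groupby with `transform` is likely to be simpler.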


6 Comments

exp1orer Over a year ago

This is a much more thorough answer than mine, nicely done. I wonder whether @mercergeoinfo actually wants the daily average in the original dataframe, and if so, why.

exp1orer Over a year ago

Also, a less hackish way might be merging the two dataframes together. Something like (1) df['day'] = df.index.normalize(), (2) davg = davg.reset_index(), and (3) pd.merge(df, davg, left_on='day', right_on='Date/Time')

Phillip Cloud Over a year ago

Nice, you should put that in your answer!

mercergeoinfo Over a year ago

Ah! I keep forgetting that they aren't just dictionaries. I was trying to keep everything together. Why? Who knows?

mercergeoinfo Over a year ago

The answers seem reasonable and are easy enough to work around, but I do have a question about this "limitation": it seems that you can assign a single value to a Pandas DataFrame, and it simply gets repeated at every index. This is fine (and was the cause of my assumption that averaging in the same DataFrame would work) but seems inconsistent. Any ideas why pdib['elevation'] = 1140 should work? Not terribly important, but I'm curious.
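The difference asked about here comes from how pandas treats the right-hand side of a column assignment: a scalar has no index, so it is broadcast to every row, whereas a Series (or one-column DataFrame) is first aligned to the target's index. A minimal sketch with made-up readings; the `elevation` value just mirrors the comment.

```python
import pandas as pd

# Hypothetical readings on two days
idx = pd.to_datetime(['2013-08-12 12:00:01', '2013-08-12 12:30:01',
                      '2013-08-13 09:00:01'])
df = pd.DataFrame({'Value': [5.553, 2.604, 4.0]}, index=idx)

# A scalar has no index, so pandas repeats it on every row
df['elevation'] = 1140

# A Series is aligned on the index first; the daily labels are midnights,
# which match none of the reading timestamps, so every row gets NaN
daily = df['Value'].resample('D').mean()
df['daily'] = daily

print(df)
```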


nick · Accepted Answer · 2018-12-26 00:39:08Z

When you call resample on your one-column DataFrame, the output is a one-column DataFrame with a different index: one entry per date. So when you try to assign it to a column of your original DataFrame, pandas aligns it on the index, and since the daily labels (midnights) don't match your reading timestamps, you get NaN.

Three possible approaches (where df is your original dataframe):

1. Do you actually need the average values in your original DataFrame? If not:

davg = df.resample('D').mean()

2. If you do, a different solution is to merge the two DataFrames on the date, after making sure that both have a column (not the index) holding the date:

davg = df.resample('D').mean()
df['day'] = df.index.normalize()  # midnight timestamp of each reading's day
davg = davg.reset_index()         # 'Date/Time' becomes an ordinary column
df = pd.merge(df, davg, left_on='day', right_on='Date/Time', suffixes=('', '_daily'))

3. An alternative to 2 (I have no intuition about whether it's faster) is to simply group by the date:

df['day'] = df.index.normalize()  # calendar day of each reading
# transform('mean') returns each day's mean repeated on every row of that day,
# aligned to the original index
df['daily average'] = df.groupby('day')['Value'].transform('mean')
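To check that the merge approach and the groupby approach give the same per-reading averages, here is a self-contained sketch, assuming pandas 1.0 or later and a few readings (the 13/08/13 ones are made up). It also shows one practical difference: `merge` on columns returns a fresh integer index, while `transform` keeps the original DatetimeIndex.

```python
from io import StringIO

import pandas as pd

# Two of the question's readings plus two hypothetical readings on 13/08/13
csv = """Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
13/08/13 09:00:01,4.000
13/08/13 09:30:01,2.000
"""
df = pd.read_csv(StringIO(csv), parse_dates=[0], dayfirst=True, index_col=0)
df['day'] = df.index.normalize()

# Merge approach: daily means as an ordinary 'Date/Time' column
davg = df['Value'].resample('D').mean().reset_index()
merged = pd.merge(df, davg, left_on='day', right_on='Date/Time',
                  suffixes=('', '_daily'))

# Groupby approach: broadcast each day's mean back onto its rows
df['daily average'] = df.groupby('day')['Value'].transform('mean')

print(merged)
print(df)
```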
