add all missing dates in pandas

Question

I have the following data. How can I add all dates (from the 1st to the end of each month)? Also, how can I remove Saturdays and Sundays from this dataset?

Date        values
31/03/14    -0.0123
30/04/14    0.11168
30/06/14    0.0997
31/07/14    0.007
30/09/14    0.886



Date    values
1/3/14
2/3/14
.....
..
31/3/14
1/4/14
2/4/14
....
.....
30/09/14

@MaxU I've added the required dataset. Basically, I want all dates first, then to remove all Saturdays and Sundays if possible, and then ffill/bfill the values. Please let me know if this is possible.
– jason, Commented Mar 31, 2018 at 22:45
Should your desired data set include data for May and August or not?
– MaxU - stand with Ukraine, Commented Mar 31, 2018 at 22:56

Community · Accepted Answer · 2020-06-20 09:12:55Z

2

Assuming you can reload your dataset from a csv

import io

import pandas as pd

data = '''\
Date        values
31/03/14    -0.0123
30/04/14    0.11168
30/06/14    0.0997
31/07/14    0.007
30/09/14    0.886'''

# Read the dataset, parse Date as day-first datetimes (dd/mm/yy) and
# set Date as the index
df = pd.read_csv(io.StringIO(data), sep=r'\s+', parse_dates=['Date'],
                 dayfirst=True, index_col='Date')

# Resample to daily frequency; first() (or mean()) leaves the added days as NaN,
# whereas sum() fills them with 0 in current pandas
df = df.resample('D').first()

# Keep weekdays smaller than 5 (Monday to Friday), which drops Saturday
# and Sunday, then reset the index
df = df.loc[df.index.weekday < 5].reset_index()

print(df.head())

And you get (printing first 5 rows):

        Date  values
0 2014-03-31 -0.0123
1 2014-04-01     NaN
2 2014-04-02     NaN
3 2014-04-03     NaN
4 2014-04-04     NaN

Assuming you already loaded your dataset

The equivalent, more compact version if the dataset is already loaded. It also includes a mask that excludes May and August, in case you want to leave those months out.

df = df.set_index(pd.to_datetime(df.Date, dayfirst=True)).drop('Date', axis=1)  # dates are dd/mm/yy
df = df.resample('D').first()
m1 = df.index.weekday < 5          # mask1 (no sat/sun)
m2 = ~df.index.month.isin([5,8])   # mask2 (not May or August)
df = df.loc[m1 & m2].reset_index()
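Note that resample('D') starts at the first observation (31/03/14), not at the 1st of that month as the question asks. A minimal sketch of one way to cover that, assuming the sample data from the question: build the calendar explicitly with pd.date_range and reindex onto it. The names start, end, business_days and full are only illustrative.

```python
import pandas as pd

# Sample data from the question (dates are day-first: dd/mm/yy).
df = pd.DataFrame(
    {"values": [-0.0123, 0.11168, 0.0997, 0.007, 0.886]},
    index=pd.to_datetime(
        ["31/03/14", "30/04/14", "30/06/14", "31/07/14", "30/09/14"],
        format="%d/%m/%y",
    ),
)

# Start at the 1st of the first month rather than at the first observation.
start = df.index.min().replace(day=1)
end = df.index.max()

# freq="B" yields Monday to Friday only, so no separate weekend mask is needed.
business_days = pd.date_range(start, end, freq="B", name="Date")

# Reindexing adds the missing days as NaN and keeps the original values.
full = df.reindex(business_days)
print(full.head())
```

freq="B" knows nothing about public holidays, so those would still need their own mask if they matter.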

edited Jun 20, 2020 at 9:12

answered Mar 31, 2018 at 22:46

Anton vBR

5 Comments

MaxU - stand with Ukraine Over a year ago

If I understood OP correctly, the desired DF should not contain data from May and August...

Anton vBR Over a year ago

@MaxU Yep, it wasn't clear. Maybe OP can clarify later. (added this in the end)

jason Over a year ago

@MaxU I really appreciate your help! Actually, I don't want Saturdays and Sundays in the entire dataset, not just in May and August.

jason Over a year ago

AWESOME! This works :) Also, how can I fill the missing values with bfill and ffill?

Anton vBR Over a year ago

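@jason Simply add df.ffill(inplace=True) (or df.bfill(inplace=True)) and the missing values get filled too. The older spelling df.fillna(method='ffill', inplace=True) is deprecated in recent pandas releases.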
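To make the difference between the two fill directions concrete, here is a small sketch on the first two observations from the question. The variable names are made up for illustration. Forward fill leaves the days before the first observation empty, so chaining bfill afterwards is one way to cover them.

```python
import pandas as pd

# Two observations from the question, placed on a business-day calendar.
obs = pd.Series(
    [-0.0123, 0.11168],
    index=pd.to_datetime(["31/03/14", "30/04/14"], format="%d/%m/%y"),
    name="values",
)
days = pd.date_range("2014-03-03", "2014-04-30", freq="B")
sparse = obs.reindex(days)      # NaN everywhere except the two observed days

forward = sparse.ffill()        # carry the last known value forward
backward = sparse.bfill()       # pull the next known value backward
both = sparse.ffill().bfill()   # forward first, then fill the leading gap

print(forward.loc[pd.Timestamp("2014-04-01")])   # value carried from 31/03/14
print(backward.loc[pd.Timestamp("2014-04-01")])  # value pulled from 30/04/14
```

Which direction is right depends on what a value means: a month-end figure that describes the month just ended argues for bfill, while a level that holds until the next reading argues for ffill.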

BENY · Accepted Answer · 2018-04-01 01:01:32Z

1

You can use date_range

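df.Date = pd.to_datetime(df.Date, dayfirst=True)  # dates are dd/mm/yy
# For each row, list every day from the 1st of its month up to its date
s = pd.DataFrame({'Date': sum([pd.date_range(x, y, freq='D').tolist() for x, y in zip(pd.to_datetime(df.Date.dt.strftime('%Y-%m'), format='%Y-%m'), df.Date)], [])})

# Left merge keeps every generated day; the default inner merge would keep only the original dates
s = s.merge(df, how='left')
s = s[s.Date.dt.weekday < 5]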
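As a possible alternative to concatenating per-row lists, the sketch below derives "months that have an observation" from the data itself, so months without data (May and August here) drop out without being hard-coded. It keeps whole months rather than stopping at each observed date, which gives the same result for this sample because every observation falls on a month end. All names are illustrative.

```python
import pandas as pd

# Sample data from the question (dates are day-first: dd/mm/yy).
df = pd.DataFrame(
    {"values": [-0.0123, 0.11168, 0.0997, 0.007, 0.886]},
    index=pd.to_datetime(
        ["31/03/14", "30/04/14", "30/06/14", "31/07/14", "30/09/14"],
        format="%d/%m/%y",
    ),
)

# All business days from the 1st of the first month to the last observation.
days = pd.date_range(
    df.index.min().replace(day=1), df.index.max(), freq="B", name="Date"
)

# Keep only the days whose calendar month contains at least one observation.
months_with_data = df.index.to_period("M").unique()
in_observed_month = days.to_period("M").isin(months_with_data)

result = df.reindex(days[in_observed_month])
print(sorted(set(result.index.month)))
```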

edited Apr 1, 2018 at 1:01

answered Mar 31, 2018 at 22:46

BENY
