
I have the following data. How can I add all dates (from the 1st to the end of each month)? Also, how can I remove Saturdays and Sundays from this dataset?

Date        values
31/03/14    -0.0123
30/04/14    0.11168
30/06/14    0.0997
31/07/14    0.007
30/09/14    0.886

Desired output:

Date    values
1/3/14
2/3/14
.....
..
31/3/14
1/4/14
2/4/14
....
.....
30/09/14
  • please post your desired data set Commented Mar 31, 2018 at 22:34
  • @MaxU I've added the required dataset. Basically, I want all dates first, then to remove all Saturdays and Sundays if possible, and then to fill the values with ffill/bfill. Please let me know if this is possible. Commented Mar 31, 2018 at 22:45
  • should your desired data set include data for May and August or not? Commented Mar 31, 2018 at 22:56

2 Answers


Assuming you can reload your dataset from a CSV

import io
import pandas as pd

data = '''\
Date        values
31/03/14    -0.0123
30/04/14    0.11168
30/06/14    0.0997
31/07/14    0.007
30/09/14    0.886'''

# Read the dataset, parse Date as day-first dates (dd/mm/yy) and
# set Date as the index
df = pd.read_csv(io.StringIO(data), sep=r'\s+', parse_dates=['Date'], dayfirst=True, index_col='Date')

# Resample to daily frequency; first() leaves NaN on days without data
# (sum() would put 0 there in recent pandas versions; mean() also gives NaN)
df = df.resample('D').first()

# Keep Monday to Friday (weekday < 5), dropping Saturday (5) and Sunday (6), then reset the index
df = df.loc[df.index.weekday < 5].reset_index()

print(df.head())

And you get (printing first 5 rows):

        Date  values
0 2014-03-31 -0.0123
1 2014-04-01     NaN
2 2014-04-02     NaN
3 2014-04-03     NaN
4 2014-04-04     NaN

Assuming you already loaded your dataset

This is the compact equivalent for a DataFrame that is already loaded. It also includes an optional mask that excludes May and August, in case you want to leave those months out.

# dayfirst=True because the dates are written as dd/mm/yy
df = df.set_index(pd.to_datetime(df.Date, dayfirst=True)).drop('Date', axis=1)
df = df.resample('D').first()
m1 = df.index.weekday < 5          # mask1 (no sat/sun)
m2 = ~df.index.month.isin([5, 8])  # mask2 (not May or August)
df = df.loc[m1 & m2].reset_index()
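
The question also asks for the gaps to be filled with ffill/bfill. A minimal sketch of the whole pipeline, assuming the sample data from the question and using the intermediate names `daily`, `weekdays` and `filled` purely for illustration:

```python
import io
import pandas as pd

data = """\
Date        values
31/03/14    -0.0123
30/04/14    0.11168
30/06/14    0.0997
31/07/14    0.007
30/09/14    0.886"""

# Read the table and parse the dd/mm/yy dates with an explicit format
df = pd.read_csv(io.StringIO(data), sep=r"\s+")
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%y")
df = df.set_index("Date")

# One row per calendar day; days without an observation become NaN
daily = df.resample("D").first()

# Keep Monday (0) to Friday (4) only
weekdays = daily.loc[daily.index.weekday < 5]

# Forward-fill from the last known value, then back-fill any leading NaN
filled = weekdays.ffill().bfill()

print(filled.head())
```

In this sample every observation falls on a weekday, so filtering before filling loses nothing. If an observation could fall on a weekend, filling before the weekday filter may be the safer order.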

5 Comments

If I understood the OP correctly, the desired DataFrame should not contain data from May and August...
@MaxU Yep, it wasn't clear. Maybe the OP can clarify later. (I added this at the end.)
@MaxU I really appreciate your help! Actually, I don't want Saturdays and Sundays anywhere in the dataset, not just in May and August.
Awesome! This works :) Also, how can I fill the missing values with bfill and ffill?
@jason Simply add df.ffill(inplace=True) (or df.bfill(inplace=True)) and the missing values get filled too. In older pandas versions this was written df.fillna(method='ffill', inplace=True).

You can use date_range

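df.Date = pd.to_datetime(df.Date, dayfirst=True)
# For each observation, build the daily range from the 1st of its month up to its date
s = pd.DataFrame({'Date': sum([pd.date_range(x, y, freq='D').tolist() for x, y in zip(pd.to_datetime(df.Date.dt.strftime('%Y-%m')), df.Date)], [])})

# Left merge so the added dates are kept (values are NaN where there is no observation)
s = s.merge(df, how='left')
# Keep Monday to Friday only
s = s[s.Date.dt.weekday < 5]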
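
A related sketch, not part of the answer above: `pd.bdate_range` generates Monday-to-Friday dates directly, so the weekend filter is not needed. It assumes the sample data from the question and, unlike the per-month ranges above, also covers months with no observation (May and August here). The names `start`, `end`, `bdays` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["31/03/14", "30/04/14", "30/06/14", "31/07/14", "30/09/14"],
    "values": [-0.0123, 0.11168, 0.0997, 0.007, 0.886],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%y")

# Start at the 1st of the month of the earliest observation
start = df["Date"].min().replace(day=1)
end = df["Date"].max()

# Business days (Monday to Friday) between start and end, inclusive
bdays = pd.bdate_range(start, end)

# Align the observations to the business-day index; other days are NaN
out = df.set_index("Date").reindex(bdays).rename_axis("Date")

print(out.head())
```

Calling `out.ffill().bfill()` afterwards would fill the NaN rows, if that is the desired behaviour.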
