
I have a dataframe holding 2 months of data in 20 columns, one of which is 'date'. There are 3 non-consecutive dates for which there is no data. I want to replicate the previous day's data to create entries for those missing days as well.

Here is what I tried:

df_replicate = df[(df['date'] == '2021-07-27') | (df['date'] == '2021-08-18') | (df['date'] == '2021-08-22')].copy()

# select the 'date' column explicitly; without it, .loc overwrites every column in the row
df_replicate.loc[df_replicate['date'] == '2021-07-27', 'date'] = '2021-07-28'
df_replicate.loc[df_replicate['date'] == '2021-08-18', 'date'] = '2021-08-19'
df_replicate.loc[df_replicate['date'] == '2021-08-22', 'date'] = '2021-08-23'

Then I concatenate df and df_replicate.

What is an easier way to do this?


2 Answers

3

You could use reindex with method="ffill", which fills rows that are absent from the index with the previous row's values:

import pandas as pd
import numpy as np


date_index = pd.date_range('2021-07-27', periods=7, freq='D')

# build a frame that uses the dates as its index
df = pd.DataFrame({"prices": [100, np.nan, 100, 89, 88, np.nan, np.nan]}, index=date_index)

2021-07-27  100.0
2021-07-28    NaN
2021-07-29  100.0
2021-07-30   89.0
2021-07-31   88.0
2021-08-01    NaN
2021-08-02    NaN

# remove one of the date values to represent missing data
df = df[~(df.index=='2021-07-28')]

2021-07-27  100.0
2021-07-29  100.0
2021-07-30   89.0
2021-07-31   88.0
2021-08-01    NaN
2021-08-02    NaN

# Second date index with correct number of days
date_index2 = pd.date_range('2021-07-27', periods=7, freq='D')

# df with the missing row forward filled (NaNs already present in the data are left as they are)
df.reindex(date_index2, method="ffill")

2021-07-27  100.0  # this value is carried forward to the next date
2021-07-28  100.0
2021-07-29  100.0
2021-07-30   89.0
2021-07-31   88.0
2021-08-01    NaN
2021-08-02    NaN

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex
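
The example above works on a DatetimeIndex, while the question has a 'date' column that apparently holds strings. A minimal sketch of the same idea for that layout follows. It assumes there is exactly one row per date; the sample frame and the `value` column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: one row per date, with 2021-07-28 missing entirely.
df = pd.DataFrame({
    "date": ["2021-07-26", "2021-07-27", "2021-07-29"],
    "value": [10, 20, 30],
})

# Parse the strings so the dates can serve as a DatetimeIndex.
df["date"] = pd.to_datetime(df["date"])

# Every calendar day between the first and last date present.
full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

filled = (
    df.set_index("date")
      .sort_index()                         # method="ffill" needs a sorted index
      .reindex(full_range, method="ffill")  # new rows copy the previous day
      .rename_axis("date")                  # restore the column name
      .reset_index()
)
print(filled)
```

If a date can occur on several rows, the index is no longer unique and this reindex call will not work as written, so a different approach is needed in that case.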


1

Build a mask of the rows whose date is either in the list or is the following day (created with DatetimeIndex.shift), then forward fill the missing values within those rows:

import pandas as pd
import numpy as np

df = pd.DataFrame({"prices": [100, np.nan, 100, 89, 88, np.nan, np.nan],
                   "date": pd.date_range('2021-07-27', periods=7, freq='D')})


df['date'] = pd.to_datetime(df['date'])

dates = pd.to_datetime(['2021-07-27','2021-08-18','2021-08-22'])

# rows dated on a listed day or on the day after it
mask = df['date'].isin(dates.append(dates.shift(freq='D')))
df[mask] = df[mask].ffill()
print(df)
   prices       date
0   100.0 2021-07-27
1   100.0 2021-07-28
2   100.0 2021-07-29
3    89.0 2021-07-30
4    88.0 2021-07-31
5     NaN 2021-08-01
6     NaN 2021-08-02
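
The `dates` list above is typed by hand. If the missing days are absent from the column altogether, as the question describes, they can probably be derived instead. The sketch below is one way to do that; the sample frame and the names `all_days`, `missing` and `previous` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical 'date' column with 2021-07-28 and 2021-07-31 absent.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-07-27", "2021-07-29", "2021-07-30", "2021-08-01"]
    ),
})

# All calendar days spanned by the data.
all_days = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

# Days in the full range that never appear in the column.
missing = all_days.difference(pd.DatetimeIndex(df["date"]))

# The day before each missing date, i.e. the rows to copy from.
previous = missing.shift(-1, freq="D")

print(missing)
print(previous)
```

Here `previous` plays the role of the hand-written `dates` list.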

If you only need to fill the next-day rows (which contain NaNs), using the last previous non-NaN values from the whole frame:

df['date'] = pd.to_datetime(df['date'])

dates = pd.to_datetime(['2021-07-27','2021-08-18','2021-08-22'])

mask = df['date'].isin(dates.shift(freq='D'))
df[mask] = df.ffill()

If the input list instead holds the next-day dates (['2021-07-28','2021-08-19','2021-08-23']), shift them back one day to match the previous rows as well:

df['date'] = pd.to_datetime(df['date'])
dates = pd.to_datetime(['2021-07-28','2021-08-19','2021-08-23'])

mask = df['date'].isin(dates.append(dates.shift(-1, freq='D')))
df[mask] = df[mask].ffill()
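
The snippets above fill rows that already exist and contain NaNs. If the missing dates have no rows at all, forward filling has nothing to fill. A minimal sketch of the copy-and-concatenate approach from the question, which also handles several rows per date, could look like this. The sample frame, the `item` and `price` columns, and the single missing date are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample with two rows per date; 2021-07-28 has no rows at all.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-07-27", "2021-07-27", "2021-07-29", "2021-07-29"]
    ),
    "item": ["a", "b", "a", "b"],
    "price": [100, 89, 101, 90],
})

# Dates that have no data.
missing = pd.to_datetime(["2021-07-28"])

# Copy the rows of the day before each missing date...
replica = df[df["date"].isin(missing - pd.Timedelta(days=1))].copy()
# ...and move the copies forward one day so they land on the missing date.
replica["date"] = replica["date"] + pd.Timedelta(days=1)

result = (
    pd.concat([df, replica], ignore_index=True)
      .sort_values("date", kind="stable")  # keep row order within each day
      .reset_index(drop=True)
)
print(result)
```

This assumes the day before each missing date is itself present in the data, which matches the question's statement that the missing dates are non-consecutive.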

3 Comments

This did nothing. Could you explain more what this is doing?
The dates are just strings, not datetime.
@AmanArora - The answer was edited; can you check whether it now works the way you need?
