I am trying to merge two DataFrames. df2 has more sample points than df. I want to align df2 to the index of df so that each timestamp in df takes the closest non-missing value from df2.
My original data set is categorical, which is why I cast the columns to strings.
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
# Generate the data
np.random.seed(12)
date_today = datetime.now()
ndays = 5
df = pd.DataFrame({'date': [date_today + timedelta(days=x) for x in range(ndays)],
                   'test': pd.Series(np.random.randn(ndays)),
                   'test2': pd.Series(np.random.randn(ndays))})
df = df.set_index('date').sort_index()
df = df.mask(np.random.random(df.shape) < .7)
print(df)
df2 = pd.DataFrame({'date': [date_today + timedelta(days=(abs(np.random.randn(1))*0.25)[0]*x) for x in range(ndays*2)],
                    'test3': pd.Series(np.random.randn(ndays*2))})
df2 = df2.set_index('date').sort_index()
df2 = df2.mask(np.random.random(df2.shape) < .3)
df['test'] = df['test'].astype(str)
df['test2'] = df['test2'].astype(str)
df2['test3'] = df2['test3'].astype(str)
print(df2)
df2.reindex(df.index, method='bfill')
Current output:
                                          test3
date
2018-03-12 22:31:52.177918  -1.6817565103951275
2018-03-13 22:31:52.177918                  nan
2018-03-14 22:31:52.177918                  nan
2018-03-15 22:31:52.177918                  nan
2018-03-16 22:31:52.177918                  nan
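A possible explanation for the nan rows, sketched on made-up data rather than the frames above: the method argument of reindex only compares the old and new index labels. It fills labels that are absent from the source frame, but it does not skip over a missing value stored at a label that is present, and labels later than the last source row have nothing to back-fill from. In addition, after astype(str) the missing entries may be the literal string 'nan' (this depends on the pandas version), which dropna would not remove. The frame and dates below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical three-day frame with a real NaN stored in the middle row.
idx = pd.DatetimeIndex(["2018-03-12", "2018-03-13", "2018-03-14"])
frame = pd.DataFrame({"test3": [1.0, np.nan, 3.0]}, index=idx)

# Every requested label already exists, so bfill has no holes to fill
# and the stored NaN at 2018-03-13 comes back unchanged.
same = frame.reindex(idx, method="bfill")
print(same)

# Dropping the NaN row first turns 2018-03-13 into a real hole,
# which bfill then fills from the next label, 2018-03-14.
filled = frame.dropna().reindex(idx, method="bfill")
print(filled)
```

The second call suggests that missing rows need to be dropped while they are still real NaN values, that is, before any cast to string.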
Desired output:
                                          test3
date
2018-03-12 22:31:52.177918  -1.6817565103951275
2018-03-13 22:31:52.177918    0.214975948415751
2018-03-14 22:31:52.177918                  nan
2018-03-15 22:31:52.177918                  nan
2018-03-16 22:31:52.177918                  nan
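One way to get output of this shape, as a minimal sketch and not necessarily the only approach: drop the missing rows of the denser frame while the values are still numeric, then reindex with method='nearest' and a tolerance so that timestamps with no nearby sample stay missing. The names target_index, dense and valid, the sample values, and the 12-hour tolerance are all assumptions made for illustration. They stand in for df.index, df2 and whatever distance counts as "close" in the real data.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp("2018-03-12")

# Hypothetical target index (stands in for df.index): one timestamp per day.
target_index = pd.DatetimeIndex([start + pd.Timedelta(days=d) for d in range(4)])

# Hypothetical denser frame (stands in for df2) with some missing values.
dense = pd.DataFrame(
    {"test3": [-1.68, np.nan, 0.21, np.nan]},
    index=pd.DatetimeIndex([start + pd.Timedelta(hours=h) for h in (0, 20, 27, 34)]),
)

# Drop missing rows while they are still real NaN, before any astype(str).
valid = dense.dropna(subset=["test3"])

# For each target timestamp, take the row of `valid` with the closest index,
# but only if it lies within the tolerance; otherwise the result is NaN.
nearest = valid.reindex(
    target_index, method="nearest", tolerance=pd.Timedelta(hours=12)
)
print(nearest)
```

Here the first two days pick up a value and the last two stay missing because the closest valid sample is more than 12 hours away. The cast to string can be applied to the result afterwards if the categorical representation is still needed.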
Thanks in advance.