I have a time series spanning a number of days, with a variable number of data points for each day. A sample dataframe is generated below:

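import datetime
import numpy as np
import pandas as pd

n = 10, 20
init = datetime.datetime(2016, 7, 24, 0, 0)
df = pd.DataFrame()
for i in np.arange(n[0], n[1]):
    # int(i) because datetime.timedelta may reject NumPy integers
    s = init + datetime.timedelta(days=int(i) - 10)
    # 'min' is the minute frequency alias (older pandas also accepts 'T')
    df = pd.concat([df, pd.DataFrame(np.random.rand(i), index=pd.date_range(s, periods=i, freq='min'))])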

Given a dataframe like the one above, I want to create another dataframe/ndarray whose index is the dates from the above df (not applicable in the case of an ndarray), and whose rows hold the concatenated data of the previous 2 days. Since the rows will have different lengths, we can pad them with "NA" to make them equal.

I tried doing this:

g = df.groupby(pd.TimeGrouper('D'))
d = {k: v for k, v in g}
k=d.keys()
k.sort()
X=pd.DataFrame(index=k)
for i in np.arange(1,len(k)):
    X.ix[i]=pd.concat([d[k[i]],d[k[i-1]]]).ix[:,0]

But this doesn't work.

  • Hi @dayum, could you explain the concatenation part and the structure of one day's data? Commented Nov 11, 2016 at 6:55
  • Hi, the first part is just given as a reference to show what my dataframe looks like. It is not otherwise involved in the question. Commented Nov 11, 2016 at 7:11

1 Answer

This is not easy, and a loop is necessary. First, a smaller sample:

import datetime
import numpy as np
import pandas as pd

n = 1, 5
np.random.seed(1)
init = datetime.datetime(2016, 7, 24, 0, 0)
df = pd.DataFrame()
for i in np.arange(n[0], n[1]):
    s = init + datetime.timedelta(days=int(i) - 10)
    # i rows per day, one minute apart ('min' is the minute alias; older pandas also accepts 'T')
    df = pd.concat([df, pd.DataFrame({"col": np.random.rand(i)},
                                     index=pd.date_range(s, periods=i, freq='min'))])
print(df)
                          col
2016-07-15 00:00:00  0.417022
2016-07-16 00:00:00  0.720324
2016-07-16 00:01:00  0.000114
2016-07-17 00:00:00  0.302333
2016-07-17 00:01:00  0.146756
2016-07-17 00:02:00  0.092339
2016-07-18 00:00:00  0.186260
2016-07-18 00:01:00  0.345561
2016-07-18 00:02:00  0.396767
2016-07-18 00:03:00  0.538817

Get all unique days with numpy.unique:

u = np.unique(np.array(df.index.values.astype('<M8[D]')))
print (u)
['2016-07-15' '2016-07-16' '2016-07-17' '2016-07-18']
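
The same list of days can probably also be obtained without leaving pandas. The following is only a small sketch on a made-up index (the name idx and its three timestamps are assumptions for illustration, not the data above); it relies on DatetimeIndex.normalize and Index.unique:

```python
import numpy as np
import pandas as pd

# Hypothetical sample index: three timestamps spread over two days.
idx = pd.DatetimeIndex(["2016-07-15 00:00", "2016-07-16 00:00", "2016-07-16 00:01"])

# normalize() resets every timestamp to midnight; unique() drops the repeats.
days = idx.normalize().unique()
print(days)

# For comparison, the NumPy route used above, applied to the same index.
days_np = np.unique(idx.values.astype('<M8[D]'))
print(days_np)
```

The pandas version returns a DatetimeIndex of Timestamps rather than an array of numpy.datetime64 values, which can be more convenient for later date arithmetic.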

Then loop over the days and fill a dict d, selecting each day together with the previous one by DatetimeIndex partial string indexing:

d = {}
for i in u:
    dat = str(i)
    dat1 = str((i - pd.Timedelta('1D')))
    d[i] = pd.Series(df.loc[dat1:dat, 'col'].values)

print (d)
{numpy.datetime64('2016-07-18'): 0    0.302333
1    0.146756
2    0.092339
3    0.186260
4    0.345561
5    0.396767
6    0.538817
dtype: float64, numpy.datetime64('2016-07-15'): 0    0.417022
dtype: float64, numpy.datetime64('2016-07-16'): 0    0.417022
1    0.720324
2    0.000114
dtype: float64, numpy.datetime64('2016-07-17'): 0    0.720324
1    0.000114
2    0.302333
3    0.146756
4    0.092339
dtype: float64}
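
The loop works because slicing a DatetimeIndex with date strings includes both endpoints, and a bare date as the end label covers that whole day. A minimal sketch of that behaviour, using a made-up Series (the name s and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample: one reading on day 1, two on day 2, one on day 3.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.DatetimeIndex([
        "2016-07-15 00:00", "2016-07-16 00:00",
        "2016-07-16 00:01", "2016-07-17 00:00",
    ]),
)

# Both labels have day resolution, so the slice runs from the start of
# 2016-07-15 through the end of 2016-07-16 and excludes 2016-07-17.
window = s.loc["2016-07-15":"2016-07-16"]
print(window.tolist())
```

Note that this kind of slicing expects the index to be sorted in time order.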

Finally, build the result with DataFrame.from_dict:

print (pd.DataFrame.from_dict(d, orient='index'))
                   0         1         2         3         4         5  \
2016-07-15  0.417022       NaN       NaN       NaN       NaN       NaN   
2016-07-16  0.417022  0.720324  0.000114       NaN       NaN       NaN   
2016-07-17  0.720324  0.000114  0.302333  0.146756  0.092339       NaN   
2016-07-18  0.302333  0.146756  0.092339  0.186260  0.345561  0.396767   

                   6  
2016-07-15       NaN  
2016-07-16       NaN  
2016-07-17       NaN  
2016-07-18  0.538817  
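
For reuse, the steps can be wrapped in one function. This is only a sketch under assumptions: the function name stack_previous_days, its n_days parameter and the small sample frame are hypothetical, and it selects rows with a boolean mask instead of string slicing, so the index does not have to be sorted:

```python
import pandas as pd

def stack_previous_days(df, col, n_days=2):
    """One row per day: that day's values preceded by the values of the
    (n_days - 1) calendar days before it, padded with NaN on the right."""
    days = df.index.normalize().unique().sort_values()
    rows = {}
    for day in days:
        start = day - pd.Timedelta(days=n_days - 1)  # first day in the window
        end = day + pd.Timedelta(days=1)             # exclusive upper bound
        mask = (df.index >= start) & (df.index < end)
        # A fresh 0..k index makes rows of different length align from the left.
        rows[day] = pd.Series(df.loc[mask, col].to_numpy())
    out = pd.DataFrame.from_dict(rows, orient="index")
    # Sort both axes so the layout does not depend on dict ordering.
    return out.sort_index(axis=0).sort_index(axis=1)

# Hypothetical sample: 1, 2 and 1 readings on three consecutive days.
sample = pd.DataFrame(
    {"col": [1.0, 2.0, 3.0, 4.0]},
    index=pd.DatetimeIndex([
        "2016-07-15 00:00", "2016-07-16 00:00",
        "2016-07-16 00:01", "2016-07-17 00:00",
    ]),
)
result = stack_previous_days(sample, "col", n_days=2)
print(result)
```

With n_days=2 this mirrors the loop above (the current day plus the day before it); a larger n_days widens the window.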

1 Comment

Thank you, a bit modified first step.
