Split Single Pandas DataFrame into N DataFrame in Python using Time Series Data

Question

import pandas as pd
mydate = ["01/01/2018","19/01/2018","24/01/2018" ,
         "27/01/2018","29/01/2018","30/01/2018" , 
         "22/02/2018","23/03/2018"]

# The strings are day-first (dd/mm/yyyy), so state the format explicitly
mydate = pd.to_datetime(mydate, format="%d/%m/%Y")
events = ["a" , "b" , "c" , "d" , "e" , "f" ,"g" , "h"]

df = pd.DataFrame({"date" :mydate,"events" :events})
df

     date       events
0   2018-01-01  a
1   2018-01-19  b
2   2018-01-24  c
3   2018-01-27  d
4   2018-01-29  e
5   2018-01-30  f
6   2018-02-22  g
7   2018-03-23  h

I want to slice the data into consecutive 20-day windows and store each window in a separate DataFrame. I have looked at groupby, date_range and other functionality but could not find a solution to my problem. I can do this with a plain for loop, but I would prefer to use built-in pandas functionality.

Expected result
df = [df1, df2, df3, df4]
where df1 contains rows 0, 1
      df2 contains rows 2, 3, 4, 5
      df3 contains row 6
      df4 contains row 7
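
To make the expected grouping concrete, here is a small sketch of the arithmetic behind it. It assumes the windows are counted from the earliest date and are exactly 20 days wide; the names `offsets` and `windows` are only illustrative.

```python
import pandas as pd

dates = pd.to_datetime(
    ["01/01/2018", "19/01/2018", "24/01/2018", "27/01/2018",
     "29/01/2018", "30/01/2018", "22/02/2018", "23/03/2018"],
    format="%d/%m/%Y",
)

# Whole days elapsed since the earliest date
offsets = (dates - dates.min()).days
# Integer division by 20 gives the 0-based window each row falls in
windows = offsets // 20

print(offsets.tolist())  # [0, 18, 23, 26, 28, 29, 52, 81]
print(windows.tolist())  # [0, 0, 1, 1, 1, 1, 2, 4]
```

Window 3 (days 60 to 79) has no rows, which is why eight rows spread over five windows give only four DataFrames.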

Why are you against using a Python loop? I'm not sure, but I have a feeling it's the only way, and it should take under a second unless you're parsing a massive dataframe.
– elPastor, Commented May 10, 2018 at 10:54
I have a massive dataframe. Feel free to suggest a for-loop solution if it is memory and time efficient.
– Viral, Commented May 10, 2018 at 14:49

llllllllll · Accepted Answer · 2018-05-22 11:24:17Z

1

You can use pd.Grouper with freq='20d':

In [8]: final_list = [e for _, e in df.groupby(pd.Grouper(key='date', freq='20d')) if not e.empty]

In [9]: for e in final_list: print(e)
        date events
0 2018-01-01      a
1 2018-01-19      b
        date events
2 2018-01-24      c
3 2018-01-27      d
4 2018-01-29      e
5 2018-01-30      f
        date events
6 2018-02-22      g
        date events
7 2018-03-23      h
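
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea wrapped in a function. The helper name `split_by_days` is hypothetical, and `origin="start_day"` is spelled out only to make visible that the windows are anchored at midnight of the first date (this is the default in pandas 1.1 and later, where the `origin` argument is available).

```python
import pandas as pd

def split_by_days(df, days=20, key="date"):
    """Return the non-empty fixed-width windows of df as a list of DataFrames."""
    # Windows are anchored at midnight of the earliest date in `key`
    grouper = pd.Grouper(key=key, freq=f"{days}D", origin="start_day")
    # Windows with no rows come back as empty groups, so filter them out
    return [g for _, g in df.groupby(grouper) if not g.empty]

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["01/01/2018", "19/01/2018", "24/01/2018", "27/01/2018",
         "29/01/2018", "30/01/2018", "22/02/2018", "23/03/2018"],
        format="%d/%m/%Y"),
    "events": list("abcdefgh"),
})

frames = split_by_days(df)
print([f["events"].tolist() for f in frames])
# [['a', 'b'], ['c', 'd', 'e', 'f'], ['g'], ['h']]
```

Each window is half-open, so a row exactly 20 days after the first date would land in the second DataFrame, not the first.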

edited May 22, 2018 at 11:24

answered May 10, 2018 at 11:32

llllllllll


Ashish Acharya · Accepted Answer · 2018-05-10 11:09:19Z

0

Here's a solution, though it does use a simple loop:

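import pandas as pd
from datetime import timedelta

# df is the DataFrame from the question, with a datetime "date" column

dfs = []

start = df.date.min()
delta = df.date.max() - start

for i in range(0, delta.days+1, 20):
     # Half-open window [start+i, start+i+20) so that no row lands in two frames
     mask = (df['date'] >= start + timedelta(days=i)) & (df['date'] < start + timedelta(days=i+20))
     # Skip windows that contain no rows
     if mask.any():
          dfs.append(df.loc[mask])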
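
Since the question mentions a massive dataframe, here is a sketch of a different loop that avoids building a boolean mask over the whole frame on every iteration. This is not the method above but a swapped-in technique: sort once, locate the window boundaries with `numpy.searchsorted`, then take positional slices. The name `iter_windows` is hypothetical, and the sketch assumes a non-empty frame with timezone-naive dates and windows anchored at the first timestamp.

```python
import numpy as np
import pandas as pd

def iter_windows(df, days=20, key="date"):
    """Yield the non-empty `days`-day windows of df as positional slices."""
    df = df.sort_values(key)  # searchsorted requires sorted dates
    dates = df[key].to_numpy()
    start, end = df[key].iloc[0], df[key].iloc[-1]
    # Window edges from the first date to just past the last one
    edges = pd.date_range(start, end + pd.Timedelta(days=days),
                          freq=f"{days}D").to_numpy()
    # Position of the first row at or after each edge
    cuts = np.searchsorted(dates, edges, side="left")
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        if hi > lo:  # skip empty windows
            yield df.iloc[lo:hi]

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["01/01/2018", "19/01/2018", "24/01/2018", "27/01/2018",
         "29/01/2018", "30/01/2018", "22/02/2018", "23/03/2018"],
        format="%d/%m/%Y"),
    "events": list("abcdefgh"),
})

dfs = list(iter_windows(df))
print([d["events"].tolist() for d in dfs])
```

Because it is a generator, each window can be processed and discarded in turn instead of holding all of them in a list; note that `sort_values` still makes one sorted copy up front.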

answered May 10, 2018 at 11:09

Ashish Acharya


Mohamed Thasin ah · Accepted Answer · 2018-05-10 11:37:58Z

0

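I tried this:

import datetime
import numpy as np
import pandas as pd

minimum=df['date'].min()
# Days elapsed since the earliest date, as floats
df['diff']=(df['date']-minimum)/datetime.timedelta(days=1)

# Label each row with the index of its 20-day bin; the tiny negative offset keeps day 0 inside the first bin
df['s']=pd.cut(df['diff'],np.arange(-0.000001, df['diff'].max()+20, 20),labels=False)
for u,v in df.groupby('s'):
    # Drop the helper column before printing each group
    print(v.drop(columns='s'))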

Output

        date events  diff
0 2018-01-01      a   0.0
1 2018-01-19      b  18.0
        date events  diff
2 2018-01-24      c  23.0
3 2018-01-27      d  26.0
4 2018-01-29      e  28.0
5 2018-01-30      f  29.0
        date events  diff
6 2018-02-22      g  52.0
        date events  diff
7 2018-03-23      h  81.0

edited May 10, 2018 at 11:37

answered May 10, 2018 at 11:21

Mohamed Thasin ah
