1
import pandas as pd
mydate = ["01/01/2018","19/01/2018","24/01/2018" ,
         "27/01/2018","29/01/2018","30/01/2018" , 
         "22/02/2018","23/03/2018"]

mydate = pd.to_datetime(mydate, format="%d/%m/%Y")  # the strings are day/month/year
events = ["a" , "b" , "c" , "d" , "e" , "f" ,"g" , "h"]

df = pd.DataFrame({"date" :mydate,"events" :events})
df

        date events
0 2018-01-01      a
1 2018-01-19      b
2 2018-01-24      c
3 2018-01-27      d
4 2018-01-29      e
5 2018-01-30      f
6 2018-02-22      g
7 2018-03-23      h

I want to slice the data into consecutive 20-day windows, counted from the earliest date, and store each window in a separate DataFrame. I have looked at groupby, date_range, and other functionality but could not find a solution to my problem. I can do this with an ordinary for loop, but I would prefer to use built-in pandas functionality.

Expected result:
df = [df1, df2, df3, df4]
where df1 contains rows 0 and 1
      df2 contains rows 2, 3, 4 and 5
      df3 contains row 6
      df4 contains row 7
2
  • Why are you against using a Python loop? I'm not sure, but I have a feeling it's the only way, and it should run in under a second unless you're parsing a massive DataFrame. Commented May 10, 2018 at 10:54
  • I have a massive DataFrame. Feel free to suggest a for-loop solution if it is memory- and time-efficient. Commented May 10, 2018 at 14:49

3 Answers

1

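You can use pd.Grouper with freq='20D'. The bins start at midnight of the earliest date, and the if not e.empty check drops any 20-day window that contains no rows:

In [8]: final_list = [e for _, e in df.groupby(pd.Grouper(key='date', freq='20D')) if not e.empty]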

In [9]: for e in final_list: print(e)
        date events
0 2018-01-01      a
1 2018-01-19      b
        date events
2 2018-01-24      c
3 2018-01-27      d
4 2018-01-29      e
5 2018-01-30      f
        date events
6 2018-02-22      g
        date events
7 2018-03-23      h
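
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same pd.Grouper approach. The helper name split_by_days and its parameters are made up for illustration and are not part of pandas; the data is the frame from the question, parsed as day/month/year.

```python
import pandas as pd


def split_by_days(df, column="date", days=20):
    """Return the non-empty `days`-wide chunks of `df` in time order.

    Hypothetical helper for illustration; not a pandas function.
    """
    grouper = pd.Grouper(key=column, freq=f"{days}D")
    # Grouper also yields windows with no rows, so filter those out.
    return [chunk for _, chunk in df.groupby(grouper) if not chunk.empty]


dates = ["01/01/2018", "19/01/2018", "24/01/2018",
         "27/01/2018", "29/01/2018", "30/01/2018",
         "22/02/2018", "23/03/2018"]
df = pd.DataFrame({
    "date": pd.to_datetime(dates, format="%d/%m/%Y"),
    "events": list("abcdefgh"),
})

chunks = split_by_days(df)
print([chunk.index.tolist() for chunk in chunks])
# [[0, 1], [2, 3, 4, 5], [6], [7]]
```

The row indices per chunk match the grouping requested in the question.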

0

Here's a solution, though it does use a simple loop:

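import pandas as pd

# df is the DataFrame from the question, with a datetime64 "date" column

dfs = []

start = df["date"].min()
delta = df["date"].max() - start

for i in range(0, delta.days + 1, 20):
    # Half-open window [lo, hi), so a boundary date lands in exactly one chunk
    lo = start + pd.Timedelta(days=i)
    hi = start + pd.Timedelta(days=i + 20)
    chunk = df.loc[(df["date"] >= lo) & (df["date"] < hi)]
    if not chunk.empty:  # skip windows that contain no rows
        dfs.append(chunk)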
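
Since the asker mentions a massive DataFrame, a loop that rebuilds a boolean mask for every window may get slow when there are many windows. One possible alternative, sketched below, computes a window number for every row once with integer division and lets groupby do the split. This is a different technique from the loop above, and the helper name split_by_day_offset is hypothetical.

```python
import pandas as pd


def split_by_day_offset(df, column="date", days=20):
    """Split `df` into `days`-wide windows counted from the earliest date.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Whole days elapsed since the earliest date, one value per row.
    offsets = (df[column] - df[column].min()).dt.days
    # Rows with the same quotient fall into the same window.
    bin_id = (offsets // days).rename("bin")
    # Only window numbers that actually occur become groups,
    # so no empty chunks need to be filtered out.
    return [chunk for _, chunk in df.groupby(bin_id)]


dates = ["01/01/2018", "19/01/2018", "24/01/2018",
         "27/01/2018", "29/01/2018", "30/01/2018",
         "22/02/2018", "23/03/2018"]
df = pd.DataFrame({
    "date": pd.to_datetime(dates, format="%d/%m/%Y"),
    "events": list("abcdefgh"),
})

dfs = split_by_day_offset(df)
print([chunk.index.tolist() for chunk in dfs])
# [[0, 1], [2, 3, 4, 5], [6], [7]]
```

The rows are scanned a fixed number of times regardless of how many 20-day windows the date range spans.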


0

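I tried this:

import numpy as np
import pandas as pd

minimum = df['date'].min()
# Days elapsed since the earliest date, as a float
df['diff'] = (df['date'] - minimum) / pd.Timedelta(days=1)

# Integer label of the 20-day bin that each row falls into
df['s'] = pd.cut(df['diff'], np.arange(-0.000001, df['diff'].max() + 20, 20), labels=False)
for u, v in df.groupby('s'):
    print(v.drop(columns='s'))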

Output

        date events  diff
0 2018-01-01      a   0.0
1 2018-01-19      b  18.0
        date events  diff
2 2018-01-24      c  23.0
3 2018-01-27      d  26.0
4 2018-01-29      e  28.0
5 2018-01-30      f  29.0
        date events  diff
6 2018-02-22      g  52.0
        date events  diff
7 2018-03-23      h  81.0
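
To see how pd.cut produces this grouping, the sketch below feeds it the diff values from the output above and prints the bin number assigned to each one. It assumes labels=False so that pd.cut returns plain integer bin numbers; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Day offsets taken from the "diff" column in the output above.
day_offsets = pd.Series([0.0, 18.0, 23.0, 26.0, 28.0, 29.0, 52.0, 81.0])

# Edges sit just below each multiple of 20. pd.cut intervals are
# right-closed by default, so this places offset 0 in the first bin
# and offset 20 in the second.
edges = np.arange(-0.000001, day_offsets.max() + 20, 20)

# labels=False returns the integer position of each value's bin.
bin_ids = pd.cut(day_offsets, edges, labels=False)
print(bin_ids.tolist())
# [0, 0, 1, 1, 1, 1, 2, 4]
```

Bin 3 (days 60 to 79) never appears because no row falls in it, which is why the groupby produces four groups rather than five.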
