I have a pandas dataframe with dates and locations:

df1 = pd.DataFrame({'dates':['1-1-2013', '1-2-2013', 
      '1-3-2013'], 'locations':['L1','L2','L3']}) 

and another DataFrame that has the counts of points of interest that intersect with each location:

df2 = pd.DataFrame({'dates':['1-1-2013', '1-2-2013', 
      '1-3-2013'], 'locations':['L1','L1','L1'], 'poi_cts':[23,12,23]}) 

The dates in df2 are a small subset of the dates of df1.

I want to create a column in df1 (df1['counts']) which sums the poi_cts for each location/date for poi_cts that are within a specified date range (e.g., within 14 days prior to the date in df1).

I've tried:

def ct_pts(window=14):

    Date = row.Date

    cts = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])

return cts

df1.apply(ct_pts, axis = 1)

but that doesn't work (not sure how to assign the column for each row, I saw this example used somewhere but it isn't working).

I could also do this column-wise, but I'm struggling there too:

def ct_pts():
    new = pd.DataFrame()
    for location in pd.unique(df1['locations']):
        subset = df1[df1['locations']==location]
        for date in pd.unique(df1['Date']):
            df2 = df[df['Date'] == date]
            df2['spray'] = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])
            new = new.append(df2)
    return new

This isn't working either.

I feel like I'm missing something very simple. Is there an easy way to do this?

3 Answers

1

This might be a bit slower but here's how you can do this using apply:

  1. Create a new column to get start_dates so it's easier to filter:

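    # Convert the date strings in both frames so they can be compared as dates
    df1['dates'] = pd.to_datetime(df1['dates'])
    df2['dates'] = pd.to_datetime(df2['dates'])
    df1['start_dates'] = df1['dates'] - pd.to_timedelta(14, unit='d')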
    
  2. Apply function on entire dataframe:

    def ct_pts(row):
        df_fil = df2[(df2['dates'] <= row['dates']) & (df2['dates'] >= 
                      row['start_dates']) & (df2['locations'] == row['locations'])]
        row['counts'] = sum(df_fil['poi_cts'])
        return row
    
    df1 = df1.apply(ct_pts, axis=1)
    

OUTPUT:

dates       locations   start_dates counts
2013-01-01  L1          2012-12-18  23
2013-01-02  L2          2012-12-19  0
2013-01-03  L3          2012-12-20  0
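If the row-wise apply turns out to be too slow, the same window filter can be sketched with a merge instead. This is a different technique from the apply above, not a drop-in replacement: it pairs every df1 row with every df2 row at the same location, so it can use a lot of memory on large frames. The names `pairs`, `in_window` and `sums`, and the explicit month-day-year format, are my own assumptions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'dates': ['1-1-2013', '1-2-2013', '1-3-2013'],
                    'locations': ['L1', 'L2', 'L3']})
df2 = pd.DataFrame({'dates': ['1-1-2013', '1-2-2013', '1-3-2013'],
                    'locations': ['L1', 'L1', 'L1'],
                    'poi_cts': [23, 12, 23]})

# Assumption: the strings are month-day-year; parse them explicitly.
df1['dates'] = pd.to_datetime(df1['dates'], format='%m-%d-%Y')
df2['dates'] = pd.to_datetime(df2['dates'], format='%m-%d-%Y')

window = pd.Timedelta(days=14)

# Pair every df1 row with every df2 row at the same location.
# reset_index() keeps the original df1 row label in a column named 'index'.
pairs = df1.reset_index().merge(df2, on='locations', suffixes=('', '_poi'))

# Keep pairs whose df2 date lies in [df1 date - 14 days, df1 date].
in_window = ((pairs['dates_poi'] <= pairs['dates'])
             & (pairs['dates_poi'] >= pairs['dates'] - window))

# Sum per original df1 row; rows without any match get 0.
sums = pairs[in_window].groupby('index')['poi_cts'].sum()
df1['counts'] = sums.reindex(df1.index, fill_value=0)
print(df1)
```

On the sample data this gives the same counts as the apply version (23, 0, 0).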

1

I am using NumPy broadcasting to speed up the whole process:

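# assumes both 'dates' columns were already converted with pd.to_datetime
df1['cts'] = 0
for x, y in df1.groupby('locations'):
    at_loc = df2.locations == x
    s = df2.loc[at_loc, 'dates'].values
    t = y['dates'].values
    # day differences, one row per df1 date and one column per df2 date
    v = (t[:, None] - s) / np.timedelta64(1, 'D')
    # sum poi_cts dated within the 14 days up to and including each df1 date;
    # writing through y.index keeps the result aligned with df1's row order
    df1.loc[y.index, 'cts'] = np.dot((v >= 0) & (v < 14), df2.loc[at_loc, 'poi_cts'].values)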
df1
Out[167]: 
       dates locations  cts
0 2013-01-01        L1   23
1 2013-02-01        L2    0
2 2013-03-01        L3    0
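To make the broadcasting step easier to follow, here is a small standalone sketch for a single location. The dates and counts are made up for illustration, and the mask assumes the window from the question (the 14 days up to and including each df1 date).

```python
import numpy as np

# Hypothetical data for one location.
t = np.array(['2013-01-10', '2013-01-20'], dtype='datetime64[D]')   # df1 dates
s = np.array(['2013-01-01', '2013-01-09', '2013-01-15'],
             dtype='datetime64[D]')                                  # df2 dates
cts = np.array([23, 12, 5])                                          # df2 poi_cts

# t[:, None] has shape (2, 1); subtracting s (shape (3,)) broadcasts to (2, 3),
# so v[i, j] is the number of days from df2 date j to df1 date i.
v = (t[:, None] - s) / np.timedelta64(1, 'D')

# True where the df2 date falls within the 14 days up to and including the df1 date.
mask = (v >= 0) & (v < 14)

# Each row of the mask picks the counts to add up for one df1 date.
totals = mask.dot(cts)
print(v.shape, totals)
```

The point of the `[:, None]` is that all pairwise date differences are computed in one array operation, with no Python loop over rows.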


0

I got my initial attempt to work using apply:

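def num_spray(row, window=14):
    # 'Date' is the column name in my real data ('dates' in the example above)
    Date = row['Date']
    # sum poi_cts dated strictly between (Date - window days) and Date
    cts = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])
    return cts

df1['counts'] = df1.apply(num_spray, axis = 1)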
