Using groupby apply function iteratively

Question

I've got a working script that returns a df containing the points that lie within a provided radius of a given label. An example df is below.

Currently this applies the function to Label A and returns the other points that are within the specified radius.
What's the most efficient way to apply this function to all unique values in Label, instead of passing the function one value at a time?

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame({
        'Time' : ['09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.2','09:00:00.2','09:00:00.2','09:00:00.2','09:00:00.2'],                 
        'Label' : ['A','B','C','D','E','A','B','C','D','E'],                 
        'X' : [8,4,3,8,7,7,3,3,4,6],
        'Y' : [3,3,3,4,3,2,1,2,4,2],
        })

def countPoints(coordinates, ID, radius):
    """Create df that returns coordinates within unique id radius."""

    points = coordinates[['X', 'Y']].values

    array = points[:,None,:] - points[0:,]

    distance = np.linalg.norm(array, axis = 2)

    # .copy() avoids a SettingWithCopyWarning when the 'Point' column is added below
    df = coordinates[distance[coordinates['Label'].eq(ID).values.argmax()] <= radius].copy()

    df['Point'] = ID

    return df

At the moment I'm applying the function to each value in Label separately and then concatenating the resulting df's. This becomes inefficient when Label has many unique values.

Is there a way to apply it iteratively?

# Label A
df_A = df.groupby('Time').apply(countPoints, ID = 'A', radius = 1).reset_index(drop = True)

# Label B
df_B = df.groupby('Time').apply(countPoints, ID = 'B', radius = 1).reset_index(drop = True)

# Label C
df_C = df.groupby('Time').apply(countPoints, ID = 'C', radius = 1).reset_index(drop = True)

# Combine df's
df1 = pd.concat([df_A, df_B, df_C]).sort_values(by = 'Time').reset_index(drop = True)

Intended Output:

          Time Label  X  Y Point
0   09:00:00.1     A  8  3     A
1   09:00:00.1     D  8  4     A
2   09:00:00.1     E  7  3     A
3   09:00:00.1     B  4  3     B
4   09:00:00.1     C  3  3     B
5   09:00:00.1     B  4  3     C
6   09:00:00.1     C  3  3     C
7   09:00:00.2     A  7  2     A
8   09:00:00.2     E  6  2     A
9   09:00:00.2     B  3  1     B
10  09:00:00.2     C  3  2     B
11  09:00:00.2     B  3  1     C
12  09:00:00.2     C  3  2     C

Andy L. · Accepted Answer · 2020-01-17 03:39:11Z

Just move pd.concat inside the function countPoints, as follows:

def countPoints(coordinates, radius):  #remove parameter `ID` since applying all IDs
    """Create df that returns coordinates within unique id radius."""

    points = coordinates[['X', 'Y']].values

    array = points[:,None,:] - points[0:,]

    distance = np.linalg.norm(array, axis = 2)

    # each row of the boolean mask (distance <= radius) selects the neighbours of one label
    df = pd.concat([coordinates[m].assign(Point=label) for label, m in
                            zip(coordinates['Label'], (distance <= radius))],
                   ignore_index=True)

    return df


df_out = df.groupby('Time').apply(countPoints, radius = 1).reset_index(drop=True)

Out[175]:
          Time Label  X  Y Point
0   09:00:00.1     A  8  3     A
1   09:00:00.1     D  8  4     A
2   09:00:00.1     E  7  3     A
3   09:00:00.1     B  4  3     B
4   09:00:00.1     C  3  3     B
5   09:00:00.1     B  4  3     C
6   09:00:00.1     C  3  3     C
7   09:00:00.1     A  8  3     D
8   09:00:00.1     D  8  4     D
9   09:00:00.1     A  8  3     E
10  09:00:00.1     E  7  3     E
11  09:00:00.2     A  7  2     A
12  09:00:00.2     E  6  2     A
13  09:00:00.2     B  3  1     B
14  09:00:00.2     C  3  2     B
15  09:00:00.2     B  3  1     C
16  09:00:00.2     C  3  2     C
17  09:00:00.2     D  4  4     D
18  09:00:00.2     A  7  2     E
19  09:00:00.2     E  6  2     E

The above is the output for all IDs, whereas your intended output covers only A, B and C. So just slice df_out to keep those three IDs:

df_ABC = df_out[df_out.Point.isin(['A', 'B', 'C'])].reset_index(drop=True)

Out[180]:
          Time Label  X  Y Point
0   09:00:00.1     A  8  3     A
1   09:00:00.1     D  8  4     A
2   09:00:00.1     E  7  3     A
3   09:00:00.1     B  4  3     B
4   09:00:00.1     C  3  3     B
5   09:00:00.1     B  4  3     C
6   09:00:00.1     C  3  3     C
7   09:00:00.2     A  7  2     A
8   09:00:00.2     E  6  2     A
9   09:00:00.2     B  3  1     B
10  09:00:00.2     C  3  2     B
11  09:00:00.2     B  3  1     C
12  09:00:00.2     C  3  2     C
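
As a variation on the approach above, the per-label concat can be replaced by indexing with np.nonzero on the distance mask. This is a minimal sketch rather than part of the original answer: the name count_points_nonzero is hypothetical, and the groups are iterated explicitly instead of going through groupby.apply, which sidesteps version differences in how apply treats the grouping column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['09:00:00.1'] * 5 + ['09:00:00.2'] * 5,
    'Label': list('ABCDE') * 2,
    'X': [8, 4, 3, 8, 7, 7, 3, 3, 4, 6],
    'Y': [3, 3, 3, 4, 3, 2, 1, 2, 4, 2],
})

def count_points_nonzero(coordinates, radius):
    """Return, for every label in the group, the rows lying within `radius` of it."""
    points = coordinates[['X', 'Y']].to_numpy()
    # Pairwise distances: entry [i, j] is the distance from point i to point j.
    distance = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Row index = centre point, column index = a neighbour within the radius.
    centre, neighbour = np.nonzero(distance <= radius)
    out = coordinates.iloc[neighbour].copy()
    out['Point'] = coordinates['Label'].to_numpy()[centre]
    return out

# Hypothetical driver: one call per Time group, then a single concat.
parts = [count_points_nonzero(group, radius=1) for _, group in df.groupby('Time')]
df_out = pd.concat(parts, ignore_index=True)
```

Because np.nonzero walks the mask row by row, the rows come out in the same order as in the concat-based version, and the same isin filter on Point can be applied afterwards.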

ralex · Accepted Answer · 2020-01-16 23:32:58Z

If you append the radius values to the DataFrame -- which should be cheap -- you should be able to eliminate the function application entirely.

import pandas as pd
import numpy as np

df = pd.DataFrame({
        'Time' : ['09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.2','09:00:00.2','09:00:00.2','09:00:00.2','09:00:00.2'],                 
        'Label' : ['A','B','C','D','E','A','B','C','D','E'],                 
        'X' : [8,4,3,8,7,7,3,3,4,6],
        'Y' : [3,3,3,4,3,2,1,2,4,2],
        })

# make the radii explicit
df.loc[:, 'norm2'] = np.linalg.norm(df.loc[:, ['X', 'Y']].values, axis=1)
# 517 µs ± 4.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# with radii appended
In [1]: df
Out[1]:
         Time Label  X  Y     norm2
0  09:00:00.1     A  8  3  8.544004
1  09:00:00.1     B  4  3  5.000000
2  09:00:00.1     C  3  3  4.242641
3  09:00:00.1     D  8  4  8.944272
4  09:00:00.1     E  7  3  7.615773
5  09:00:00.2     A  7  2  7.280110
6  09:00:00.2     B  3  1  3.162278
7  09:00:00.2     C  3  2  3.605551
8  09:00:00.2     D  4  4  5.656854
9  09:00:00.2     E  6  2  6.324555


# indexing the DataFrame before counting with `groupby`

In [2]: df[df['norm2'] < 4].groupby(['Time', 'Label'])['norm2'].count()
Out[2]:
Time        Label
09:00:00.2  B        1
            C        1
Name: norm2, dtype: int64
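
Note that norm2 above is each point's distance from the origin (0, 0). If the count is instead wanted relative to each label's own position, as in the question's countPoints, one possible sketch is to sum the rows of the pairwise distance mask per Time group. The helper name neighbour_counts is hypothetical, and the count includes the label's own point.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['09:00:00.1'] * 5 + ['09:00:00.2'] * 5,
    'Label': list('ABCDE') * 2,
    'X': [8, 4, 3, 8, 7, 7, 3, 3, 4, 6],
    'Y': [3, 3, 3, 4, 3, 2, 1, 2, 4, 2],
})

def neighbour_counts(group, radius):
    """Count, for each label in the group, the points within `radius` of it."""
    points = group[['X', 'Y']].to_numpy()
    # Pairwise distances between all points that share the same Time value.
    distance = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Each row sum counts the points inside the radius, the label itself included.
    return pd.Series((distance <= radius).sum(axis=1), index=group['Label'].to_numpy())

# One Series per Time group, stacked into a (Time, Label) MultiIndex.
counts = pd.concat(
    {time: neighbour_counts(group, radius=1) for time, group in df.groupby('Time')},
    names=['Time', 'Label'],
)
```

With radius = 1 this gives, for example, a count of 3 for label A at 09:00:00.1 (A, D and E), which matches the rows tagged Point A in the question's intended output.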

Sorry @ralex. I've included more info in the question. Is this clearer?
