
I have a working script that returns a DataFrame of the points that fall within a given radius of a labelled point. An example df is below.

  • Currently, the function is applied to Label A and returns the points (A included) that lie within the specified radius.
  • What is the most efficient way to apply this function to every unique value in Label, rather than passing one value at a time?

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame({
        'Time' : ['09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.2','09:00:00.2','09:00:00.2','09:00:00.2','09:00:00.2'],                 
        'Label' : ['A','B','C','D','E','A','B','C','D','E'],                 
        'X' : [8,4,3,8,7,7,3,3,4,6],
        'Y' : [3,3,3,4,3,2,1,2,4,2],
        })

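def countPoints(coordinates, ID, radius):
    """Return the rows of `coordinates` that lie within `radius` of the point labelled `ID`."""

    points = coordinates[['X', 'Y']].values

    # Pairwise differences between all points via broadcasting: shape (n, n, 2)
    array = points[:, None, :] - points[None, :, :]

    # Euclidean distance matrix: distance[i, j] is the distance from point i to point j
    distance = np.linalg.norm(array, axis=2)

    # Row of the distance matrix belonging to the first point labelled `ID`
    row = coordinates['Label'].eq(ID).values.argmax()

    # .copy() avoids a SettingWithCopyWarning when the new column is added
    df = coordinates[distance[row] <= radius].copy()

    df['Point'] = ID

    return df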
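
To make the two array lines in countPoints concrete, here is a minimal sketch on three made-up points (the coordinates are illustrative only and are not taken from the df above). It shows how broadcasting produces every pairwise difference and how the norm over the last axis turns that into a square distance matrix.

```python
import numpy as np

# Three hypothetical points, chosen only for illustration.
points = np.array([[0, 0], [3, 4], [0, 1]])

# (n, 1, 2) minus (1, n, 2) broadcasts to (n, n, 2):
# diff[i, j] is the vector from point j to point i.
diff = points[:, None, :] - points[None, :, :]

# Norm over the last axis gives an (n, n) matrix of Euclidean distances.
distance = np.linalg.norm(diff, axis=2)

# Row i compared against a radius gives a boolean mask of point i's neighbours.
mask_first_point = distance[0] <= 1
```

Row 0 of the mask selects the first point itself and the third point, which is exactly one unit away.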

At the moment I'm applying the function to each value in Label separately and then concatenating the resulting DataFrames. This becomes inefficient when Label has many unique values.

Is there a way to apply it iteratively?

# Label A
df_A = df.groupby('Time').apply(countPoints, ID = 'A', radius = 1).reset_index(drop = True)

# Label B
df_B = df.groupby('Time').apply(countPoints, ID = 'B', radius = 1).reset_index(drop = True)

# Label C
df_C = df.groupby('Time').apply(countPoints, ID = 'C', radius = 1).reset_index(drop = True)

# Combine df's
df1 = pd.concat([df_A, df_B, df_C]).sort_values(by = 'Time').reset_index(drop = True)

Intended Output:

          Time Label  X  Y Point
0   09:00:00.1     A  8  3     A
1   09:00:00.1     D  8  4     A
2   09:00:00.1     E  7  3     A
3   09:00:00.1     B  4  3     B
4   09:00:00.1     C  3  3     B
5   09:00:00.1     B  4  3     C
6   09:00:00.1     C  3  3     C
7   09:00:00.2     A  7  2     A
8   09:00:00.2     E  6  2     A
9   09:00:00.2     B  3  1     B
10  09:00:00.2     C  3  2     B
11  09:00:00.2     B  3  1     C
12  09:00:00.2     C  3  2     C

2 Answers

1

Just move pd.concat inside the function countPoints, as follows:

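def countPoints(coordinates, radius):  # parameter `ID` removed, since every ID is processed
    """Return, for every label, the rows of `coordinates` within `radius` of that label's point."""

    points = coordinates[['X', 'Y']].values

    # Pairwise differences via broadcasting: shape (n, n, 2)
    array = points[:, None, :] - points[None, :, :]

    distance = np.linalg.norm(array, axis=2)

    # Each row of the boolean mask selects the neighbours of one label;
    # the name `label` avoids shadowing the built-in `id`
    df = pd.concat([coordinates[m].assign(Point=label) for label, m in
                    zip(coordinates['Label'], distance <= radius)],
                   ignore_index=True)

    return df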


df_out = df.groupby('Time').apply(countPoints, radius = 1).reset_index(drop=True)

Out[175]:
          Time Label  X  Y Point
0   09:00:00.1     A  8  3     A
1   09:00:00.1     D  8  4     A
2   09:00:00.1     E  7  3     A
3   09:00:00.1     B  4  3     B
4   09:00:00.1     C  3  3     B
5   09:00:00.1     B  4  3     C
6   09:00:00.1     C  3  3     C
7   09:00:00.1     A  8  3     D
8   09:00:00.1     D  8  4     D
9   09:00:00.1     A  8  3     E
10  09:00:00.1     E  7  3     E
11  09:00:00.2     A  7  2     A
12  09:00:00.2     E  6  2     A
13  09:00:00.2     B  3  1     B
14  09:00:00.2     C  3  2     B
15  09:00:00.2     B  3  1     C
16  09:00:00.2     C  3  2     C
17  09:00:00.2     D  4  4     D
18  09:00:00.2     A  7  2     E
19  09:00:00.2     E  6  2     E

The above is the output for all IDs, whereas your intended output covers only A, B and C. So just filter df_out to keep those three IDs:

df_ABC = df_out[df_out.Point.isin(['A', 'B', 'C'])].reset_index(drop=True)

Out[180]:
          Time Label  X  Y Point
0   09:00:00.1     A  8  3     A
1   09:00:00.1     D  8  4     A
2   09:00:00.1     E  7  3     A
3   09:00:00.1     B  4  3     B
4   09:00:00.1     C  3  3     B
5   09:00:00.1     B  4  3     C
6   09:00:00.1     C  3  3     C
7   09:00:00.2     A  7  2     A
8   09:00:00.2     E  6  2     A
9   09:00:00.2     B  3  1     B
10  09:00:00.2     C  3  2     B
11  09:00:00.2     B  3  1     C
12  09:00:00.2     C  3  2     C
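
A possible variation on the function above, offered only as a sketch: rather than one boolean selection and one `assign` per label, `np.nonzero` on the distance mask yields all (centre, neighbour) index pairs at once, so each time group needs a single `iloc` lookup. The names `points_within_radius`, `centre` and `neighbour` are made up for this example. The groups are iterated explicitly instead of through `groupby.apply`, which keeps the Time column in the result regardless of how a given pandas version treats grouping columns inside `apply`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['09:00:00.1'] * 5 + ['09:00:00.2'] * 5,
    'Label': ['A', 'B', 'C', 'D', 'E'] * 2,
    'X': [8, 4, 3, 8, 7, 7, 3, 3, 4, 6],
    'Y': [3, 3, 3, 4, 3, 2, 1, 2, 4, 2],
})

def points_within_radius(group, radius):
    """For every point in `group`, list the points within `radius` of it."""
    points = group[['X', 'Y']].to_numpy()
    # Pairwise differences via broadcasting: shape (n, n, 2).
    diff = points[:, None, :] - points[None, :, :]
    distance = np.linalg.norm(diff, axis=2)
    # Row index = centre point, column index = neighbour within the radius.
    centre, neighbour = np.nonzero(distance <= radius)
    out = group.iloc[neighbour].reset_index(drop=True)
    out['Point'] = group['Label'].to_numpy()[centre]
    return out

# One call per time stamp, then a single concat over the time groups.
frames = [points_within_radius(g, radius=1) for _, g in df.groupby('Time')]
df_out = pd.concat(frames, ignore_index=True)

# Optional: keep only some centre labels, as in the intended output.
df_ABC = df_out[df_out['Point'].isin(['A', 'B', 'C'])].reset_index(drop=True)
```

On the sample data this should reproduce the 20-row and 13-row tables shown above. Whether it is actually faster than the list-comprehension version depends on the number of labels per time stamp and would need to be timed on real data.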

1

If you append each point's radius (its Euclidean distance from the origin) to the DataFrame, which should be cheap, you should be able to eliminate the function application entirely.

import pandas as pd
import numpy as np

df = pd.DataFrame({
        'Time' : ['09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.2','09:00:00.2','09:00:00.2','09:00:00.2','09:00:00.2'],                 
        'Label' : ['A','B','C','D','E','A','B','C','D','E'],                 
        'X' : [8,4,3,8,7,7,3,3,4,6],
        'Y' : [3,3,3,4,3,2,1,2,4,2],
        })

# make the radii explicit
df.loc[:, 'norm2'] = np.linalg.norm(df.loc[:, ['X', 'Y']].values, axis=1)
# 517 µs ± 4.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# with radii appended
In [1]: df
Out[1]:
         Time Label  X  Y     norm2
0  09:00:00.1     A  8  3  8.544004
1  09:00:00.1     B  4  3  5.000000
2  09:00:00.1     C  3  3  4.242641
3  09:00:00.1     D  8  4  8.944272
4  09:00:00.1     E  7  3  7.615773
5  09:00:00.2     A  7  2  7.280110
6  09:00:00.2     B  3  1  3.162278
7  09:00:00.2     C  3  2  3.605551
8  09:00:00.2     D  4  4  5.656854
9  09:00:00.2     E  6  2  6.324555


# indexing the DataFrame before counting with `groupby`

In [2]: df[df['norm2'] < 4].groupby(['Time', 'Label'])['norm2'].count()
Out[2]:
Time        Label
09:00:00.2  B        1
            C        1
Name: norm2, dtype: int64
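
The count above measures each point's distance from the origin. If what is wanted instead is, for each label, the number of points within a radius of that label's own position at the same time stamp, one possible sketch is below. The helper name `neighbour_counts` and the column name `Within` are made up for this example, and each count includes the point itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['09:00:00.1'] * 5 + ['09:00:00.2'] * 5,
    'Label': ['A', 'B', 'C', 'D', 'E'] * 2,
    'X': [8, 4, 3, 8, 7, 7, 3, 3, 4, 6],
    'Y': [3, 3, 3, 4, 3, 2, 1, 2, 4, 2],
})

def neighbour_counts(group, radius):
    """Count, for each row of `group`, the points within `radius` of it."""
    points = group[['X', 'Y']].to_numpy()
    # Pairwise Euclidean distances within this time stamp: shape (n, n).
    distance = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Each row sum counts the points inside the radius, the point itself included.
    return pd.Series((distance <= radius).sum(axis=1), index=group.index)

# Compute the counts per time stamp and align them back on the original index.
counts = pd.concat([neighbour_counts(g, radius=1) for _, g in df.groupby('Time')])
df_counts = df.assign(Within=counts)
```

With radius = 1 on the sample data, A at 09:00:00.1 gets a count of 3 (A, D and E), and D at 09:00:00.2 gets a count of 1 because no other point lies within one unit of it.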

1 Comment

Sorry @ralex. I've added more information to the question. Is it clearer now?
