I've got a working script that returns a df containing the points within a provided radius of a given point. Example df below.
- Currently this applies the function to Label A and returns the other points that are within a specified radius.
- What's the most efficient way to apply this function to all unique values in Label iteratively, instead of passing the function one value at a time?
Code:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Time' : ['09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.1','09:00:00.2','09:00:00.2','09:00:00.2','09:00:00.2','09:00:00.2'],
    'Label' : ['A','B','C','D','E','A','B','C','D','E'],
    'X' : [8,4,3,8,7,7,3,3,4,6],
    'Y' : [3,3,3,4,3,2,1,2,4,2],
})
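def countPoints(coordinates, ID, radius):
    """Return the rows of `coordinates` that lie within `radius` of the point labelled `ID`."""
    points = coordinates[['X', 'Y']].values
    # Pairwise differences via broadcasting: shape (n, n, 2)
    array = points[:, None, :] - points
    # Pairwise Euclidean distances: shape (n, n)
    distance = np.linalg.norm(array, axis=2)
    # Position of the first row labelled ID, used as the reference point
    ref = coordinates['Label'].eq(ID).values.argmax()
    # Copy so that adding a column does not write into a slice of the input
    df = coordinates[distance[ref] <= radius].copy()
    df['Point'] = ID
    return df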
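To make the broadcasting step in countPoints easier to follow, here is a minimal sketch that runs it on the five points from the first timestamp only. The names `diff` and `within_a` are just for illustration and are not part of the script above.

```python
import numpy as np

# X/Y coordinates of labels A..E at time 09:00:00.1 (taken from the example df)
points = np.array([[8, 3], [4, 3], [3, 3], [8, 4], [7, 3]])

# points[:, None, :] has shape (5, 1, 2); subtracting points (5, 2)
# broadcasts to a (5, 5, 2) array of pairwise coordinate differences
diff = points[:, None, :] - points

# Euclidean norm over the last axis gives a (5, 5) matrix of pairwise distances
distance = np.linalg.norm(diff, axis=2)

# Row 0 belongs to label A: True where a point lies within radius 1 of A
within_a = distance[0] <= 1
print(within_a)
```

Every row of `distance` already holds the distances from one label to all others, so the matrix computed for A could in principle be reused for B, C and so on.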
At the moment I'm applying the function to each value in Label separately and then concatenating the df's together. This becomes inefficient when there are many unique values in Label.
Is there a way to apply it to all of them iteratively?
# Label A
df_A = df.groupby('Time').apply(countPoints, ID = 'A', radius = 1).reset_index(drop = True)
# Label B
df_B = df.groupby('Time').apply(countPoints, ID = 'B', radius = 1).reset_index(drop = True)
# Label C
df_C = df.groupby('Time').apply(countPoints, ID = 'C', radius = 1).reset_index(drop = True)
# Combine df's
df1 = pd.concat([df_A, df_B, df_C]).sort_values(by = 'Time').reset_index(drop = True)
Intended Output:
          Time Label  X  Y Point
0   09:00:00.1     A  8  3     A
1   09:00:00.1     D  8  4     A
2   09:00:00.1     E  7  3     A
3   09:00:00.1     B  4  3     B
4   09:00:00.1     C  3  3     B
5   09:00:00.1     B  4  3     C
6   09:00:00.1     C  3  3     C
7   09:00:00.2     A  7  2     A
8   09:00:00.2     E  6  2     A
9   09:00:00.2     B  3  1     B
10  09:00:00.2     C  3  2     B
11  09:00:00.2     B  3  1     C
12  09:00:00.2     C  3  2     C
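For reference, one possible direction is sketched below: compute the distance matrix once per Time group and read off every label's neighbours from it, rather than calling the function once per label. The helper name `count_points_all` and its `labels` parameter are hypothetical. The sketch assumes each label appears at most once per timestamp and that the groups are small enough for an n-by-n distance matrix to fit in memory.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['09:00:00.1'] * 5 + ['09:00:00.2'] * 5,
    'Label': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
    'X': [8, 4, 3, 8, 7, 7, 3, 3, 4, 6],
    'Y': [3, 3, 3, 4, 3, 2, 1, 2, 4, 2],
})

def count_points_all(group, radius, labels=None):
    """For every reference label in `group`, return the rows within `radius` of it.

    Hypothetical helper: `labels` optionally restricts which reference labels are kept.
    """
    points = group[['X', 'Y']].to_numpy()
    # Pairwise Euclidean distances between all points in the group: shape (n, n)
    distance = np.linalg.norm(points[:, None, :] - points, axis=2)
    # (reference row, neighbour row) index pairs for all points within the radius
    ref_idx, nbr_idx = np.nonzero(distance <= radius)
    out = group.iloc[nbr_idx].copy()
    # Tag each neighbour row with the label of the reference point it belongs to
    out['Point'] = group['Label'].to_numpy()[ref_idx]
    if labels is not None:
        out = out[out['Point'].isin(labels)]
    return out

# Iterate over the groups directly, one distance matrix per timestamp
result = pd.concat(
    [count_points_all(g, radius=1, labels=['A', 'B', 'C']) for _, g in df.groupby('Time')],
    ignore_index=True,
)
print(result)
```

Passing `labels=None` keeps every label as a reference point. On the example df with `labels=['A', 'B', 'C']`, the rows come out in the same order as the intended output above.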