Finding the 1D numpy array based on the max value in the second column of a 2D numpy array

Question

I'm currently working on removing some rows (1D arrays) from a 2D array based on the values in one of its columns. The first column may contain different and repeated values; for each repeated value I want to keep only the row with the max value in the second column. (This is just an example; the 2D array may be bigger.) Here is what I tried:

import numpy as np

arr = np.array([[ 36.06, 209.14],
                [ 36.06, 214.55],
                [ 36.06, 215.91],
                [ 36.06, 225.29],
                [ 41.11, 186.76],
                [ 41.11, 191.79],
                [ 41.11, 197.21],
                [ 41.11, 197.33],
                [ 41.11, 201.19],
                [ 41.11, 206.15],
                [ 50.25, 165.51],
                [ 50.25, 174.32],
                [ 59.03, 148.79]])     

biggest = 0
aux = []
for i in range(arr.shape[0]-1):
    j = i+1
    if (arr[i][0] == arr[j][0]):
        if (arr[i][1] < arr[j][1] and arr[j][1] > biggest):
            biggest = j
    if (arr[i][0] != arr[j][0]):
        aux.append(arr[biggest])

print(np.array(aux))

#Output = [[ 36.06 225.29]
#          [ 41.11 206.15]
#          [ 50.25 174.32]]

As you can see, I get almost the desired result. My expected result should be something like this:

Output = [[ 36.06 225.29]
          [ 41.11 206.15]
          [ 50.25 174.32]
          [ 59.03 148.79]]

The thing is I'm missing the last array and maybe there is an easier way using numpy built-in functions that I'm missing. Thank you in advance!

Homer512 · Accepted Answer · 2022-07-16 19:56:39Z

No reason to reinvent the wheel. Just use pandas.

import pandas as pd

pd.DataFrame(arr).groupby(0, as_index=False).max().to_numpy()

>> array([[ 36.06, 225.29],
          [ 41.11, 206.15],
          [ 50.25, 174.32],
          [ 59.03, 148.79]])

Alternative

The input seems sorted in both columns, meaning the highest value per key is always the last. If that is the case, or if it can be accomplished by sorting, a plain numpy version is also possible.

# if not already sorted, sort by key (column 0), then by value (column 1)
sorted_array = arr[np.lexsort((arr[:, 1], arr[:, 0]))]
# find the last row of each run of equal keys
keys = sorted_array[:, 0]
ends = np.append(keys[1:] != keys[:-1], True)
# extract rows
result = sorted_array[ends]

If we include the cost of sorting, this has a higher computational complexity than the pandas version (assuming the pandas version uses hash tables; I haven't checked). The shape of the data and the quality of the implementation may change the actual runtime.
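For reference, the sort-based approach can be wrapped in a function and tried on unsorted input. This is only a sketch: the function name max_row_per_key and the shuffled sample rows are illustrative, and it assumes a non-empty 2D array with keys in column 0 and values in column 1.

```python
import numpy as np

def max_row_per_key(arr):
    """Return, for each distinct key in column 0, the row with the largest column 1.

    Assumes arr is a non-empty 2D array. Output rows are ordered by key.
    """
    # lexsort uses the LAST key as the primary one: sort by column 0, then column 1.
    order = np.lexsort((arr[:, 1], arr[:, 0]))
    sorted_array = arr[order]
    keys = sorted_array[:, 0]
    # A row ends its group if the next key differs; the final row always does.
    ends = np.append(keys[1:] != keys[:-1], True)
    return sorted_array[ends]

# Rows from the question, deliberately shuffled.
shuffled = np.array([[41.11, 206.15],
                     [36.06, 209.14],
                     [59.03, 148.79],
                     [36.06, 225.29],
                     [41.11, 186.76],
                     [50.25, 174.32],
                     [50.25, 165.51]])

result = max_row_per_key(shuffled)
print(result)
```

Because the rows are sorted by value within each key, the last row of every group is the one with the maximum, which is why only the group boundaries need to be located.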

edited Jul 16, 2022 at 19:56

answered Jul 16, 2022 at 18:46

Homer512


2 Comments

Ali_Sh Over a year ago

AFAIK, pandas groupby will be better than numpy for such problems, if the OP has no problem using it.

Homer512 Over a year ago

@Ali_Sh Added a numpy alternative for good measure ;-)

Ali_Sh · Accepted Answer · 2022-07-16 18:57:27Z

One way is to apply np.unique to the first column to find its unique values (note that np.unique returns the unique values in sorted order by default, which works for your example), then find the index of the maximum value in the second column for each of those unique values and append that row to your list:

aux = []
for i in np.unique(arr[:, 0]):
    arr_ = arr[arr[:, 0] == i]
    aux.append(arr_[arr_[:, 1].argmax()])

Or, using a preallocated array instead of appending to a list:

uniques_ = np.unique(arr[:, 0])
# [36.06 41.11 50.25 59.03]

result = np.empty((uniques_.shape[0], arr.shape[1]))
for i, j in enumerate(uniques_):
    arr_ = arr[arr[:, 0] == j]
    result[i] = arr_[arr_[:, 1].argmax()]

# result
# [[ 36.06 225.29]
#  [ 41.11 206.15]
#  [ 50.25 174.32]
#  [ 59.03 148.79]]

To preserve the order in which the values first appear in the first column while still using np.unique, if we have:

arr = np.array([[ 41.11, 186.76],
                [ 41.11, 191.79],
                [ 41.11, 197.21],
                [ 41.11, 197.33],
                [ 41.11, 201.19],
                [ 41.11, 206.15],
                [ 36.06, 209.14],
                [ 36.06, 214.55],
                [ 36.06, 215.91],
                [ 36.06, 225.29],
                [ 50.25, 165.51],
                [ 50.25, 174.32],
                [ 59.03, 148.79]])

_, idx = np.unique(arr[:, 0], return_index=True)
uniques_ = arr[:, 0][np.sort(idx)]
result = np.empty((uniques_.shape[0], arr.shape[1]))
for i, j in enumerate(uniques_):
    arr_ = arr[arr[:, 0] == j]
    result[i] = arr_[arr_[:, 1].argmax()]

# result
# [[ 41.11 206.15]
#  [ 36.06 225.29]
#  [ 50.25 174.32]
#  [ 59.03 148.79]]
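The order-preserving variant can also be written without a Python loop by combining np.unique(..., return_index=True) with a lexsort. The following is a sketch under the same assumptions as above (non-empty array, keys in column 0, values in column 1); the variable names are illustrative:

```python
import numpy as np

arr = np.array([[41.11, 186.76],
                [41.11, 206.15],
                [36.06, 209.14],
                [36.06, 225.29],
                [50.25, 165.51],
                [50.25, 174.32],
                [59.03, 148.79]])

# Index of the first occurrence of each key; entries follow the sorted key order.
_, first_idx = np.unique(arr[:, 0], return_index=True)

# Sort by key, then by value, and keep the last row of each key group.
sorted_arr = arr[np.lexsort((arr[:, 1], arr[:, 0]))]
ends = np.append(sorted_arr[1:, 0] != sorted_arr[:-1, 0], True)
winners = sorted_arr[ends]  # one row per key, in sorted key order

# Reorder the winners by where each key first appeared in the input.
result = winners[np.argsort(first_idx)]
print(result)
```

Both winners and first_idx are aligned with the sorted unique keys, so applying argsort to first_idx restores the original order of appearance.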
