I'm trying to remove some rows from a 2D array based on the values in one of its columns. The first column may contain different, repeated values; for each distinct value I want to keep only the row with the largest value in the second column. (This is just an example; the real 2D array may be bigger.) Here is what I tried:

import numpy as np

arr = np.array([[ 36.06, 209.14],
                [ 36.06, 214.55],
                [ 36.06, 215.91],
                [ 36.06, 225.29],
                [ 41.11, 186.76],
                [ 41.11, 191.79],
                [ 41.11, 197.21],
                [ 41.11, 197.33],
                [ 41.11, 201.19],
                [ 41.11, 206.15],
                [ 50.25, 165.51],
                [ 50.25, 174.32],
                [ 59.03, 148.79]])     

biggest = 0
aux = []
for i in range(arr.shape[0]-1):
    j = i+1
    if (arr[i][0] == arr[j][0]):
        if (arr[i][1] < arr[j][1] and arr[j][1] > biggest):
            biggest = j
    if (arr[i][0] != arr[j][0]):
        aux.append(arr[biggest])

print(np.array(aux))

#Output = [[ 36.06 225.29]
#          [ 41.11 206.15]
#          [ 50.25 174.32]]

As you can see, I get almost the desired result. My expected result is:

Output = [[ 36.06 225.29]
          [ 41.11 206.15]
          [ 50.25 174.32]
          [ 59.03 148.79]]

The problem is that the last row is missing from my output. Maybe there is also an easier way using NumPy built-in functions that I'm overlooking. Thank you in advance!

2 Answers

No reason to reinvent the wheel. Just use pandas.

import pandas as pd

pd.DataFrame(arr).groupby(0, as_index=False).max().to_numpy()

>> array([[ 36.06, 225.29],
          [ 41.11, 206.15],
          [ 50.25, 174.32],
          [ 59.03, 148.79]])
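
One caveat, since the question mentions the real array may be bigger: `groupby(...).max()` takes the maximum of each column independently, so with more than two columns the returned values need not come from a single original row. A minimal sketch of a row-preserving variant using `idxmax` follows; the three-column data is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up three-column data (illustrative only, not from the question).
arr3 = np.array([[36.06, 209.14, 9.0],
                 [36.06, 225.29, 1.0],
                 [41.11, 206.15, 5.0],
                 [41.11, 186.76, 7.0]])

df = pd.DataFrame(arr3)

# Column-wise maximum per key: values may be mixed from different rows.
columnwise = df.groupby(0, as_index=False).max().to_numpy()

# Index label of the row holding the largest column-1 value per key,
# then select those complete rows.
idx = df.groupby(0)[1].idxmax()
whole_rows = df.loc[idx].to_numpy()

print(columnwise)
print(whole_rows)
```

Here `columnwise` pairs 225.29 with 9.0 even though they come from different rows, while `whole_rows` keeps each original row intact.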

Alternative

The input appears to be sorted by the first column and then by the second, so the row with the highest value for each key is always the last row of its group. If that is the case, or if it can be achieved by sorting, a plain NumPy version is also possible.

# if not already sorted, sort as described above
sorted_array = arr[np.lexsort((arr[:, 1], arr[:, 0]))]
# find the last value per key
keys = sorted_array[:, 0]
ends = np.append(keys[1:] != keys[:-1], True)
# extract rows
result = sorted_array[ends]

If we include the cost of sorting, this has a higher computational complexity than the pandas version (assuming the pandas version uses hash tables; I haven't checked). The shape of the data and the quality of the implementation may change the actual runtime.
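
For reference, here is the same idea wrapped in a function and run on a shuffled copy of the question's data, to show that the sort step makes the input order irrelevant. The function name `last_row_per_key` is hypothetical, and the sketch assumes a non-empty 2D array.

```python
import numpy as np

def last_row_per_key(arr):
    """For each distinct value in column 0, return the row with the
    largest column-1 value. Assumes arr is a non-empty 2D array."""
    # lexsort uses the LAST key as the primary one: sort by column 0,
    # breaking ties by column 1.
    sorted_array = arr[np.lexsort((arr[:, 1], arr[:, 0]))]
    keys = sorted_array[:, 0]
    # A row ends its group if the next key differs; the final row always does.
    ends = np.append(keys[1:] != keys[:-1], True)
    return sorted_array[ends]

arr = np.array([[36.06, 209.14], [36.06, 214.55], [36.06, 215.91],
                [36.06, 225.29], [41.11, 186.76], [41.11, 191.79],
                [41.11, 197.21], [41.11, 197.33], [41.11, 201.19],
                [41.11, 206.15], [50.25, 165.51], [50.25, 174.32],
                [59.03, 148.79]])

# Shuffle the rows to show that the input does not need to be pre-sorted.
shuffled = np.random.default_rng(0).permutation(arr)
result = last_row_per_key(shuffled)
print(result)
```

Note that the result comes back ordered by key, not in the order the keys first appear in the input.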

2 Comments

AFAIK, pandas groupby will be better than NumPy for problems like this, if the OP has no problem using it.
@Ali_Sh Added a numpy alternative for good measure ;-)

One way is to apply np.unique to the first column to find its distinct values (note that np.unique returns them in sorted order by default, which happens to match your example). Then, for each unique value, find the index of the maximum in the second column among the matching rows and append that row to a list:

aux = []
for i in np.unique(arr[:, 0]):
    arr_ = arr[arr[:, 0] == i]
    aux.append(arr_[arr_[:, 1].argmax()])

Or, using a preallocated array instead of appending to a list:

uniques_ = np.unique(arr[:, 0])
# [36.06 41.11 50.25 59.03]

result = np.empty((uniques_.shape[0], arr.shape[1]))
for i, j in enumerate(uniques_):
    arr_ = arr[arr[:, 0] == j]
    result[i] = arr_[arr_[:, 1].argmax()]

# result
# [[ 36.06 225.29]
#  [ 41.11 206.15]
#  [ 50.25 174.32]
#  [ 59.03 148.79]]
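
If the array really has only two columns, the Python loop can probably be avoided altogether. The following is a sketch, not a drop-in replacement for the loop above: it uses `return_inverse` to label each row with its group and `np.maximum.at` to accumulate the per-group maximum. It recovers only the maximum of the second column, so it does not carry any additional columns along.

```python
import numpy as np

arr = np.array([[36.06, 209.14], [36.06, 214.55], [36.06, 215.91],
                [36.06, 225.29], [41.11, 186.76], [41.11, 191.79],
                [41.11, 197.21], [41.11, 197.33], [41.11, 201.19],
                [41.11, 206.15], [50.25, 165.51], [50.25, 174.32],
                [59.03, 148.79]])

# inverse[k] is the position in uniques_ of the key found in row k.
uniques_, inverse = np.unique(arr[:, 0], return_inverse=True)

# Start every group at -inf, then take an unbuffered running maximum
# of the second column into each group's slot.
maxes = np.full(uniques_.shape, -np.inf)
np.maximum.at(maxes, inverse, arr[:, 1])

result = np.column_stack((uniques_, maxes))
print(result)
```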

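np.unique sorts its output, so to preserve the order in which the keys first appear in the first column, use return_index and sort the returned indices. For example, if we have: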

arr = np.array([[ 41.11, 186.76],
                [ 41.11, 191.79],
                [ 41.11, 197.21],
                [ 41.11, 197.33],
                [ 41.11, 201.19],
                [ 41.11, 206.15],
                [ 36.06, 209.14],
                [ 36.06, 214.55],
                [ 36.06, 215.91],
                [ 36.06, 225.29],
                [ 50.25, 165.51],
                [ 50.25, 174.32],
                [ 59.03, 148.79]])

_, idx = np.unique(arr[:, 0], return_index=True)
uniques_ = arr[:, 0][np.sort(idx)]
result = np.empty((uniques_.shape[0], arr.shape[1]))
for i, j in enumerate(uniques_):
    arr_ = arr[arr[:, 0] == j]
    result[i] = arr_[arr_[:, 1].argmax()]

# result
# [[ 41.11 206.15]
#  [ 36.06 225.29]
#  [ 50.25 174.32]
#  [ 59.03 148.79]]
