2

I am looking for an efficient way to do the following:

If my input is:

np.array([9,0,1,0,3,0])

I want my output to be:

np.array([0,3,2,3,1,3]) # 9 is the highest, so it gets rank 0
                        # 3 is the second highest, so it gets rank 1
                        # 1 is third highest, so it gets rank 2
                        # 0's are forth highest so they get rank 3

I am trying to apply the following to 2D matrix:

Input:

a = np.array([[9,0,1,0,3,0],
              [0,1,2,3,4,5],
              [0.01,0.3,2,100,1,1],
              [0,0,0,0,1,1],
              [4,4,4,4,4,4]])

Output:

>>> get_order_array(a)
array([[0, 3, 2, 3, 1, 3],
       [5, 4, 3, 2, 1, 0],
       [4, 3, 1, 0, 2, 2],
       [1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0]])

I do can achieve the above with the following solution; however, I feel that its is very inefficient, so I was hoping that someone can suggest a better way to achieve my goal.

def get_order(x):
    unique_x = np.unique(x)
    step_1 = np.argsort(unique_x)[::-1]
    temp_dict = dict(zip(unique_x, step_1))
    return np.vectorize(temp_dict.get)(x)

def get_order_array(x):
    new_array = np.empty(x.shape, dtype=np.int)
    for i in xrange(x.shape[0]):
        new_array[i] = get_order(x[i])
    return new_array

3 Answers 3

1

@Jaime's answer is great (as usual!). Here's an alternative, using scipy.stats.rankdata.

In rankdata's terminology, you want a "dense" ranking. You also want to rank the values in the opposite order than usual. To accomplish the reverse order, we'll pass -a to rankdata. We'll also subtract 1 from the ranking so the ranks begin at 0 instead of 1. Finally, you want to rank the rows of a two-dimensional array. rankdata works on one-dimensional data, so we'll have to loop over the rows.

Here's the code:

import numpy as np
from scipy.stats import rankdata


def get_order_array(a):
    b = np.empty(a.shape, dtype=int)
    for k, row in enumerate(a):
        b[k] = rankdata(-row, method='dense') - 1
    return b


if __name__ == "__main__":    
    a = np.array([[9,0,1,0,3,0],
                  [0,1,2,3,4,5],
                  [0.01,0.3,2,100,1,1],
                  [0,0,0,0,1,1],
                  [4,4,4,4,4,4]])
    print get_order_array(a)

Output:

[[0 3 2 3 1 3]
 [5 4 3 2 1 0]
 [4 3 1 0 2 2]
 [1 1 1 1 0 0]
 [0 0 0 0 0 0]]
Sign up to request clarification or add additional context in comments.

1 Comment

This is about twice as fast as @Jamie's solution, and using a library function makes the code simpler. Thank You.
1

A little cumsum magic goes a long way:

a_idx = np.argsort(a, axis=-1)[:, ::-1]
a_sorted = a[np.arange(a.shape[0])[:, None], a_idx]
a_diff = np.zeros_like(a_sorted, dtype=np.bool)
a_diff[:, 1:] = a_sorted[:, :-1] != a_sorted[:, 1:]
a_sorted_ranks = np.cumsum(a_diff, axis=1)
a_ranks = a_sorted_ranks[np.arange(a_sorted_ranks.shape[0])[:, None],
                         np.argsort(a_idx, axis=1)]
>>> a_ranks
array([[0, 3, 2, 3, 1, 3],
       [5, 4, 3, 2, 1, 0],
       [4, 3, 1, 0, 2, 2],
       [1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0]])

1 Comment

Thank You, this is certainly better than mine.
0

Basically:

order = a.argsort(axis=1)
ranks = order.argsort(axis=1)

And no, I did not come up with this clever answer myself. See:

Rank items in an array using Python/NumPy

There you also find a recipe if you want to have same rank for same numbers. (This one gives consecutive ranks, if there are duplicate numbers.)

2 Comments

Maybe I missing something, but I don't see any solution there that answers my question. It is the duplicate values that make it hard.
@Akavali You are right, I did not read it carefully enough. Jaime's solution above is probably close to the best possible.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.