
Suppose I have an array my_array and a single value my_val. (Note that my_array is always sorted.)

my_array = np.array([1, 2, 3, 4, 5])
my_val = 1.5

Because my_val is 1.5, I want to put it in between 1 and 2, giving me the array [1, 1.5, 2, 3, 4, 5].

My question is: What's the fastest way (i.e. in microseconds) of producing the ordered output array as my_array grows arbitrarily large?

The original way I thought of was concatenating the value to the original array and then sorting:

arr_out = np.sort(np.concatenate((my_array, np.array([my_val]))))
[ 1.   1.5  2.   3.   4.   5. ]

I know that np.concatenate is fast, but I'm unsure how np.sort would scale as my_array grows, even given that my_array is always sorted.

Edit:

I've compiled the times for the various methods listed at the time an answer was accepted:

Input:

import timeit

timeit_setup = 'import numpy as np\n' \
               'my_array = np.array([i for i in range(1000)], dtype=np.float64)\n' \
               'my_val = 1.5'
num_trials = 1000

my_time = timeit.timeit(
    'np.sort(np.concatenate((my_array, np.array([my_val]))))',
    setup=timeit_setup, number=num_trials
)

pauls_time = timeit.timeit(
    'idx = my_array.searchsorted(my_val)\n'
    'np.concatenate((my_array[:idx], [my_val], my_array[idx:]))',
    setup=timeit_setup, number=num_trials
)

sanchit_time = timeit.timeit(
    'np.insert(my_array, my_array.searchsorted(my_val), my_val)',
    setup=timeit_setup, number=num_trials
)

print('Times for 1000 repetitions for array of length 1000:')
print("My method took {}s".format(my_time))
print("Paul Panzer's method took {}s".format(pauls_time))
print("Sanchit Anand's method took {}s".format(sanchit_time))

Output:

Times for 1000 repetitions for array of length 1000:
My method took 0.017865657746239747s
Paul Panzer's method took 0.005813951002013821s
Sanchit Anand's method took 0.014003945532323987s

And the same for 100 repetitions for an array of length 1,000,000:

Times for 100 repetitions for array of length 1000000:
My method took 3.1770704101754195s
Paul Panzer's method took 0.3931240139911161s
Sanchit Anand's method took 0.40981490723551417s
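For completeness, the harness above can be extended to sweep array sizes and observe how each method scales; the following is only a sketch (the sizes and repetition count here are arbitrary choices, not from the original benchmarks):

```python
import timeit

# Illustrative sweep: time the sort-based and searchsorted-based
# approaches at several array sizes.
for n in (10**3, 10**4, 10**5):
    setup = ('import numpy as np\n'
             'my_array = np.arange({}, dtype=np.float64)\n'
             'my_val = 1.5'.format(n))
    t_sort = timeit.timeit(
        'np.sort(np.concatenate((my_array, np.array([my_val]))))',
        setup=setup, number=100)
    t_search = timeit.timeit(
        'idx = my_array.searchsorted(my_val)\n'
        'np.concatenate((my_array[:idx], [my_val], my_array[idx:]))',
        setup=setup, number=100)
    print('n={:>7}: sort {:.4f}s, searchsorted {:.4f}s'.format(n, t_sort, t_search))
```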
Comments (Feb 13, 2018):

  • You can simply run an experiment to see how each method scales as the array grows larger. Are you expecting someone else to do it for you?
  • @YilunZhang I'm more so wondering what other methods may exist that I haven't thought of.
  • @YilunZhang So how would you identify the correct index and perform the insert? That's clearly the approach the OP is looking for, and they have shown effort on this problem.
  • Actually, the speed of finding the index will not matter much: since inserting into a numpy array takes linear time, any speedup in finding the index will be negligible. We would win at most about 50%, and it will still be (very) slow for large arrays.
  • insert creates a new array, either with concatenate or by creating a blank array and copying values. You can't grow an ndarray in place; its size is fixed.
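As the last comment notes, an ndarray's size is fixed, so every insertion allocates a new array and copies both halves. A minimal hand-rolled sketch of that copy (illustrative only, not from the original post; equivalent in effect to the concatenate approach):

```python
import numpy as np

my_array = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
my_val = 1.5

# Find the insertion point in O(log n), then build the output in one pass.
idx = my_array.searchsorted(my_val)
out = np.empty(my_array.size + 1, dtype=my_array.dtype)
out[:idx] = my_array[:idx]      # copy the left half
out[idx] = my_val               # place the new value
out[idx + 1:] = my_array[idx:]  # copy the right half
print(out)  # [1.  1.5 2.  3.  4.  5. ]
```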

2 Answers


Use np.searchsorted to find the insertion point in logarithmic time:

>>> idx = my_array.searchsorted(my_val)
>>> np.concatenate((my_array[:idx], [my_val], my_array[idx:]))
array([1. , 1.5, 2. , 3. , 4. , 5. ])
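A small aside on ties (not from the original answer): if my_val may already be present in the array, searchsorted's side parameter controls whether the returned insertion point falls before or after the existing equal entries:

```python
import numpy as np

a = np.array([1.0, 1.5, 1.5, 2.0])
left = a.searchsorted(1.5, side='left')    # insertion point before the existing 1.5s
right = a.searchsorted(1.5, side='right')  # insertion point after the existing 1.5s
print(left, right)  # 1 3
```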

Note 1: I recommend looking at @Willem Van Onselm's and @hpaulj's insightful comments.

Note 2: Using np.insert as suggested by @Sanchit Anand may be slightly more convenient if the datatypes match from the start. It is, however, worth mentioning that this convenience comes at the cost of significant overhead:

>>> def f_pp(my_array, my_val):
...      idx = my_array.searchsorted(my_val)
...      return np.concatenate((my_array[:idx], [my_val], my_array[idx:]))
... 
>>> def f_sa(my_array, my_val):
...      return np.insert(my_array, my_array.searchsorted(my_val), my_val)
...
>>> my_farray = my_array.astype(float)
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=100000)
>>> repeat('f_sa(my_farray, my_val)', **kwds)
[1.2453778409981169, 1.2268288589984877, 1.2298014000116382]
>>> repeat('f_pp(my_array, my_val)', **kwds)
[0.2728819379990455, 0.2697303680033656, 0.2688361559994519]

try

my_array = np.insert(my_array, my_array.searchsorted(my_val), my_val)

[EDIT] Make sure that the array is of type float32 or float64, or add a decimal point to one of the list elements when initializing it.
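The caveat can be demonstrated directly: with an integer array, np.insert casts the inserted value to the array's dtype and silently truncates it (a small illustrative sketch):

```python
import numpy as np

int_array = np.array([1, 2, 3, 4, 5])         # integer dtype
truncated = np.insert(int_array, int_array.searchsorted(1.5), 1.5)
print(truncated)  # 1.5 is cast to the int dtype: [1 1 2 3 4 5]

float_array = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # float dtype
exact = np.insert(float_array, float_array.searchsorted(1.5), 1.5)
print(exact)      # [1.  1.5 2.  3.  4.  5. ]
```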

4 Comments

  • Did you actually bother looking at the result of your suggestion?
  • Note that this makes everything a numpy.int32 by default.
  • What is wrong with it? It works fine on my PC. You have to declare your array as type float64 or float32 (or just add a decimal point somewhere while initializing).
  • @SanchitAnand You have to declare your array as type float64. Don't you think that's something worth mentioning in your answer?
