
Suppose I have an array my_array and a single value my_val. (Note that my_array is always sorted.)

my_array = np.array([1, 2, 3, 4, 5])
my_val = 1.5

Because my_val is 1.5, I want to put it in between 1 and 2, giving me the array [1, 1.5, 2, 3, 4, 5].

My question is: What's the fastest way (i.e. in microseconds) of producing the ordered output array as my_array grows arbitrarily large?

The original way I thought of was concatenating the value to the original array and then sorting:

arr_out = np.sort(np.concatenate((my_array, np.array([my_val]))))
[ 1.   1.5  2.   3.   4.   5. ]

I know that np.concatenate is fast, but I'm unsure how np.sort would scale as my_array grows, even given that my_array is always sorted.

Edit:

I've compiled the times for the various methods listed at the time an answer was accepted:

Input:

import timeit

timeit_setup = 'import numpy as np\n' \
               'my_array = np.array([i for i in range(1000)], dtype=np.float64)\n' \
               'my_val = 1.5'
num_trials = 1000

my_time = timeit.timeit(
    'np.sort(np.concatenate((my_array, np.array([my_val]))))',
    setup=timeit_setup, number=num_trials
)

pauls_time = timeit.timeit(
    'idx = my_array.searchsorted(my_val)\n'
    'np.concatenate((my_array[:idx], [my_val], my_array[idx:]))',
    setup=timeit_setup, number=num_trials
)

sanchit_time = timeit.timeit(
    'np.insert(my_array, my_array.searchsorted(my_val), my_val)',
    setup=timeit_setup, number=num_trials
)

print('Times for 1000 repetitions for array of length 1000:')
print("My method took {}s".format(my_time))
print("Paul Panzer's method took {}s".format(pauls_time))
print("Sanchit Anand's method took {}s".format(sanchit_time))

Output:

Times for 1000 repetitions for array of length 1000:
My method took 0.017865657746239747s
Paul Panzer's method took 0.005813951002013821s
Sanchit Anand's method took 0.014003945532323987s

And the same for 100 repetitions for an array of length 1,000,000:

Times for 100 repetitions for array of length 1000000:
My method took 3.1770704101754195s
Paul Panzer's method took 0.3931240139911161s
Sanchit Anand's method took 0.40981490723551417s
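For completeness, the harness above can be extended to sweep array sizes and observe how each method scales; the following is only a sketch (the sizes and repetition count here are arbitrary choices, not from the original benchmarks):

```python
import timeit

# Illustrative sweep: time the sort-based and searchsorted-based
# approaches at several array sizes.
for n in (10**3, 10**4, 10**5):
    setup = ('import numpy as np\n'
             'my_array = np.arange({}, dtype=np.float64)\n'
             'my_val = 1.5'.format(n))
    t_sort = timeit.timeit(
        'np.sort(np.concatenate((my_array, np.array([my_val]))))',
        setup=setup, number=100)
    t_search = timeit.timeit(
        'idx = my_array.searchsorted(my_val)\n'
        'np.concatenate((my_array[:idx], [my_val], my_array[idx:]))',
        setup=setup, number=100)
    print('n={:>7}: sort {:.4f}s, searchsorted {:.4f}s'.format(n, t_sort, t_search))
```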
Comments (Feb 13, 2018):

  • You can simply run an experiment to see how each method scales as the array grows larger. Are you expecting someone else to do it for you?
  • @YilunZhang I'm more so wondering what other methods may exist that I haven't thought of.
  • @YilunZhang So how would you identify the correct index and perform the insert? That's clearly the approach the OP is looking for, and they have shown effort on this problem.
  • Actually, the speed of finding the index will not matter much: since inserting into a numpy array takes linear time, any speedup in finding the index will be negligible. We would win at most about 50%, and it will still be (very) slow for large arrays.
  • insert creates a new array, either with concatenate or by creating a blank array and copying values. You can't grow an ndarray in place; its size is fixed.
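As the last comment notes, an ndarray's size is fixed, so every insertion allocates a new array and copies both halves. A minimal hand-rolled sketch of that copy (illustrative only, not from the original post; equivalent in effect to the concatenate approach):

```python
import numpy as np

my_array = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
my_val = 1.5

# Find the insertion point in O(log n), then build the output in one pass.
idx = my_array.searchsorted(my_val)
out = np.empty(my_array.size + 1, dtype=my_array.dtype)
out[:idx] = my_array[:idx]      # copy the left half
out[idx] = my_val               # place the new value
out[idx + 1:] = my_array[idx:]  # copy the right half
print(out)  # [1.  1.5 2.  3.  4.  5. ]
```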

2 Answers


Use np.searchsorted to find the insertion point in logarithmic time:

>>> idx = my_array.searchsorted(my_val)
>>> np.concatenate((my_array[:idx], [my_val], my_array[idx:]))
array([1. , 1.5, 2. , 3. , 4. , 5. ])
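A small aside on ties (not from the original answer): if my_val may already be present in the array, searchsorted's side parameter controls whether the returned insertion point falls before or after the existing equal entries:

```python
import numpy as np

a = np.array([1.0, 1.5, 1.5, 2.0])
left = a.searchsorted(1.5, side='left')    # insertion point before the existing 1.5s
right = a.searchsorted(1.5, side='right')  # insertion point after the existing 1.5s
print(left, right)  # 1 3
```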

Note 1: I recommend looking at @Willem Van Onselm's and @hpaulj's insightful comments.

Note 2: Using np.insert as suggested by @Sanchit Anand may be slightly more convenient if the datatypes match from the start. It is, however, worth mentioning that this convenience comes at the cost of significant overhead:

>>> def f_pp(my_array, my_val):
...      idx = my_array.searchsorted(my_val)
...      return np.concatenate((my_array[:idx], [my_val], my_array[idx:]))
... 
>>> def f_sa(my_array, my_val):
...      return np.insert(my_array, my_array.searchsorted(my_val), my_val)
...
>>> my_farray = my_array.astype(float)
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=100000)
>>> repeat('f_sa(my_farray, my_val)', **kwds)
[1.2453778409981169, 1.2268288589984877, 1.2298014000116382]
>>> repeat('f_pp(my_array, my_val)', **kwds)
[0.2728819379990455, 0.2697303680033656, 0.2688361559994519]

try

my_array = np.insert(my_array, my_array.searchsorted(my_val), my_val)

[EDIT] Make sure that the array is of type float32 or float64, or add a decimal point to one of the list elements when initializing it.
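The caveat can be demonstrated directly: with an integer array, np.insert casts the inserted value to the array's dtype and silently truncates it (a small illustrative sketch):

```python
import numpy as np

int_array = np.array([1, 2, 3, 4, 5])         # integer dtype
truncated = np.insert(int_array, int_array.searchsorted(1.5), 1.5)
print(truncated)  # 1.5 is cast to the int dtype: [1 1 2 3 4 5]

float_array = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # float dtype
exact = np.insert(float_array, float_array.searchsorted(1.5), 1.5)
print(exact)      # [1.  1.5 2.  3.  4.  5. ]
```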

4 Comments

  • Did you actually bother looking at the result of your suggestion?
  • Note that this makes everything a numpy.int32 by default.
  • What is wrong with it? It works fine on my PC. You have to declare your array as type float64 or float32 (or just add a decimal point somewhere while initializing).
  • @SanchitAnand You have to declare your array as type float64. Don't you think that's something worth mentioning in your answer?
