Sorry for the cryptic description...
I'm working in Python and need a fast solution for the problem below.
I have an array of float values (this array can contain millions of values):
import numpy as np

values = np.array([0.1, 0.2, 5.7, 12.9, 3.5, 100.6])
Each value represents an estimate of a quantity at a particular location, where the location is identified by an ID. Multiple estimates per location are possible/common:
locations = np.array([1, 5, 3, 1, 1, 3])
I need to average all of the values that share the same location ID.
I can use numpy.where to do this for one location value:
average_value_at_location = np.average(values[np.where(locations == 1)])
And of course I could loop over all of the unique values in locations, but I'm looking for a fast (vectorized) way of doing this and can't figure out how to compose the numpy functions to do it without looping in Python.
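For concreteness, the looping version I want to avoid would look something like this (a sketch; mean_per_location is just an illustrative name):

import numpy as np

def mean_per_location(values, locations):
    # One np.mean per distinct location ID -- this is the
    # Python-level loop I'd like to eliminate.
    return {loc: values[locations == loc].mean()
            for loc in np.unique(locations)}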
I'm not tied to numpy for this solution.
Any help will be gratefully received.
Thanks,
Doug
np.unique isn't going to be that much slower, especially if you have only a few locations. Most of your work is doing np.mean over millions of values, after all. If you really want to squeeze that little bit of performance out of your loop, you could try using Numba to compile the loop. You could also use Numba to avoid making copies of arrays, but that would require you to implement your own mean algorithm, which is actually kinda complicated (there are many algorithms with different tradeoffs, and you'd probably want to match np.mean).
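A minimal sketch of that second Numba approach, assuming non-negative integer location IDs (grouped_mean_numba is an illustrative name; note the naive one-pass summation here won't exactly match np.mean, which uses pairwise summation for floats):

import numpy as np
from numba import njit

@njit
def grouped_mean_numba(values, locations):
    # Single pass: accumulate a sum and a count per location ID,
    # then divide once at the end. No intermediate array copies.
    n = locations.max() + 1
    sums = np.zeros(n)
    counts = np.zeros(n)
    for i in range(values.shape[0]):
        sums[locations[i]] += values[i]
        counts[locations[i]] += 1.0
    return sums / counts  # NaN for IDs that never occur

This returns an array indexed by location ID rather than a dict, which keeps the whole function in nopython mode; IDs that never occur come out as NaN from the 0/0 division.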