
I have an NxM array, as well as an arbitrary list of sets of column indices I'd like to use to slice the array. For example, the 3x3 array

my_arr = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

and index sets

my_idxs = [[0, 1], [2]]

I would like to use the pairs of indices to select the corresponding columns from the array and obtain the length of the (row-wise) vectors using np.linalg.norm(). I would like to do this for all index pairs. Given the aforementioned array and list of index sets, this should give:

[[2.23606797749979, 3],
 [2.23606797749979, 3],
 [2.23606797749979, 3]]

When all sets have the same number of indices (for example, using my_idxs = [[0, 1], [1, 2]]), I can simply use np.linalg.norm(my_arr[:, my_idxs], axis=1):

[[2.23606797749979, 3.605551275463989],
 [2.23606797749979, 3.605551275463989],
 [2.23606797749979, 3.605551275463989]]

However, when the sets differ in length (as is the case with my_idxs = [[0, 1], [2]]), slicing raises an error: the array of index sets would be irregular in shape. Is there any way to implement the single-line option without resorting to looping over the list of index sets and handling each of them separately?
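For reference, a straightforward loop over the index sets (the kind of thing I'd like to avoid) that produces the expected output above would be:

```python
import numpy as np

my_arr = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
my_idxs = [[0, 1], [2]]

# One output column per index set: the row-wise norm over the selected columns
result = np.empty((my_arr.shape[0], len(my_idxs)))
for j, idx in enumerate(my_idxs):
    result[:, j] = np.linalg.norm(my_arr[:, idx], axis=1)
```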

  • To be sure that what you're looking for is understood, I'd suggest adding the expected result of the more general case and a loop to calculate it. Commented Apr 17, 2024 at 14:49
  • In the realistic case what are the dimensions? How many sets? Remember, with numpy, a few iterations on a complex task might actually be fastest. Loops aren't absolutely bad. Commented Apr 17, 2024 at 17:52
  • There are use-cases of my_idxs where this can be done with ufunc.reduceat(), but it requires all sets to be contiguous and monotonic (i.e., [[1,3], [2]] isn't a possibility). Is my_idxs truly arbitrary or does it follow these requirements? Commented Apr 18, 2024 at 7:22
  • @hpaulj In the realistic case, the arrays would not even be that big; think ~100 x 8. There would only be a few sets since you can only make so many combinations within the 8 columns available. The few iterations might indeed not be bad at all, and might not even be avoidable. However, they will be running inside a simulation environment that runs ~230 times a second, so I figured I would try and see if there were any efficient ways of going about this. You are absolutely right though, loops are not always inherently bad and might actually be the best solution in this case. Commented Apr 18, 2024 at 10:34
  • @danielF The sets are specified in settings for the environment the simulation is run in and they would also be disjoint. I think the word arbitrary indeed might not have been the most fitting here, as they are specified beforehand in the settings. That is an unfortunate choice of words from my side. However, I do not think I can guarantee in all cases that the sets would follow those requirements. I suppose I'd also like to keep it as generalisable as possible for the future. Commented Apr 18, 2024 at 10:41
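For the record, the reduceat() idea mentioned in the comments could be sketched as follows, assuming the index sets are contiguous and in order so that each set is described by its starting column offset:

```python
import numpy as np

my_arr = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
starts = [0, 2]  # starting column of each contiguous set: [[0, 1], [2]]

# Sum the squared entries within each contiguous column block, then take the root
out = np.sqrt(np.add.reduceat(my_arr ** 2, starts, axis=1))
```

This avoids any Python-level loop, but as noted it only works when the sets partition the columns into contiguous, monotonic runs.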

3 Answers


You can try:

my_arr = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
my_idxs = [[0, 1], [2]]

out = np.c_[*[np.linalg.norm(my_arr[:, i], axis=1) for i in my_idxs]]
print(out)

Prints:

[[2.23606798 3.        ]
 [2.23606798 3.        ]
 [2.23606798 3.        ]]

2 Comments

A one-liner, but still a loop. But such a loop may be necessary.
I agree. This solution looks to be the most elegant, though! Does the job, thanks! Since I am running Python 3.10.7, the unpack operator * cannot yet be used in a subscript (that requires 3.11+). Instead, I run it without the unpacking and transpose the result, which gives the same output.

Here is an answer without a loop – kind of. The basic idea is to replace your lists of indices, my_idxs, with equivalent "boolean masks", my_masks, where contained indices are marked with 1 and others with 0. You can then calculate the norm after weighting with the masks. A solution could thus look as follows:

import numpy as np

my_arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Replace index lists with boolean masks: [0, 1] → [1, 1, 0], [2] → [0, 0, 1]
my_masks = [[1, 1, 0], [0, 0, 1]]

result = np.linalg.norm(my_arr[:, np.newaxis, :] * my_masks, axis=-1)
print(result)
# >>> [[ 2.23606798  3.        ]
#      [ 6.40312424  6.        ]
#      [10.63014581  9.        ]]

Note that I replaced your values in my_arr with different values for each row, to confirm that the approach actually works as expected. Furthermore, I am quite sure an equivalent solution could be implemented using masked arrays.

In any case, here is the catch: I did not find an approach that converts the lists of indices into masks without a for loop. So, in a way, I am just moving the problem. However, depending on how you determine your lists of indices in the first place, using masks in their place might still be a solution worth considering.
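For completeness, the mask-building loop could look like this (a sketch; it still iterates over the index sets, but only once, up front, and outside the norm computation):

```python
import numpy as np

my_idxs = [[0, 1], [2]]
n_cols = 3  # number of columns in my_arr

# Build one 0/1 mask row per index set via fancy indexing
my_masks = np.zeros((len(my_idxs), n_cols))
for row, idx in enumerate(my_idxs):
    my_masks[row, idx] = 1
# my_masks → [[1., 1., 0.], [0., 0., 1.]]
```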

1 Comment

In particular, what did not work for mask creation: At first, the answers to this question (stackoverflow.com/questions/53631460) looked promising; however, the problem in our case is (again, just as noted in the question) that the lists of indices are not all the same length.

You're looking to compute the row-wise norms of vectors formed by selecting columns from a NumPy array using index lists of varying length, preferably without an explicit loop over the index sets.

Solution: You can use a list comprehension to address the challenge posed by index sets of varying lengths. While this isn't a single slicing operation (which is impossible due to irregular shapes), it's a concise approach that utilizes the power of NumPy's operations.

Here's how you can implement this:

import numpy as np

# Define your array
my_arr = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# Define your list of index sets
my_idxs = [[0, 1], [2]]

# Compute the norm for each set of indices using a list comprehension
result = np.array([np.linalg.norm(my_arr[:, idx], axis=1) for idx in my_idxs]).T

# Print the result
print(result)

Output:

[[2.23606798 3.        ]
 [2.23606798 3.        ]
 [2.23606798 3.        ]]

Explanation:

List Comprehension: Loops through each set of indices in my_idxs. For each set, it selects the corresponding columns from my_arr and calculates the norm across the rows (axis=1).

Transpose (T): The result of the list comprehension is a list where each element is an array representing the norms calculated for each set of indices. np.array(...) converts this list into a 2D NumPy array. The transpose is then applied to align the output correctly so that each row corresponds to the original rows of my_arr, and each column represents the norm results for each index set.

This approach handles index sets of varying length while keeping the per-set norm computation vectorized in NumPy; only the short outer loop over the sets remains in Python.

Conclusion: Although slicing with varying index lengths in a single operation isn't feasible (the result would be ragged), the list comprehension above achieves the goal in one compact line of Python.

2 Comments

Your answer is basically the same as the previous answer, just using a different way of concatenating the norms, and a longer explanation. No one has seriously suggested iterating on rows (I don't see how that would help). As long as the number of index sets is relatively small, iterating on the sets won't be expensive.
@hpaulj Looks like an AI-generated answer to me.
