
I have a square matrix that is NxN (N is usually >500). It is constructed using a numpy array.

I need to extract a new matrix that has the i-th column and row removed from this matrix. The new matrix is (N-1)x(N-1).

I am currently using the following code to extract this matrix:

            new_mat = np.delete(old_mat, idx_2_remove, 0)
            new_mat = np.delete(new_mat, idx_2_remove, 1)

I have also tried to use:

row_indices = list(range(idx_2_remove)) + list(range(idx_2_remove + 1, N))
col_indices = row_indices
rows = [i for i in row_indices for j in col_indices]
cols = [j for i in row_indices for j in col_indices]

old_mat[(rows, cols)].reshape(len(row_indices), len(col_indices))

But I found this to be even slower than the np.delete() approach above, and even that approach is still quite slow for my application.

Is there a faster way to accomplish what I want?

Edit 1: It seems the following is even faster than the above two, but not by much:

new_mat = old_mat[row_indices,:][:,col_indices]
  • If it is only around 500x500, the copying time is usually negligible. For bigger matrices, I believe creating an empty array and copying 4 slices onto it should be fairly fast. Commented Dec 11, 2018 at 3:59
  • I would say 500x500 is on the small side. The largest would be around 50,000x50,000. Most cases I'm working with are probably around 10,000x10,000. What do you think about the method in the edit? i.e., new_mat = old_mat[row_indices,:][:,col_indices] Commented Dec 11, 2018 at 4:00
  • Depending on how things vectorize, using a slice instead of a fancy index should be faster, but that needs to be benchmarked. In any case, indexing twice is likely to take twice as much time; you might want to at least broadcast the two indexing operations together, like old_mat[row_indices[:,None],col_indices]. Commented Dec 11, 2018 at 4:04
  • I will try that. Thanks! I have never tried slicing before, but I will take a look. Commented Dec 11, 2018 at 4:07
  • I am getting an error using old_mat[row_indices[:,None],col_indices]: TypeError: list indices must be integers or slices, not tuple Commented Dec 11, 2018 at 4:11
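A minimal sketch of the fix for the TypeError mentioned in the comments: row_indices in the question is a plain Python list, which does not support the [:, None] reshaping that the broadcasting trick needs. Converting it to an ndarray first makes the single-step broadcast indexing work (variable names mirror the question; N and idx_2_remove are placeholders):

```python
import numpy as np

N = 6
idx_2_remove = 2
old_mat = np.arange(N * N).reshape(N, N)

# Build the kept indices as an ndarray; a plain list here raises
# "TypeError: list indices must be integers or slices, not tuple".
keep = np.array([i for i in range(N) if i != idx_2_remove])

# Broadcast row and column indices together in a single indexing step.
new_mat = old_mat[keep[:, None], keep]
assert new_mat.shape == (N - 1, N - 1)
```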

1 Answer


Here are 3 alternatives I quickly wrote:

Repeated delete:

def foo1(arr, i):
    return np.delete(np.delete(arr, i, axis=0), i, axis=1)

Maximal use of slicing (may need some edge checks):

def foo2(arr,i):
    N = arr.shape[0]
    res = np.empty((N-1,N-1), arr.dtype)
    res[:i, :i] = arr[:i, :i]
    res[:i, i:] = arr[:i, i+1:]
    res[i:, :i] = arr[i+1:, :i]
    res[i:, i:] = arr[i+1:, i+1:]
    return res

Advanced indexing:

def foo3(arr,i):
    N = arr.shape[0]
    idx = np.r_[:i,i+1:N]
    return arr[np.ix_(idx, idx)]

Test that they work:

In [874]: x = np.arange(100).reshape(10,10)
In [875]: np.allclose(foo1(x,5),foo2(x,5))
Out[875]: True
In [876]: np.allclose(foo1(x,5),foo3(x,5))
Out[876]: True

Compare timings:

In [881]: timeit foo1(arr,100).shape
4.98 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [882]: timeit foo2(arr,100).shape
526 µs ± 1.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [883]: timeit foo3(arr,100).shape
2.21 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So the slicing is fastest, even if the code is longer. It looks like np.delete works like foo3, but one dimension at a time.
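That last claim, that np.delete behaves like per-axis fancy indexing, can be checked directly (a minimal sketch; np.delete's internal implementation may differ, but the results agree):

```python
import numpy as np

arr = np.arange(36).reshape(6, 6)
i, N = 2, arr.shape[0]

# All indices along an axis except i, as used in foo3.
idx = np.r_[:i, i + 1:N]

# Removing row i (or column i) via np.delete matches indexing
# with idx along that single axis.
assert np.array_equal(np.delete(arr, i, axis=0), arr[idx])
assert np.array_equal(np.delete(arr, i, axis=1), arr[:, idx])
```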
