First, looping over a 3D matrix with plain Python loops is a very, very bad idea. To loop over a large 3D matrix you are better off going down to Cython or C/C++/Fortran and creating a Python extension. However, for this particular case, scipy already contains an implementation of the median filter for n-dimensional arrays:
>>> from scipy.ndimage import median_filter
>>> median_filter(my_large_3d_array, size)
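A minimal runnable sketch (the volume here is random stand-in data; note that the second argument, size, is the full width of the filtering window in each dimension, not a radius):

import numpy as np
from scipy.ndimage import median_filter

vol = np.random.rand(64, 64, 64)       # stand-in 3D volume
smoothed = median_filter(vol, size=3)  # 3x3x3 median window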
In short, there is no faster way of iterating through voxels in pure Python (numpy iterators may help a bit, but they won't improve performance considerably). If you need to perform more complicated 3D operations in Python, you should consider writing the loopy part in Cython or, alternatively, using a chunking library such as Dask, which implements parallel operations over chunks of arrays (see the sketch below).
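For the chunking route, here is a sketch of what that could look like with Dask (the shapes, chunk sizes and boundary mode are made-up choices); map_overlap gives every chunk the one-voxel halo that a 3x3x3 median window needs:

import dask.array as da
from scipy.ndimage import median_filter

vol = da.random.random((512, 512, 512), chunks=(128, 128, 128))
# depth=1 adds a 1-voxel overlap between chunks, enough for size=3
filtered = vol.map_overlap(median_filter, depth=1, boundary="reflect", size=3)
result = filtered.compute()  # chunks are processed in parallel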
The problem with Python is that for loops are extremely slow, especially when they are nested and run over large arrays. There is thus no standard Pythonic way of iterating efficiently over arrays. The usual way of getting speed-ups is through vectorized operations and numpy tricks, but those are very problem-specific and there is no generic recipe; you will learn a lot of numpy tricks here on SO.
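Just to make "vectorized" concrete, here is a toy comparison (the array is stand-in data); both versions compute the same sum of squares, but the second one runs its loop in compiled C inside numpy:

import numpy as np

A = np.random.randn(1000, 1000)

# loopy version: every element goes through the interpreter
total = 0.0
for x in A.flat:
    total += x * x

# vectorized version: a single call, the loop happens in C
total = np.sum(A * A)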
As a generic approach, if you really need to iterate over arrays, you can write your code in Cython. Cython is a C-like extension for Python: you write code in Python syntax but declare variable types (as in C, with int or double). That code is then automatically translated to C, compiled, and can be called from Python. A quick example:
Example Python loopy function:
import numpy as np

def iter_A(A):
    B = np.empty(A.shape, dtype=np.float64)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            B[i, j] = A[i, j] * 2
    return B
I know the above code is somewhat redundant and could simply be written as B = A * 2, but its purpose is just to illustrate that Python loops are extremely slow.
Cython version of the function:
import numpy as np
cimport numpy as np

def iter_A_cy(double[:, ::1] A):
    # typed memoryview over a C-contiguous 2D array of doubles
    cdef Py_ssize_t H = A.shape[0], W = A.shape[1]
    cdef double[:, ::1] B = np.empty((H, W), dtype=np.float64)
    cdef Py_ssize_t i, j
    # with typed indices and memoryviews, these compile to plain C loops
    for i in range(H):
        for j in range(W):
            B[i, j] = A[i, j] * 2
    return np.asarray(B)
Test speeds of both implementations:
>>> import numpy as np
>>> A = np.random.randn(1000, 1000)
>>> %timeit iter_A(A)
1 loop, best of 3: 399 ms per loop
>>> %timeit iter_A_cy(A)
100 loops, best of 3: 2.11 ms per loop
NOTE: you cannot run the Cython function as it is. You need to put it in a separate .pyx file and compile it first (or use the %%cython magic in an IPython/Jupyter notebook); a minimal build script is sketched below.
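If you go the separate-file route, a build script could look like this (the .pyx file name is an assumption; the numpy include directory is needed because of the cimport numpy line):

# setup.py -- minimal build script for the Cython example
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("iter_a_cy.pyx"),  # assumed file name
    include_dirs=[np.get_include()],
)

Then compile with python setup.py build_ext --inplace and import the function with from iter_a_cy import iter_A_cy.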
This shows that the raw Python version took about 400 ms to iterate over the whole array, while the Cython version took only about 2 ms (roughly a 200x speedup).