I have the following code:

import numpy as np


def fill(arr1, arr2, arr3, arr4, thresh=0.5):
    out_arr = np.zeros(arr1.shape)
    for i in range(len(arr1)):
        arr1[i] = np.where(np.abs(arr1[i]) <= thresh, np.nan, arr1[i])
        mask = np.isnan(arr1[i])
        arr1[i] = np.nan_to_num(arr1[i])
        merged1 = (arr2[i] * mask) + arr1[i]

        merged2 = np.where(np.abs(merged1) <= thresh, np.nan, merged1)
        mask = np.isnan(merged2)
        merged2 = np.nan_to_num(merged2)
        merged3 = (arr3[i] * mask) + merged2

        merged3 = np.where(np.abs(merged3) <= thresh, np.nan, merged3)
        mask = np.isnan(merged3)
        merged3 = np.nan_to_num(merged3)
        merged4 = (arr4[i] * mask) + merged3

        out_arr[i] = merged4

    return out_arr


arr1 = np.random.rand(10, 10, 10)
arr2 = np.random.rand(10, 10, 10)
arr3 = np.random.rand(10, 10, 10)
arr4 = np.random.rand(10, 10, 10)
arr = fill(arr1, arr2, arr3, arr4, 0.5)

I wonder if there is a more efficient way of doing this, maybe with masked arrays? Basically, I am replacing the values in each layer of the 3D array whose absolute value is at or below the threshold with the values from the next array, and repeating this over 4 arrays. What would this look like for n arrays? Thanks!

  • np.ma arrays won't help efficiency-wise. Every operation has to update both the data and the mask. They can be convenient, but they aren't a speed tool. Commented May 29, 2020 at 7:11

1 Answer

Your function can be simplified in several ways. In terms of efficiency, the most significant point is that you do not need to iterate over the first dimension; you can operate on the whole arrays directly. Besides that, you can refactor the replacement logic into something much simpler, and use a loop to avoid repeating the same code over and over:

import numpy as np

# Function accepts as many arrays as wanted, with at least one
# (threshold needs to be passed as keyword parameter)
def fill(arr1, *arrs, thresh=0.5):
    # Output array
    out_arr = arr1.copy()
    for arr in arrs:
        # Replace values that are still below threshold
        mask = np.abs(out_arr) <= thresh
        out_arr[mask] = arr[mask]
    return out_arr

Since thresh needs to be passed as a keyword argument in this function, you would call it as:

arr = fill(arr1, arr2, arr3, arr4, thresh=0.5)
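To make the replacement rule concrete, here is a small hand-checkable sketch using the function from this answer. The 1D arrays a, b, c and the names result and result_n are made up for illustration; they are not from the question. Each position ends up with the value from the first array whose absolute value exceeds the threshold, and if none does, the value from the last array is kept:

```python
import numpy as np

# Same function as in the answer above
def fill(arr1, *arrs, thresh=0.5):
    out_arr = arr1.copy()
    for arr in arrs:
        # Replace values that are still at or below the threshold
        mask = np.abs(out_arr) <= thresh
        out_arr[mask] = arr[mask]
    return out_arr

# Illustrative inputs (hypothetical, chosen so the result is easy to verify)
a = np.array([0.9, 0.1, 0.2, 0.3])
b = np.array([0.0, 0.8, 0.4, 0.1])
c = np.array([0.0, 0.0, 0.7, 0.2])

result = fill(a, b, c, thresh=0.5)
print(result)  # [0.9 0.8 0.7 0.2]

# For n arrays held in a list, unpack the list into the call
arrays = [a, b, c]
result_n = fill(*arrays, thresh=0.5)
```

Because the function works on a copy, the first input array is left untouched, unlike in the original loop, which overwrites arr1 in place.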

4 Comments

Sorry, but your answer lacks depth. You didn't define the variable mask in the for loop. You would need to do that every time, since the new values could include a value < 0.5. I am currently trying this but getting the error "NumPy boolean array indexing assignment cannot assign 2124948 input values to the 4933532 output values where the mask is true".
I added an edit to your code because the masking did not work properly. Basically, the function is now a merge of mine and yours. Most of the problems are solved: there is no need to run through all the layers, and n arrays can be passed to the function. I added the if/else to avoid returning NaNs in the out_arr. Another problem remains: there are still 5 NumPy operations in the for loop, which can be quite slow when a lot of arrays are passed into the function. Thanks for your help!
@benschbob91 Thank you for the feedback, sorry, I made a last minute change to the code when I was writing the answer and messed it up. I think what I have written now fixes it, let me know if that is not the case.
Thank you so much! Your answer speeds up the calculation by several orders of magnitude and is correct. I accepted the answer.
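On the concern raised in the comments about per-array work inside the Python loop: one possible alternative, not part of the accepted answer, is to stack all inputs and pick, for each position, the first array whose absolute value exceeds the threshold. The sketch below is a swapped-in technique under the assumption that all arrays share one shape and contain no NaNs; fill_stacked is a hypothetical name. It trades the loop for extra memory, since np.stack holds all n arrays at once, so whether it is actually faster would need to be measured on the real data:

```python
import numpy as np

# Reference: the loop-based function from the answer
def fill(arr1, *arrs, thresh=0.5):
    out_arr = arr1.copy()
    for arr in arrs:
        mask = np.abs(out_arr) <= thresh
        out_arr[mask] = arr[mask]
    return out_arr

# Hypothetical loop-free variant (assumes equal shapes and no NaNs)
def fill_stacked(*arrs, thresh=0.5):
    stacked = np.stack(arrs)               # shape (n, ...)
    above = np.abs(stacked) > thresh       # True where a value is usable
    first = above.argmax(axis=0)           # index of first True (0 if none)
    # Where no array exceeds the threshold, fall back to the last array,
    # which is what the loop-based version ends up keeping
    idx = np.where(above.any(axis=0), first, len(arrs) - 1)
    return np.take_along_axis(stacked, idx[np.newaxis], axis=0)[0]

rng = np.random.default_rng(0)
arrays = [rng.random((10, 10, 10)) for _ in range(4)]

ref = fill(*arrays, thresh=0.5)
fast = fill_stacked(*arrays, thresh=0.5)
same = np.array_equal(ref, fast)
```

Both versions only copy values from the inputs, so on NaN-free data the results can be compared for exact equality rather than with a tolerance.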
