I am completely new to Python and coding, and I am stuck trying to replace randomly selected values in one array with values from a second array. My data are extracted from two Iris cubes and consist of LAT and LON data.

After loading the two cubes, I can extract the data from the two observation datasets of latitude and longitude, say "obs_1" and "obs_2", each with shape (475, 635):

obs_1
<iris 'Cube' of OBSERVATIONS / (g/m2) (latitude: 475; longitude: 635)>

and

obs_2
<iris 'Cube' of OBSERVATIONS / (g/m2) (latitude: 475; longitude: 635)>

Both obs_1.data and obs_2.data can be treated as numpy arrays:

type(obs_1.data)
Out[174]: numpy.ndarray

with

size(obs_1.data)
Out[173]: 301625

My obs_1 consists of observations at time=18:00 for a selected day, and obs_2 is an average over time for the same day, from t=14:00 to t=17:00.

Now, what I am trying to do is randomly replace 50% of the values in obs_1 with randomly selected values from obs_2.

Data in the arrays look like this (this is a selection from the array):

array([[       nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan, 3.6444201 , 3.6288068 ,
    3.4562614 , 3.1650603 , 2.837024  , 2.5862055 , 2.5824826 ,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan],
   [       nan,        nan,        nan,        nan,        nan,
           nan,        nan, 4.126052  , 4.154033  , 3.6938105 ,
    3.1892183 , 2.837798  , 2.695081  ,        nan, 2.4830801 ,
    2.619453  , 2.744787  ,        nan,        nan,        nan,
    4.037193  , 3.9007418 , 3.918395  , 4.1123595 ,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan],
   [       nan,        nan,        nan,        nan,        nan,
           nan, 4.479512  , 4.139696  , 3.7454944 ,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan, 1.7283309 , 2.0259488 , 2.6097915 , 2.8537903 ,
    3.3934724 ,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan],
   [       nan,        nan,        nan,        nan,        nan,
    4.476785  , 4.5633755 , 3.7924814 , 3.270711  ,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan, 1.7360739 , 2.171296  ,
    2.6570952 , 3.58288   , 4.6880975 ,        nan,        nan,
           nan,        nan,        nan,        nan],
   [       nan,        nan,        nan,        nan,        nan,
    4.411482  , 3.9552238 , 3.7757099 , 2.875049  , 2.1458075 ,
           nan,        nan,        nan,        nan,        nan,
           nan, 1.7425493 , 1.8161889 ,        nan,        nan,
           nan,        nan,        nan, 1.2822593 , 1.4383382 ,
    1.5031592 , 1.5003852 , 1.9955662 , 4.0983477 ,        nan,
           nan,        nan,        nan,        nan],
   [       nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan, 1.5202525 , 1.2684406 ,
    1.3887881 , 1.6239417 , 1.5679324 , 1.3143418 ,        nan,
           nan,        nan, 0.9014559 , 1.046359  , 1.1121098 ,
    1.2461395 , 1.3922306 , 1.5674534 , 1.7686707 , 4.694426  ,
    5.8581176 ,        nan,        nan,        nan],
   [       nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan, 1.4250685 , 1.342187  ,
    1.460965  , 1.5898347 , 1.4935569 ,        nan,        nan,
    0.76497865, 0.7578024 , 0.9086805 , 1.1051334 , 1.0408422 ,
    1.0398425 , 1.1574577 ,        nan,        nan, 1.6596926 ,
    4.667655  ,        nan,        nan,        nan],
   [       nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan, 1.4770626 , 1.3014681 ,
    1.2809513 ,        nan,        nan, 1.0585229 , 0.98995847,
    0.8447306 , 0.7979446 ,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
    2.920856  ,        nan,        nan,        nan],
   [       nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan, 1.2806126 ,        nan,
    0.97792864, 0.8848762 , 2.0891907 , 1.4531214 , 1.2615036 ,
    0.97086287,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan, 4.1831126 ,        nan,        nan],
   [       nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan, 1.1235833 , 1.2448411 , 0.95834756, 0.99093884,
    1.0072019 , 1.1916308 , 0.9324562 , 1.0275717 , 1.2712531 ,
           nan,        nan,        nan,        nan,        nan,
           nan,        nan,        nan,        nan,        nan,
           nan, 3.2303405 , 4.449829  ,        nan]], dtype=float32)

Here, nan marks values that were masked during loading (data that are not relevant).

I searched and tried np.random and masking, but I can't work out how to select randomly from both arrays and replace the obs_1 mask with the obs_2 mask, given that the masks have a different shape. I am struggling to write the code, so apart from loading the data with Iris cubes (which I can post if it is of any help), I do not have an example to show.

Could someone please point me to an example (I couldn't find any so far about exchanging data between different arrays) or give me some hints on how to proceed?

Many thanks in advance. All the best

  • Adding a few sample data points will help to answer this question. Can you update your question with sample data points? Commented Aug 21, 2020 at 9:26
  • Thanks for your reply. I added an example of data extracted from the array at a selected interval of LAT and LON. I hope this helps. Commented Aug 21, 2020 at 9:47
  • I am trying something along the lines of <frac=0.5> to select my percentage, then the number of samples with <replace_size = obs_1.data(ise) * frac>, and the two masks for my two datasets: < mask = np.random.choice([0, 1], size=obs_1.shape, p=((1 - frac), frac)).astype(np.bool)> and <mask2 = np.random.choice([0, 1], size=obs_2.shape, p=((1 - frac), frac)).astype(np.bool)>. If I then try to replace obs_1[mask] with obs_2[mask_2], I get the error "too many indices for array". I am still trying to understand the functions, so I do not have more. If I make some progress soon, I'll post it. Commented Aug 21, 2020 at 12:48
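
A minimal sketch of the masking idea from the comment above, using one shared boolean mask instead of two independent ones, so that every replaced cell in the first array takes the value from the same grid position in the second. The small arrays, the names obs_1_data and obs_2_data, and the seed are assumptions for illustration only; with Iris cubes the arrays would be obs_1.data and obs_2.data. The built-in bool is used as the dtype.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility

# Stand-ins for obs_1.data and obs_2.data (assumed to have the same shape).
obs_1_data = np.arange(12, dtype=np.float32).reshape(3, 4)
obs_2_data = obs_1_data + 100

frac = 0.5
n_replace = int(obs_1_data.size * frac)

# Pick exactly n_replace distinct flat positions, then build one 2D mask.
flat_idx = rng.choice(obs_1_data.size, size=n_replace, replace=False)
mask = np.zeros(obs_1_data.size, dtype=bool)
mask[flat_idx] = True
mask = mask.reshape(obs_1_data.shape)

# Take obs_2 where the mask is True and obs_1 everywhere else.
mixed = np.where(mask, obs_2_data, obs_1_data)
print(mask.sum(), "of", mask.size, "cells replaced")
```

Drawing the mask with np.random.choice([0, 1], ...) as in the comment gives roughly, not exactly, the requested fraction; picking flat indices without replacement gives an exact count.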

1 Answer

See this question for random selection from a NumPy array.

import numpy as np

obs_1 = np.array(
    [[1, 3, 0],
     [3, 2, 0],
     [0, 2, 1],
     [1, 1, 4],
     [3, 2, 2],
     [0, 1, 0],
     [1, 3, 1],
     [0, 4, 1],
     [2, 4, 2],
     [3, 3, 1]]
)

obs_2 = np.array(
    [[10, 3, 0],
     [30, 2, 0],
     [100, 2, 1],
     [10, 1, 4],
     [30, 2, 2],
     [100, 1, 0],
     [10, 3, 1],
     [100, 4, 1],
     [20, 4, 2],
     [30, 3, 1]]
)

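# Number of rows (observations) that both arrays can supply.
n_observation = min(obs_1.shape[0], obs_2.shape[0])
# Draw half that many distinct row indices from each array.
index_1 = np.random.choice(np.arange(obs_1.shape[0]), int(n_observation / 2), replace=False)
index_2 = np.random.choice(np.arange(obs_2.shape[0]), int(n_observation / 2), replace=False)
# Overwrite the selected rows of obs_1 with the selected rows of obs_2.
obs_1[index_1, :] = obs_2[index_2, :]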
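
The code above swaps whole rows, and the rows taken from obs_2 need not sit at the same positions as the rows overwritten in obs_1. If individual grid cells should be replaced instead, and nan cells left untouched, something along these lines may work. This is a sketch under assumptions, not the method of the answer: replace_valid_fraction is a hypothetical helper name, the tiny arrays are made up, and it assumes that only cells which are finite in both arrays are candidates for replacement.

```python
import numpy as np

def replace_valid_fraction(a, b, frac=0.5, seed=None):
    """Return a copy of `a` where a fraction of the cells that are finite
    in both arrays is replaced by the value of `b` at the same position.
    Hypothetical helper, written for illustration only."""
    rng = np.random.default_rng(seed)
    out = a.copy()
    # Flat positions where neither array holds nan/inf.
    valid = np.flatnonzero(np.isfinite(a) & np.isfinite(b))
    n_replace = int(valid.size * frac)
    chosen = rng.choice(valid, size=n_replace, replace=False)
    # Convert the flat positions back to (row, column) pairs.
    rows, cols = np.unravel_index(chosen, a.shape)
    out[rows, cols] = b[rows, cols]
    return out, chosen

# Made-up data with nan gaps, standing in for obs_1.data and obs_2.data.
a = np.array([[np.nan, 1.0, 2.0, 3.0],
              [4.0, np.nan, 5.0, 6.0]], dtype=np.float32)
b = a + 10
b[0, 1] = np.nan  # a cell that is valid in `a` but masked in `b`

out, chosen = replace_valid_fraction(a, b, frac=0.5, seed=0)
print(out)
```

Returning the chosen flat positions alongside the result makes it possible to check afterwards which cells were swapped.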
3 Comments

Thanks for this. I was trying to do something similar after looking at other posts, but as I said in a previous comment, I got errors. Are you selecting the 50% by just dividing obs_X[0] by two, so that the indexes can match?
As I understand it, your array has one row for each observation. I select n_observation / 2 row indices, which gives 50% of the rows in both arrays (if both arrays have the same number of observations).
Yes, it is one row for each observation, each value has LAT and LON as coordinates, and both arrays have equal length. Thanks for the clarification.
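
Since both arrays have equal length and matching coordinates, one possible variation on the row-based approach is to draw a single set of row indices and use it for both arrays, so each replaced row in obs_1 comes from the same row of obs_2. The arrays and the seed below are assumptions made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seed chosen only for reproducibility

# Made-up stand-ins with one row per observation.
obs_1 = np.arange(30).reshape(10, 3)
obs_2 = obs_1 + 1000

n_rows = obs_1.shape[0]
# One shared selection of half the rows, without repetition.
rows = rng.choice(n_rows, size=n_rows // 2, replace=False)

# Work on a copy so the original obs_1 stays available for comparison.
mixed = obs_1.copy()
mixed[rows, :] = obs_2[rows, :]
print(sorted(rows.tolist()))
```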
