What is a good way of mapping arrays in Python?

Question

I have an old legacy Fortran code that is going to be called from Python.

In this code, data arrays are computed by some algorithm. I have simplified it: let's say we have 10 elements to process (in the real application it's more often 10e+6 than 10):

number_of_elements = 10
element_id_1 = [0, 1, 2, 1, 1, 2, 3, 0, 3, 0] # size = number_of_elements
element_id_2 = [0, 1, 2]                      # size = max(element_id_1)

These arrays are then used as follows:

my_element = one_of_my_10_elements # does not matter where it comes from
my_element_position = elt_position_in_element_id_1 # does not matter how
id_1 = element_id_1[my_element_position]
if id_1 == 0:
   id_2 = None
else:
   id_2 = element_id_2[id_1-1]
   modify(my_element, some_other_data[id_2])

What would be a Pythonic/numpy way of managing this kind of relationship, i.e. of getting id_2 for a given element?

I have had a look at masked arrays, but I haven't figured out a way to use them for this configuration. Implementing a class for elements, which would store id_2 once it is computed and simply return it later, makes me expect very poor calculation time compared to array manipulation. Am I wrong?

UPD. A larger example of what is currently done in the legacy code:

import numpy as np
number_of_elements = 10

elements = np.arange(number_of_elements, dtype=int)  # my elements IDs
# elements data
# where element_x[7] provides X value for element 7
#   and element_n[7] provides N value for element 7
element_x = np.arange(number_of_elements, dtype=float)
element_n = np.arange(number_of_elements, dtype=np.int32)

# array defining subsets of elements
# where
# element_id_1[1] = element_id_1[3] = element_id_1[4] means elements 1, 3 and 4 have something in common
# and
# element_id_1[9] = 0 means element 9 does not belong to any group
element_id_1 = np.array([0, 1, 2, 1, 1, 2, 3, 0, 3, 0])  # size = number_of_elements

# array defining other data for each group of elements
# element_id_2[0] = 0 means elements of group 1 (elements 1, 3 and 4) have no data associated
# element_id_2[1] = 1 means elements of group 2 (elements 2 and 5) have data associated: other_x[element_id_2[1]-1] = 7.
# element_id_2[2] = 2 means elements of group 3 (elements 6 and 8) have data associated: other_x[element_id_2[2]-1] = 5.
element_id_2 = np.array([0, 1, 2])  # size = max(element_id_1)
other_x = np.array([7., 5.]) # size = max(element_id_2)

# work with elements
for my_element_position in elements:
    id_1 = element_id_1[my_element_position]

    if id_1 == 0:
        print('element %d, skipping' % (my_element_position))
        continue

    id_2 = element_id_2[id_1-1]

    if id_2 > 0:
        # use element_x[my_element_position], element_n[my_element_position] and other_x[id_2-1] to compute more data
        print('element %d, using other_x[%d] = %f' % (my_element_position, id_2, other_x[id_2-1]))
    else:
        # use element_x[my_element_position] and element_n[my_element_position] to compute more data
        print('element %d, not using other_x' % (my_element_position))

Knowing that slicing a numpy array is supposed to be faster than iterating over it, I have arrived at the following:

elements_to_skip = np.where(element_id_1[:] == 0)[0]
for my_element_position in elements_to_skip:
    print('element %d, skipping' % (my_element_position))

elements_with_id1 = np.where(element_id_1[:] > 0)[0]
array1 = element_id_1[elements_with_id1]
array1 = element_id_2[array1-1]
array1 = np.where(array1[:] > 0)[0]
elements_with_data = elements_with_id1[array1]
id_2_array = element_id_2[element_id_1[elements_with_data]-1]
for my_element_position, id_2 in zip(elements_with_data, id_2_array):
    print('element %d, using other_x[%d] = %f' % (my_element_position, id_2, other_x[id_2-1]))

elements_without_data = np.delete(elements, np.concatenate((elements_to_skip, elements_with_data)))
for my_element_position in elements_without_data:
    print('element %d, not using other_x' % (my_element_position))

This gives the same result as the code snippet just above. Do you see a way to make this hard-to-read code better? Would this approach be preferable to the previous snippet?

To summarize: you wish to remap an array, replacing a given subset of values in that array with a given set of replacement values; is that correct?
– Eelco Hoogendoorn, Commented Aug 9, 2016 at 10:34

Ben K. · Accepted Answer · 2015-08-26 13:29:44Z

2

I am not entirely sure what your code needs to do, but I think that if you are working with numpy arrays you want to do something like this:

import numpy as np

number_of_elements = 10
element_id_1 = np.array([0, 1, 2, 1, 1, 2, 3, 0, 3, 0])
# slot 0 holds NaN so that id_1 == 0 ("no group") maps to NaN
element_id_2 = np.array([np.nan, 0, 1, 2])
# get the "element_id_1-th" element of element_id_2 for every element
result = element_id_2[element_id_1]

I use NaN (np.nan) instead of None. If you do not want NaN values in the result, just do:

result[np.logical_not(np.isnan(result))]
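As a minimal sketch of the same idea, the padded lookup table can be built from the question's original element_id_2 rather than typed by hand. The names lookup, has_group and grouped_positions are hypothetical and chosen only for illustration:

```python
import numpy as np

element_id_1 = np.array([0, 1, 2, 1, 1, 2, 3, 0, 3, 0])
element_id_2 = np.array([0, 1, 2])  # original table, addressed as element_id_2[id_1 - 1]

# Prepend a NaN slot so that id_1 == 0 ("no group") maps to NaN and
# id_1 == k maps to element_id_2[k - 1] without any index shifting.
# The result is a float array because NaN has no integer representation.
lookup = np.concatenate(([np.nan], element_id_2))

id_2 = lookup[element_id_1]                    # one fancy-indexing step, no Python loop
has_group = ~np.isnan(id_2)                    # True where id_1 > 0
grouped_positions = np.flatnonzero(has_group)  # positions of elements that belong to a group
```

Keeping the boolean mask (has_group) alongside the filtered values preserves the link back to the element positions, which a plain result[...] filter discards.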

EDIT: Based on your example code, it is nothing more than what I did above. You just have to assign placeholder values instead of other_x entries for cases 0 and 1, and then extract whatever you need from the array:

import numpy as np

element_id_1 = np.array([0, 1, 2, 1, 1, 2, 3, 0, 3, 0])

# index 0 -> NaN (skip), index 1 -> -1 (no other data), indices 2 and 3 -> other_x values
data = np.array([np.nan, -1, 7, 5])

result = data[element_id_1]

print("skipping:" + str(np.where(np.isnan(result))))

print("using other data:" + str(np.where(np.nan_to_num(result) > 0)))
print("other data used:" + str(result[np.nan_to_num(result) > 0]))

print("not using other data:" + str(np.where(result == -1)))

which returns:

skipping:(array([0, 7, 9]),)
using other data:(array([2, 5, 6, 8]),)
other data used:[ 7.  7.  5.  5.]
not using other data:(array([1, 3, 4]),)

If you do not like NaNs, you can avoid them by assigning -2 in that case.
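A rough sketch of that NaN-free variant follows. The sentinel names SKIP and NO_DATA are hypothetical, and the approach assumes that real other_x values are always positive, so they can never collide with the sentinels:

```python
import numpy as np

element_id_1 = np.array([0, 1, 2, 1, 1, 2, 3, 0, 3, 0])

# Hypothetical sentinels; assumes every genuine other_x value is > 0.
SKIP = -2.0      # id_1 == 0: element belongs to no group
NO_DATA = -1.0   # group exists but has no other_x entry

# index 0 -> SKIP, index 1 -> NO_DATA, indices 2 and 3 -> other_x values
data = np.array([SKIP, NO_DATA, 7.0, 5.0])
result = data[element_id_1]

skipped = np.flatnonzero(result == SKIP)          # elements to skip
without_data = np.flatnonzero(result == NO_DATA)  # elements not using other_x
with_data = np.flatnonzero(result > 0)            # elements using other_x
used_values = result[with_data]                   # the other_x value for each of them
```

Because no NaN is involved, plain comparisons are enough and nan_to_num is no longer needed.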

edited Aug 26, 2015 at 13:29

answered Aug 25, 2015 at 21:12

Ben K.


1 Comment

Eka AW Over a year ago

Thank you @user1901493. At least for the hint with np.NaN which I didn't think of. It'll be useful. It's a bit more than getting data from the second array based on values in the first one. There's already an "implicit" array of elements which is in fact element_id_1's indices. I have added an updated example to the question.

Kamyar Ghasemlou · Accepted Answer · 2016-08-09 08:48:14Z

1

If I had a similar problem, I would go with hash maps. A dict in Python is almost the same as what a HashMap is in most languages.
For detailed information, check: Python dictionary implementation. So something like:

id2_dict = {}
my_element = one_of_my_10_elements # does not matter where it comes from
my_element_position = elt_position_in_element_id_1 # does not matter how
id_1 = element_id_1[my_element_position]
if id_1 == 0:
   id2_dict[id_1] = None
else:
   id2_dict[id_1] = element_id_2[id_1-1]

Considering the nature of your data (integers), you might want to use a list, but if your id_1 values are sparse, you are going to waste a lot of space and end up with a less Pythonic approach. If your id_1 values span an integer range and are dense within it, go with a list and handle the indexes accordingly. A list will save you the hashing step, but makes the code less Pythonic and harder to maintain.
tl;dr: if the id_1 values are dense and almost span a range, go with a list and use id_1 as the index (with some index shifting); otherwise go with a dict (hash map) and use id_1 as the key.
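To make the two options concrete, here is a minimal sketch of both using the data from the question. The names id2_by_group, lookup_id_2 and id2_list are hypothetical, and plain Python lists stand in for the arrays:

```python
element_id_1 = [0, 1, 2, 1, 1, 2, 3, 0, 3, 0]
element_id_2 = [0, 1, 2]

# Dict variant: one entry per distinct non-zero id_1, keyed by id_1 itself.
# Works unchanged if the id_1 values are sparse.
id2_by_group = {
    group: element_id_2[group - 1]
    for group in set(element_id_1)
    if group != 0
}

def lookup_id_2(position):
    """Return id_2 for the element at `position`, or None if it has no group."""
    return id2_by_group.get(element_id_1[position])

# List variant: reserve slot 0 for "no group" so id_1 can be used
# directly as the index. Suitable when id_1 values densely cover 0..N.
id2_list = [None] + element_id_2
```

For example, lookup_id_2(0) gives None because element 0 has id_1 == 0, while lookup_id_2(6) and id2_list[element_id_1[6]] both give the id_2 of group 3.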

edited Aug 9, 2016 at 8:48

answered Aug 25, 2015 at 17:17

Kamyar Ghasemlou


4 Comments

Eka AW Over a year ago

Thank you @Kamyar Ghasemlou, I'll have a look at that. Array contents are always non-negative integers from 0 to N, with each value between 0 and N present in the array at least twice. In most cases, there will be a lot of zeros in element_id_1.

Kamyar Ghasemlou Over a year ago

So you are going to have a lot of duplicate keys/indexes. How do you plan to handle them: override the previous value, or keep the max/min value? If you are going to have a lot of zeros in the id_2 dict, simply don't add them; if the key does not exist, the value is taken to be zero (e.g. via dict.get(key, 0)). It will save you a lot of space but would decrease performance.

Eka AW Over a year ago

Yes, you are right. I'll prototype both approaches in order to analyze code performance and to see which one I should prefer. Thanks a lot for this food for thought!

Kamyar Ghasemlou Over a year ago

You're welcome, I hope you find an elegant solution. I would appreciate it if you could mark the answer as accepted, if it was of use, of course. :)
