I have an old legacy Fortran code that is going to be called from Python.

In this code, data arrays are computed by some algorithm. I have simplified it: let's say we have 10 elements to process (in the real application it's more often 10e+6 than 10):

number_of_elements = 10
element_id_1 = [0, 1, 2, 1, 1, 2, 3, 0, 3, 0] # size = number_of_elements
element_id_2 = [0, 1, 2]                      # size = max(element_id_1)

These arrays are then used as follows:

my_element = one_of_my_10_elements # does not matter where it comes from
my_element_position = elt_position_in_element_id_1 # does not matter how
id_1 = element_id_1[my_element_position]
if id_1 == 0:
   id_2 = None
else:
   id_2 = element_id_2[id_1-1]
   modify(my_element, some_other_data[id_2])

What would be a Pythonic/numpy way of managing this kind of relation, i.e. of getting id_2 for a given element?

I have had a look at masked arrays but I haven't figured out a way to use them for this configuration. Implementing a class for elements, which would store id_2 once it is computed and just provide it later, makes me expect very poor calculation time compared to array manipulation. Am I wrong?

UPD. A larger example of what is currently done in the legacy code:

import numpy as np
number_of_elements = 10

elements = np.arange(number_of_elements, dtype=int)  # my elements IDs
# elements data
# where element_x[7] provides X value for element 7
#   and element_n[7] provides N value for element 7
element_x = np.arange(number_of_elements, dtype=float)
element_n = np.arange(number_of_elements, dtype=np.int32)

# array defining subsets of elements
# where
# element_id_1[1] = element_id_1[3] = element_id_1[4] means elements 1, 3 and 4 have something in common
# and
# element_id_1[9] = 0 means element 9 does not belong to any group
element_id_1 = np.array([0, 1, 2, 1, 1, 2, 3, 0, 3, 0])  # size = number_of_elements

# array defining other data for each group of elements
# element_id_2[0] = 0 means elements of group 1 (elements 1, 3 and 4) have no data associated
# element_id_2[1] = 1 means elements of group 2 (elements 2 and 5) have data associated: other_x[element_id_2[1]-1] = 7.
# element_id_2[2] = 2 means elements of group 3 (elements 6 and 8) have data associated: other_x[element_id_2[2]-1] = 5.
element_id_2 = np.array([0, 1, 2])  # size = max(element_id_1)
other_x = np.array([7., 5.]) # size = max(element_id_2)

# work with elements
for my_element_position in elements:
    id_1 = element_id_1[my_element_position]

    if id_1 == 0:
        print('element %d, skipping' % my_element_position)
        continue

    id_2 = element_id_2[id_1-1]

    if id_2 > 0:
        # use element_x[my_element_position], element_n[my_element_position] and other_x[id_2-1] to compute more data
        print('element %d, using other_x[%d] = %f' % (my_element_position, id_2, other_x[id_2-1]))
    else:
        # use element_x[my_element_position] and element_n[my_element_position] to compute more data
        print('element %d, not using other_x' % my_element_position)

I arrived at the following using array indexing, since indexing a numpy array is supposed to be faster than iterating over it:

# elements that belong to no group
elements_to_skip = np.where(element_id_1 == 0)[0]
for my_element_position in elements_to_skip:
    print('element %d, skipping' % my_element_position)

# elements that belong to a group, then those whose group has data
elements_with_id1 = np.where(element_id_1 > 0)[0]
array1 = element_id_1[elements_with_id1]
array1 = element_id_2[array1-1]
array1 = np.where(array1 > 0)[0]
elements_with_data = elements_with_id1[array1]
id_2_array = element_id_2[element_id_1[elements_with_data]-1]
for my_element_position, id_2 in zip(elements_with_data, id_2_array):
    print('element %d, using other_x[%d] = %f' % (my_element_position, id_2, other_x[id_2-1]))

# whatever remains belongs to a group that has no data
elements_without_data = np.delete(elements, np.concatenate((elements_to_skip, elements_with_data)))
for my_element_position in elements_without_data:
    print('element %d, not using other_x' % my_element_position)

This gives the same result as the previous code snippet. Do you see a way to make this hard-to-read code better? Would this approach be preferable to the loop in the previous snippet?

  • To summarize: you wish to remap an array, replacing a given subset of values in that array with a given set of replacement values; is that correct? Commented Aug 9, 2016 at 10:34

2 Answers

I am not entirely sure what your code needs to do, but I think that if you are working with numpy arrays you want to do something like this:

import numpy as np

number_of_elements = 10
element_id_1 = np.array([0, 1, 2, 1, 1, 2, 3, 0, 3, 0])
element_id_2 = np.array([np.nan, 0, 1, 2])
# get the "element_id_1-th" element from element_id_2
result = element_id_2[element_id_1]

I use np.nan instead of None. If you do not want np.nan in the result, just do:

result[np.logical_not(np.isnan(result))]

EDIT: Based on your example code, this is the same idea as above: put a distinct marker in the lookup array for case 0 (skip) and case 1 (no other data), put the other_x values in the remaining slots, and then extract whatever you need from the result:

import numpy as np

element_id_1 = np.array([0, 1, 2, 1, 1, 2, 3, 0, 3, 0])

# NaN marks "skip", -1 marks "no other data", the rest are other_x values
data = np.array([np.nan, -1, 7, 5])

result = data[element_id_1]

print("skipping:" + str(np.where(np.isnan(result))))

print("using other data:" + str(np.where(np.nan_to_num(result) > 0)))
print("other data used:" + str(result[np.nan_to_num(result) > 0]))

print("not using other data:" + str(np.where(result == -1)))

which returns:

skipping:(array([0, 7, 9]),)
using other data:(array([2, 5, 6, 8]),)
other data used:[ 7.  7.  5.  5.]
not using other data:(array([1, 3, 4]),)

If you do not like NaNs, you can avoid them by assigning another marker value, such as -2, to the skip case.
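
The -2 marker idea can be sketched with integer arrays only, which avoids the float conversion that NaN forces on the lookup array. This is a minimal sketch rather than a definitive implementation: the names `id2_lookup`, `skip`, `without_data`, `with_data` and `used_x` are made up for illustration, and it assumes, as in the question, that group IDs in `element_id_1` start at 1 and that 0 means "no group".

```python
import numpy as np

element_id_1 = np.array([0, 1, 2, 1, 1, 2, 3, 0, 3, 0])
element_id_2 = np.array([0, 1, 2])
other_x = np.array([7., 5.])

# Prepend a marker (-2) so that group 0 ("no group") maps to it and
# element_id_1 can be used directly as an index, without the "-1" shift.
id2_lookup = np.concatenate(([-2], element_id_2))
id_2 = id2_lookup[element_id_1]   # one id_2 value per element

skip = id_2 == -2          # element belongs to no group
without_data = id_2 == 0   # group exists but has no other_x entry
with_data = id_2 > 0       # group has an other_x entry

print("skipping:", np.flatnonzero(skip))
print("not using other_x:", np.flatnonzero(without_data))
print("using other_x:", np.flatnonzero(with_data))

# other_x values for the elements that use them (id_2 is 1-based)
used_x = other_x[id_2[with_data] - 1]
print("other_x used:", used_x)
```

The three boolean masks can then be used to apply each of the three computations to whole sub-arrays of `element_x` and `element_n` at once instead of looping over the elements.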

1 Comment

Thank you @user1901493. At least for the hint with np.NaN which I didn't think of. It'll be useful. It's a bit more than getting data from the second array based on values in the first one. There's already an "implicit" array of elements which is in fact element_id_1's indices. I have added an updated example to the question.

If I had a similar problem, I would go with hash maps. A dict in Python is almost the same as a HashMap in most other languages.
For detailed information, see "Python dictionary implementation". So something like:

id2_dict = {}  # cache: id_1 -> id_2
my_element = one_of_my_10_elements # does not matter where it comes from
my_element_position = elt_position_in_element_id_1 # does not matter how
id_1 = element_id_1[my_element_position]
if id_1 == 0:
   id2_dict[id_1] = None
else:
   # look up id_2 in the original array and store it under id_1
   id2_dict[id_1] = element_id_2[id_1-1]

Considering the nature of your data (integers), you might want to use a list instead. If your id_1 values are sparse, though, a list wastes a lot of space and is the less Pythonic approach. If your id_1 values span an integer range and are dense within it, go with a list and handle the indexes accordingly. A list saves you the hashing step, but makes the code less Pythonic and harder to maintain.
tl;dr: if the id_1 values are dense and almost span a range, go with a list and use id_1 as the index (with some index shifting); otherwise go with a dict (hash map) and use id_1 as the key.
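
The two options can be compared side by side in a small sketch. It assumes the arrays from the question; `id2_by_group`, `id2_table`, `ids_from_dict` and `ids_from_table` are illustrative names, the dict version deliberately leaves out group 0 so that `dict.get` returns `None` for it, and the array version uses -1 as an arbitrary "no group" marker.

```python
import numpy as np

element_id_1 = np.array([0, 1, 2, 1, 1, 2, 3, 0, 3, 0])
element_id_2 = np.array([0, 1, 2])

# Option 1: dict keyed by id_1 (suits sparse or non-contiguous IDs).
# Group 0 is deliberately absent, so .get() returns None for it.
id2_by_group = {group: int(value)
                for group, value in enumerate(element_id_2, start=1)}
ids_from_dict = [id2_by_group.get(int(g)) for g in element_id_1]

# Option 2: array indexed by id_1 (suits dense IDs 0..N).
# Slot 0 holds -1 as a "no group" marker, which removes the index shift.
id2_table = np.concatenate(([-1], element_id_2))
ids_from_table = id2_table[element_id_1]

print(ids_from_dict)
print(ids_from_table)
```

The dict lookup still visits the elements one at a time in Python, while the array lookup is a single indexing operation, so for dense IDs and millions of elements the array form is likely the faster one; measuring both on real data is the only way to be sure.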

4 Comments

Thank you @Kamyar Ghasemlou, I'll have a look at that. Array contents are always non-negative integers from 0 to N, with each value between 0 and N present in the array at least twice. In most cases, there will be a lot of zeros in element_id_1.
So you are going to have a lot of duplicate keys/indexes; how do you plan to handle them? Override the previous value, or keep the max/min value? If you are going to have a lot of zeros in the id_2 dict, simply don't add them: if the key does not exist, the value is taken to be zero. It will save you a lot of space but would decrease performance.
Yes, you are right. I'll prototype both approaches in order to analyze code performance and see which one I should prefer. Thanks a lot for this food for thought!
You're welcome, I hope you find an elegant solution. I would appreciate it if you could mark the answer as accepted, if it was of use, of course. :)
