2

In Python, how would I do this:

say I have:

a = [[1, 5], [2, 6], [3, 3], [4, 2]]
b = [[3, 1], [4, 2], [1, 8], [2, 4]]

Now I want to do an operation with the second column values IF the first column values match.

E.G.

a has an entry [1,5]. Go through b and see that it has an entry [1,8]. Now I want to divide 5/8 and store that value in, say, an array c. Next would be matching [2,6] with [2,4] to get the next value in c: 6/4.

so:

c = [5/8, 6/4, 3/1, 2/2] 

That is the result for the example above. I hope this makes sense. I would like to do this with NumPy and Python.

4
  • Is the first column of a always sorted? Does every first-column number in a appear in b? Are they of the same size? Commented May 16, 2016 at 18:14
  • @kennytm yes to all. Commented May 16, 2016 at 18:15
  • Are duplicates allowed in the first position within each list? Commented May 16, 2016 at 18:16
  • @YakymPirozhenko no each first position entry is unique Commented May 16, 2016 at 18:17

4 Answers

4

You can use np.searchsorted to find, for each first-column element of a, the row of b whose first-column element matches, and then use those rows to pick b's second-column elements for the division that gives c. Assuming a and b are NumPy arrays, a vectorized implementation would be -

a0, b0 = a[:,0], b[:,0]
order = b0.argsort()
c = np.true_divide(a[:,1],b[order[np.searchsorted(b0,a0,sorter=order)],1])

The approach listed above works for the generic case where the first-column elements of a and b are not necessarily sorted. But if a's first column is sorted, as in the listed sample, sorting b's rows by their first column already lines them up with a, so the solution simplifies to -

c = np.true_divide(a[:,1],b[b0.argsort(),1])

Sample run -

In [35]: a
Out[35]: 
array([[1, 5],
       [2, 6],
       [3, 3],
       [4, 2]])

In [36]: b
Out[36]: 
array([[3, 1],
       [4, 2],
       [1, 8],
       [2, 4]])

In [37]: a0, b0 = a[:,0], b[:,0]

In [38]: order = b0.argsort()

In [39]: np.true_divide(a[:,1],b[order[np.searchsorted(b0,a0,sorter=order)],1])
Out[39]: array([ 0.625,  1.5  ,  3.   ,  1.   ])
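
To try the searchsorted idea on other inputs, here is a minimal self-contained sketch that looks up each first-column value of a among b's sorted first-column values and maps the hit back to a row of b. The helper name divide_matched and the shuffled test array are assumptions made for illustration, and the sketch relies on the conditions confirmed in the comments (unique first-column values, every key of a present in b).

```python
import numpy as np

def divide_matched(a, b):
    """Divide a's second column by b's second column where first columns match.

    Assumes unique first-column values and that every key in a occurs in b.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    order = b[:, 0].argsort()  # row order that sorts b by its first column
    # Position of each key of a within the sorted keys of b
    pos = np.searchsorted(b[:, 0], a[:, 0], sorter=order)
    rows = order[pos]          # map sorted positions back to rows of b
    return np.true_divide(a[:, 1], b[rows, 1])

a = np.array([[1, 5], [2, 6], [3, 3], [4, 2]])
b = np.array([[3, 1], [4, 2], [1, 8], [2, 4]])
c = divide_matched(a, b)

# Same data with a's rows shuffled: each ratio should follow its key
a_shuffled = a[[2, 0, 3, 1]]
c_shuffled = divide_matched(a_shuffled, b)
```

Because the lookup goes from a's keys into b, the result stays in a's row order whether or not either first column is sorted.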

4

Given all of the assumptions in the comment section, this will work:

from __future__ import division  # __future__ imports must come before any other import
from operator import itemgetter

a = [[1, 5], [2,6], [3,3], [4,2]]
b = [[3, 1], [4,2], [1,8], [2,4]]

result = [x / y for (_, x), (_, y) in zip(a, sorted(b, key=itemgetter(0)))]

Assumptions: lists have equal lengths, elements in the first position are unique for each list, first list is sorted by first element, every element that occurs in the first position in a also occurs in the first position in b.

4 Comments

does this assume that every first column entry in a has a corresponding entry in b?
Possibly needing a from __future__ import division
@trans1st0r: Yes, because that is one of "the assumptions in the comment section"
@trans1st0r you are correct - I added explicit assumptions. Eric, good point, I will make an edit.
1

You can use a simple O(n^2) way with nested loops:

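from __future__ import division  # true division on Python 2, harmless on Python 3

c = []

for x in a:
    for y in b:
        if x[0] == y[0]:  # first-column values match
            c.append(x[1] / y[1])
            break  # first-column values are unique, so stop at the first match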

The above is fine when the lists are small. For huge lists, consider a dictionary-based approach, where the complexity would be O(n) at the cost of some extra space.
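
A minimal sketch of that dictionary-based approach, assuming unique first-column values as stated in the comments; the name b_lookup is made up for illustration:

```python
a = [[1, 5], [2, 6], [3, 3], [4, 2]]
b = [[3, 1], [4, 2], [1, 8], [2, 4]]

# Map each first-column value of b to its second-column value: one O(n) pass.
b_lookup = {key: value for key, value in b}

# One more O(n) pass over a. Keys of a that are missing from b are skipped,
# and float() keeps true division on Python 2 as well.
c = [value / float(b_lookup[key]) for key, value in a if key in b_lookup]
```

Each dictionary lookup takes constant time on average, so the two passes replace the inner loop over b.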

0

I humbly propose that you're using the wrong data structure. Notice that if you have an array column that has unique values between 1 and N (an index column), you could encode the same data simply by re-ordering your other columns. Once you've re-ordered your data, not only can you drop the "index" column, but it also becomes easier to operate on the remaining data. Let me demonstrate:

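import numpy as np

N = 5  # index values run from 1 to 4, so position 0 stays unused
a = np.array([[1, 5], [2,6], [3,3], [4,2]])
b = np.array([[3, 1], [4,2], [1,8], [2,4]])

# Store each second-column value at the position given by its first-column value
a_trans = np.ones(N)
a_trans[a[:, 0]] = a[:, 1]

b_trans = np.ones(N)
b_trans[b[:, 0]] = b[:, 1]

c = a_trans / b_trans
print(c)  # c[0] is the unused slot (1/1); c[1:] holds the ratios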

Depending on the nature of your problem, you can sometimes use an implicit index from the beginning, but sometimes an explicit index can be very useful. If you need an explicit index, consider using something like pandas.DataFrame with better support for index operations.
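
As a rough sketch of the explicit-index idea with pandas, here using pandas.Series rather than a full DataFrame (a choice made for brevity; the names sa and sb are made up for illustration), the first column becomes the index and the second column the data:

```python
import pandas as pd

a = [[1, 5], [2, 6], [3, 3], [4, 2]]
b = [[3, 1], [4, 2], [1, 8], [2, 4]]

# Use the first column as an explicit index and the second column as the data.
sa = pd.Series([row[1] for row in a], index=[row[0] for row in a])
sb = pd.Series([row[1] for row in b], index=[row[0] for row in b])

# Reorder b's values to follow a's index, then divide element-wise.
c = sa / sb.reindex(sa.index)
```

Here reindex would put NaN wherever a key of a is missing from b, so unmatched rows stay visible instead of being silently dropped.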
