I'd like to left outer join two recarrays. The first is a list of entities with a unique key. The second is a list of values, and there can be 0 or more values per entity. My environment requires that I use Python 2.7 and I'm not able to use Pandas.
This question has been asked here before, but it did not get a good answer.
import numpy as np
import numpy.lib.recfunctions
from pprint import pprint
dtypes = [('point_index',int),('name','S50')]
recs = [(0,'Bob'),
(1,'Bob'),
(2,'Sue'),
(3,'Sue'),
(4,'Jim')]
x = np.rec.fromrecords(recs,dtype=dtypes)
dtypes = [('point_index',int),('type','S500'),('value',float)]
recs = [(0,'a',0.1),
(0,'b',0.2),
(1,'a',0.3),
(2,'b',0.4),
(2,'b',0.5),
(4,'a',0.6),
(4,'a',0.7),
(4,'a',0.8)]
y = np.rec.fromrecords(recs,dtype=dtypes)
j = np.lib.recfunctions.join_by('point_index',x,y,jointype='leftouter',usemask=False,asrecarray=True)
pprint(j.tolist())
I want
# [(0,'Bob','a',0.1),
# (0,'Bob','b',0.2),
# (1,'Bob','a',0.3),
# (2,'Sue','b',0.4),
# (2,'Sue','b',0.5),
# (4,'Jim','a',0.6),
# (4,'Jim','a',0.7),
# (4,'Jim','a',0.8)]
But I get
[(0, 'Bob', 'a', 0.1),
(0, 'Bob', 'b', 0.2),
(1, 'Sue', 'a', 0.3),
(2, 'Jim', 'b', 0.4),
(2, 'N/A', 'b', 0.5),
(3, 'Sue', 'N/A', 1e+20),
(4, 'N/A', 'a', 0.6),
(4, 'N/A', 'a', 0.7),
(4, 'N/A', 'a', 0.8)]
I know why; this is from the docs:
Neither r1 nor r2 should have any duplicates along key: the presence of duplicates will make the output quite unreliable. Note that duplicates are not looked for by the algorithm.
So this requirement really limits the usefulness of the function. The kind of left outer join I describe seems like a very common operation. Does anybody know how to achieve it using numpy?
join_by is a long Python function, with lots of concatenates, sorts, etc. Since it isn't compiled, it isn't going to be any faster than code that you could write yourself. My first thought is to collect information by key in a defaultdict, and build a structured array from that. Are the point_index values in x always sorted and contiguous, i.e. always [0,1,2...]?
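The defaultdict idea could look roughly like the sketch below. This is one possible way to write it, not how join_by works internally. The function name left_join and its fill parameter are made up for illustration. It assumes the keys in the left array are unique and the key field has the same name in both arrays. It uses np.array rather than np.rec.fromrecords for the sample data, but it relies only on dtype.names and tolist(), which recarrays also provide. It is written to avoid anything specific to Python 3; under Python 3 the 'S50' fields come back as bytes.

```python
from collections import defaultdict
import numpy as np


def left_join(left, right, key, fill=None):
    """Join two structured arrays on `key`, allowing duplicate keys in `right`.

    `fill` (hypothetical parameter) is a tuple of placeholder values for the
    non-key fields of `right`. With fill=None, rows of `left` that have no
    match in `right` are dropped.
    """
    left_names = list(left.dtype.names)
    right_names = list(right.dtype.names)
    lk = left_names.index(key)
    rk = right_names.index(key)

    # Group the non-key values of `right` by key, preserving row order.
    groups = defaultdict(list)
    for row in right.tolist():
        groups[row[rk]].append(row[:rk] + row[rk + 1:])

    rows = []
    for lrow in left.tolist():
        matches = groups.get(lrow[lk])  # .get() avoids creating empty entries
        if matches:
            # One output row per matching row of `right`.
            rows.extend(lrow + m for m in matches)
        elif fill is not None:
            # No match: keep the left row and pad it with the fill values.
            rows.append(lrow + tuple(fill))

    # All fields of `left`, followed by the non-key fields of `right`.
    out_dtype = ([(n, left.dtype[n]) for n in left_names] +
                 [(n, right.dtype[n]) for n in right_names if n != key])
    return np.array(rows, dtype=out_dtype)


x = np.array([(0, 'Bob'), (1, 'Bob'), (2, 'Sue'), (3, 'Sue'), (4, 'Jim')],
             dtype=[('point_index', int), ('name', 'S50')])
y = np.array([(0, 'a', 0.1), (0, 'b', 0.2), (1, 'a', 0.3), (2, 'b', 0.4),
              (2, 'b', 0.5), (4, 'a', 0.6), (4, 'a', 0.7), (4, 'a', 0.8)],
             dtype=[('point_index', int), ('type', 'S500'), ('value', float)])

# Drops point_index 3, which has no values (matches the "I want" listing).
j = left_join(x, y, 'point_index')

# Keeps point_index 3 with placeholder values (a true left outer join).
j_outer = left_join(x, y, 'point_index', fill=('N/A', np.nan))
```

Note that the "I want" output in the question leaves out point_index 3, which is what an inner join would give. Passing fill keeps that row, as a left outer join normally would. If a recarray is needed, the result can be viewed as one with j.view(np.recarray).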