Combine two arrays' data using inner join

Question

I have two data sets stored as arrays:

arr1 = [
  ['2011-10-10', 1, 1],
  ['2007-08-09', 5, 3],
  ...
]

arr2 = [
  ['2011-10-10', 3, 4],
  ['2007-09-05', 1, 1],
  ...
]

I want to combine them into one array like this:

arr3 = [
  ['2011-10-10', 1, 1, 3, 4],
  ...
]

I mean, just combine those lines with the same date column.

To clarify, I don't need the lines that do not appear in both arrays; just drop them.

Must ['2007-09-05', 1, 1] still be in the output (in arr3)?
– Dr. Jan-Philip Gehrcke, Commented Jul 16, 2013 at 17:22
pandas is an excellent choice for operations along these lines, if you don't mind a relatively "heavy" dependency: pandas.pydata.org
– Joe Kington, Commented Jul 16, 2013 at 17:26

cwallenpoole · Accepted Answer · 2013-07-16 17:21:35Z

6

Organize your data differently (you can easily convert what you already have to two dicts):

d1 = { '2011-10-10': [1, 1],
       '2007-08-09': [5, 3]
     }
d2 = { '2011-10-10': [3, 4],
       '2007-09-05': [1, 1]
     }

Then:

d3 = { k : d1[k] + d2[k] for k in d1 if k in d2 }
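For the conversion from the original lists to dicts and back to the requested list-of-lists shape, a minimal sketch might look like this. The helper name `inner_join` is made up for illustration, and the sketch assumes each date appears at most once per array:

```python
def inner_join(arr1, arr2):
    """Join rows of arr1 and arr2 that share the same first column (the date)."""
    # Key each row by its date; the remaining columns become the value.
    d1 = {row[0]: row[1:] for row in arr1}
    d2 = {row[0]: row[1:] for row in arr2}
    # Keep only dates present in both dicts, then rebuild list rows.
    return [[k] + list(d1[k]) + list(d2[k]) for k in d1 if k in d2]


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]
arr3 = inner_join(arr1, arr2)
print(arr3)  # [['2011-10-10', 1, 1, 3, 4]]
```

Rows follow the order of arr1 here, because the comprehension iterates over d1.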

edited Jul 16, 2013 at 17:21

cwallenpoole

answered Jul 16, 2013 at 17:16

jason

4 Comments

Dr. Jan-Philip Gehrcke Over a year ago

This will miss those entries with dates that do not occur in both sets.

jason Over a year ago

@Jan-Philip Gehrcke: "I mean, just combine those lines with the same date column."

Dr. Jan-Philip Gehrcke Over a year ago

Jason, yes, he says that he wants those being combined, but he does not say that he wants to miss the other data points. He (or she) needs to clarify.

jason Over a year ago

Well, either way, it's an easy modification for a full-outer join: { k : (d1[k] if k in d1 else []) + (d2[k] if k in d2 else []) for k in set(d1).union(d2) }.

user3212761 · Accepted Answer · 2017-02-01 10:06:04Z

It may be worth mentioning the set data type, as its methods fit this type of problem. The set operators let you join sets easily and flexibly with full, inner, outer, left, or right joins. Sets do not retain order (and neither did dictionaries before Python 3.7), but if you cast a set back into a list, you can then impose an order on the joined result. Alternatively, you could use an ordered dictionary.

set1 = set(x[0] for x in arr1)
set2 = set(x[0] for x in arr2)
resultset = (set1 & set2)

This only gets you the intersection of the dates in the original lists. To reconstruct arr3, you would need to append the [1:] data from arr1 and arr2 for the dates that are in the result set. This reconstruction would not be as neat as the dictionary solutions above, but sets are worth considering for similar problems.
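The reconstruction step could be sketched roughly as follows. The names `common`, `tail2`, and `joined` are illustrative, the sketch assumes unique dates per array, and it still relies on a dict lookup for the second array:

```python
arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]

# Dates present in both arrays (set intersection).
common = {row[0] for row in arr1} & {row[0] for row in arr2}

# Trailing columns of arr2, looked up by date, for the append step.
tail2 = {row[0]: row[1:] for row in arr2 if row[0] in common}

# Walk arr1 in its original order so the result order is predictable.
joined = [row + tail2[row[0]] for row in arr1 if row[0] in common]
print(joined)  # [['2011-10-10', 1, 1, 3, 4]]
```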

jh314 · Accepted Answer · 2013-07-16 17:39:46Z

2

You can convert the arrays to a dict, and back again.

d1 = dict((x[0],x[1:]) for x in arr1)
d2 = dict((x[0],x[1:]) for x in arr2)
keys = set(d1).union(d2)
n = []
result = dict((k, d1.get(k, n) + d2.get(k, n)) for k in keys)
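Because the keys come from a union, the snippet above behaves like a full outer join. A hedged sketch of both variants, plus the "back again" step to a list of rows, might look like this (the names `outer`, `inner`, `outer_rows`, and `inner_rows` are illustrative, not part of the answer):

```python
arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]

d1 = {row[0]: row[1:] for row in arr1}
d2 = {row[0]: row[1:] for row in arr2}

# Full outer join: every date from either side; a missing side adds [].
outer = {k: d1.get(k, []) + d2.get(k, []) for k in set(d1) | set(d2)}

# Inner join: only dates found on both sides.
inner = {k: d1[k] + d2[k] for k in set(d1) & set(d2)}

# Back to a list of rows. Sorting the keys makes the order deterministic
# (ISO-style date strings sort chronologically).
outer_rows = [[k] + outer[k] for k in sorted(outer)]
inner_rows = [[k] + inner[k] for k in sorted(inner)]
print(outer_rows)
print(inner_rows)  # [['2011-10-10', 1, 1, 3, 4]]
```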

edited Jul 16, 2013 at 17:39

answered Jul 16, 2013 at 17:13

jh314

1 Comment

Dr. Jan-Philip Gehrcke Over a year ago

Did you try it? For me, this is not the expected output: >>> result [['2011-10-10', 3, 4], ['2007-08-09', 5, 3], ['2007-09-05', 1, 1]]

cwallenpoole · Accepted Answer · 2013-07-16 17:19:56Z

A single dictionary approach:

tmp = {}
# add as many as you like into the outermost array.
for outer in [arr1,arr2]:
    for inner in outer:
        start, rest = inner[0], inner[1:]
        # the list if key exists, else create a new list. Append to the result
        tmp[start] = tmp.get(start,[]) + rest

output = []

# items() works on both Python 2 and 3 (iteritems() is Python 2 only)
for k, v in tmp.items():
    output.append([k] + v)

That would be the equivalent of a full outer join (returns data from both sides even if one side is null). If you wanted an inner join, you might change it to this:

tmp = {}
keys_with_dupes = []

for outer in [arr1,arr2]:
    for inner in outer:
        start, rest = inner[0], inner[1:]
        original = tmp.get(start,[])
        tmp[start] = original + rest
        if original:
            keys_with_dupes.append(start)

output = []

for k in keys_with_dupes:
    v = tmp[k]
    output.append([k] + v)
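If more than two arrays go into the outer list, one way to keep inner-join semantics is to track which inputs each date came from. This is only a sketch under that assumption; `inner_join_many` is a hypothetical helper, not part of the answer above:

```python
from collections import defaultdict


def inner_join_many(*arrays):
    """Keep only dates that occur in every array; concatenate their columns."""
    merged = defaultdict(list)   # date -> concatenated non-date columns
    sources = defaultdict(set)   # date -> indexes of the arrays containing it
    for i, arr in enumerate(arrays):
        for row in arr:
            merged[row[0]].extend(row[1:])
            sources[row[0]].add(i)
    # A date qualifies only if every input array contributed to it.
    return [[k] + v for k, v in merged.items()
            if len(sources[k]) == len(arrays)]


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]
extra = [['2011-10-10', 9], ['2007-08-09', 8]]
print(inner_join_many(arr1, arr2))         # [['2011-10-10', 1, 1, 3, 4]]
print(inner_join_many(arr1, arr2, extra))  # [['2011-10-10', 1, 1, 3, 4, 9]]
```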

Rao · Accepted Answer · 2013-07-16 17:32:42Z

1

Generator function approach, skipping corresponding elements whose dates don't match:

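def gen(a1, a2):
    # Pair rows by position. The built-in zip works on both Python 2 and 3
    # (itertools.izip exists only on Python 2).
    for x, y in zip(a1, a2):
        if x[0] == y[0]:
            # Same date: emit x followed by y's non-date columns.
            ret = list(x)
            ret.extend(y[1:])
            yield ret

>>> print(list(gen(arr1, arr2)))
[['2011-10-10', 1, 1, 3, 4]]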

But yeah, if possible, organize your data differently.

edited Jul 16, 2013 at 17:32

answered Jul 16, 2013 at 17:21

Rao

1 Comment

Blckknght Over a year ago

zip (or izip) only makes sense if the two lists directly correspond. If they don't, you might not find any of the matches.
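A small illustration of that point, using a reordered copy of the second list (`zip_join` and `arr2_shuffled` are names invented for this sketch):

```python
def zip_join(a1, a2):
    """Position-based join: only compares rows at the same index."""
    return [list(x) + list(y[1:]) for x, y in zip(a1, a2) if x[0] == y[0]]


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
# Same rows as in the question, but in a different order.
arr2_shuffled = [['2007-09-05', 1, 1], ['2011-10-10', 3, 4]]

print(zip_join(arr1, arr2_shuffled))  # [] -- the shared date is missed
```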

Dan Lecocq · Accepted Answer · 2013-07-16 17:12:27Z

0

Unless both are very large lists, I'd use a dictionary:

arr1 = [
  ['2011-10-10', 1, 1],
  ['2007-08-09', 5, 3]
]

arr2 = [
  ['2011-10-10', 3, 4],
  ['2007-09-05', 1, 1]
]

table_1 = dict((tup[0], tup[1:]) for tup in arr1)
table_2 = dict((tup[0], tup[1:]) for tup in arr2)
merged = {}
for key, value in table_1.items():
    other = table_2.get(key)
    if other:
        merged[key] = value + other

Otherwise, it would be more efficient to sort each, and then do a merge that way. But I imagine for most purposes this would be fast enough.
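The sort-then-merge alternative might be sketched like this. It is an illustration rather than a tuned implementation: `sort_merge_join` is a hypothetical name, and the sketch assumes each date appears at most once per array:

```python
def sort_merge_join(arr1, arr2):
    """Inner join on the first column by walking two sorted lists in step."""
    a = sorted(arr1, key=lambda row: row[0])
    b = sorted(arr2, key=lambda row: row[0])
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            i += 1  # a's date has no partner in b; advance a
        elif a[i][0] > b[j][0]:
            j += 1  # b's date has no partner in a; advance b
        else:
            # Dates match: emit a's row plus b's non-date columns.
            out.append(list(a[i]) + list(b[j][1:]))
            i += 1
            j += 1
    return out


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]
print(sort_merge_join(arr1, arr2))  # [['2011-10-10', 1, 1, 3, 4]]
```

The output comes back in date order, since both inputs are sorted first.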

answered Jul 16, 2013 at 17:12

Dan Lecocq
