
I have two data sets stored as arrays:

arr1 = [
  ['2011-10-10', 1, 1],
  ['2007-08-09', 5, 3],
  ...
]

arr2 = [
  ['2011-10-10', 3, 4],
  ['2007-09-05', 1, 1],
  ...
]

I want to combine them into one array like this:

arr3 = [
  ['2011-10-10', 1, 1, 3, 4],
  ...
]

That is, I only want to combine the rows that share the same date column.

To clarify, I don't need the rows that do not appear in both arrays; just drop them.

  • Thought about using a dict? Commented Jul 16, 2013 at 17:03
  • btw, those are lists, not arrays. Commented Jul 16, 2013 at 17:05
  • code.activestate.com/recipes/577937-inner-join Commented Jul 16, 2013 at 17:12
  • Must ['2007-09-05', 1, 1] still be in the output (in arr3)? Commented Jul 16, 2013 at 17:22
  • pandas is an excellent choice for operations along these lines, if you don't mind a relatively "heavy" dependency: pandas.pydata.org Commented Jul 16, 2013 at 17:26

6 Answers

6

Organize your data differently (you can easily convert what you already have to two dicts):

d1 = { '2011-10-10': [1, 1],
       '2007-08-09': [5, 3]
     }
d2 = { '2011-10-10': [3, 4],
       '2007-09-05': [1, 1]
     }

Then:

d3 = { k : d1[k] + d2[k] for k in d1 if k in d2 }
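The conversion mentioned above can be sketched as follows. This is a minimal example, assuming the first item of each row is the date and that each date appears at most once per list; the helper names `to_dict` and `inner_join` are made up for illustration and are not part of the question.

```python
def to_dict(rows):
    """Map each row's date (its first item) to the remaining values."""
    return {row[0]: row[1:] for row in rows}


def inner_join(rows1, rows2):
    """Combine rows whose date occurs in both inputs, in rows1's order."""
    d2 = to_dict(rows2)
    return [row + d2[row[0]] for row in rows1 if row[0] in d2]


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]

arr3 = inner_join(arr1, arr2)
print(arr3)  # [['2011-10-10', 1, 1, 3, 4]]
```

Building the result as a list keeps the row layout the question asked for, instead of leaving it as a dict.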

4 Comments

This will miss those entries with dates that do not occur in both sets.
@Jan-Philip Gehrcke: "I mean, just combine those lines with the same date column."
Jason, yes, he says that he wants those being combined, but he does not say that he wants to miss the other data points. He (or she) needs to clarify.
Well, either way, it's an easy modification for a full-outer join: { k : (d1[k] if k in d1 else []) + (d2[k] if k in d2 else []) for k in set(d1).union(d2) }.
3

It may be worth mentioning the set data type, as its methods fit this type of problem. The set operators let you compute the keys for inner, full outer, left, and right joins easily and flexibly. As with dictionaries, sets do not retain order, but if you convert a set back into a list, you can then sort the joined result. Alternatively, you could use an ordered dictionary.

set1 = set(x[0] for x in arr1)
set2 = set(x[0] for x in arr2)
resultset = (set1 & set2)

This only gets you the intersection of the dates in the original lists. To reconstruct arr3, you would still need to append the [1:] data from arr1 and arr2 for each date in the result set. This reconstruction is not as neat as the dictionary solutions above, but sets are worth considering for similar problems.
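One possible way to do that reconstruction is sketched below. It assumes unique dates within each list and sorts the shared dates to get a stable order; the names `dates1`, `values1`, and so on are illustrative only.

```python
arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]

dates1 = {row[0] for row in arr1}
dates2 = {row[0] for row in arr2}

common = dates1 & dates2      # keys for an inner join
either = dates1 | dates2      # keys for a full outer join
only_left = dates1 - dates2   # keys that occur only in arr1

# Look up the non-date values by date, then rebuild rows for the shared dates.
values1 = {row[0]: row[1:] for row in arr1}
values2 = {row[0]: row[1:] for row in arr2}
arr3 = [[date] + values1[date] + values2[date] for date in sorted(common)]

print(arr3)  # [['2011-10-10', 1, 1, 3, 4]]
```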

Comments

2

You can convert the arrays to a dict, and back again.

d1 = dict((x[0],x[1:]) for x in arr1)
d2 = dict((x[0],x[1:]) for x in arr2)
keys = set(d1).union(d2)
n = []
result = dict((k, d1.get(k, n) + d2.get(k, n)) for k in keys)
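Note that the snippet above produces a dict holding a full outer join (every date from either list). A sketch of the "back again" step, turning the dict into a list of rows, might look like this; sorting by date is an assumption made here only to get a predictable order.

```python
arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]

d1 = {row[0]: row[1:] for row in arr1}
d2 = {row[0]: row[1:] for row in arr2}

# Full outer join: every date from either list, missing sides contribute nothing.
result = {k: d1.get(k, []) + d2.get(k, []) for k in set(d1) | set(d2)}
outer_rows = sorted([k] + v for k, v in result.items())

# Inner join, as asked in the question: only dates present in both lists.
inner_rows = [[k] + d1[k] + d2[k] for k in sorted(set(d1) & set(d2))]

print(outer_rows)
print(inner_rows)  # [['2011-10-10', 1, 1, 3, 4]]
```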

1 Comment

Did you try it? For me, this is not the expected output: >>> result [['2011-10-10', 3, 4], ['2007-08-09', 5, 3], ['2007-09-05', 1, 1]]
1

A single dictionary approach:

tmp = {}
# add as many as you like into the outermost array.
for outer in [arr1,arr2]:
    for inner in outer:
        start, rest = inner[0], inner[1:]
        # the list if key exists, else create a new list. Append to the result
        tmp[start] = tmp.get(start,[]) + rest

output = []

for k, v in tmp.items():
    output.append([k] + v)

That would be the equivalent of a full outer join (returns data from both sides even if one side is null). If you wanted an inner join, you might change it to this:

tmp = {}
keys_with_dupes = []

for outer in [arr1,arr2]:
    for inner in outer:
        start, rest = inner[0], inner[1:]
        original = tmp.get(start,[])
        tmp[start] = original + rest
        if original:
            keys_with_dupes.append(start)

output = []

for k in keys_with_dupes:
    v = tmp[k]
    output.append([k] + v)

Comments

1

Generator function approach, skipping corresponding elements whose dates don't match:

def gen(a1, a2):
    # Pair rows by position; keep a pair only when the dates match.
    for x, y in zip(a1, a2):
        if x[0] == y[0]:
            ret = list(x)
            ret.extend(y[1:])
            yield ret

>>> print(list(gen(arr1, arr2)))
[['2011-10-10', 1, 1, 3, 4]]

But yeah, if possible, organize your data differently.

1 Comment

zip (or izip) only makes sense if the two lists directly correspond. If they don't, you might not find any of the matches.
0

Unless both are very large lists, I'd use a dictionary:

arr1 = [
  ['2011-10-10', 1, 1],
  ['2007-08-09', 5, 3]
]

arr2 = [
  ['2011-10-10', 3, 4],
  ['2007-09-05', 1, 1]
]

table_1 = dict((tup[0], tup[1:]) for tup in arr1)
table_2 = dict((tup[0], tup[1:]) for tup in arr2)
merged = {}
for key, value in table_1.items():
    other = table_2.get(key)
    if other is not None:
        merged[key] = value + other

For very large lists, it may be more efficient to sort each list by date and then merge the two in a single pass. But I imagine for most purposes the dictionary version would be fast enough.
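The sort-then-merge idea could be sketched like this. It is an illustration rather than a tuned implementation: it assumes each date appears at most once per list, the function name `sort_merge_join` is made up, and whether it actually beats the dictionary version has not been measured here.

```python
def sort_merge_join(rows1, rows2):
    """Inner-join two lists of rows on their first item using sorted inputs."""
    left = sorted(rows1, key=lambda row: row[0])
    right = sorted(rows2, key=lambda row: row[0])
    merged = []
    i = j = 0
    # Walk both sorted lists in step, advancing whichever side has the smaller date.
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            merged.append(left[i] + right[j][1:])
            i += 1
            j += 1
    return merged


arr1 = [['2011-10-10', 1, 1], ['2007-08-09', 5, 3]]
arr2 = [['2011-10-10', 3, 4], ['2007-09-05', 1, 1]]
print(sort_merge_join(arr1, arr2))  # [['2011-10-10', 1, 1, 3, 4]]
```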

Comments
