4

Given two lists of strings that contain duplicates save for one element in each list, how would you combine the two into a single list that contains one copy of every value in list order?

For example, given the following two lists in Python:

a = ['Second', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']

Or

a = ['First', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']

How would you combine the two lists to get a single list like this:

result = ['First', 'Second', 'Third', 'Fourth']

Note that the exact values of the strings cannot necessarily be trusted to help with ordering the elements.

I am aware of the possibility that there will be some cases with no definitive way to lock the list down to a particular order, and will probably have to special-case those, but for the general cases I'd rather have a procedure to follow. For example:

a = ['First', 'Third', 'Fourth']
b = ['First', 'Second', 'Fourth']

This could have 'Third' and 'Second' in either order, as there's no item on both lists between them to provide a guideline.

Edit: I should explain the strings a bit further, as I see many of you are assuming that I can merely sort a raw merge of the two lists, and this just isn't going to work.

I'm taking story titles, which, for each story, only list the other instalments and not the linked story itself. So by taking two lists (or possibly more, I'm not sure), I can come up with a full list of the instalments to put them in their proper order.

4
  • Just add the two lists Commented Dec 10, 2013 at 7:57
  • Also, +1 for adding a description of the actual problem you're trying to solve. Commented Dec 10, 2013 at 12:28
  • I'm still thinking about possible solutions, but I'm not sure this problem is solvable in general. What about cases like ['First', 'Second', 'Fourth'] and ['First', 'Third', 'Fourth']? Without knowing the correct order by some other means, there is no way the program can tell whether 'Second' or 'Third' comes first. Commented Dec 10, 2013 at 22:08
  • @jpmc26 The lists would be sufficient as long as there was at least one installment between the installments the lists were taken from. A check to ensure that the lists are uniquely mergeable would either produce a correct list or require more code to handle the special case. Unfortunately the OP hasn't clarified his context and requirements enough to show whether that would constitute an acceptable answer. Commented Dec 11, 2013 at 8:58

6 Answers 6

4

Simple algorithm:

  1. Concat lists
  2. Remove dups
  3. Sort

Code:

def order_list(lst, order_dict):
    return sorted(lst, key=lambda x: order_dict.get(x, -1))

c = list(set(a + b))
ord_dict = {"First": 1, "Second": 2, "Third": 3, "Fourth": 4}
order_list(c, ord_dict)

5 Comments

I think it's a bad idea to return a default of -1 for the sort key. I would want my sorting algorithm to fail fast if an unexpected value turned up, not stick it at the beginning of the list. Additionally, you can't guarantee what would happen if 2 unexpected elements came up.
@jpmc26 that's a valid point, but this is a "business" decision. Maybe he/she is just writing a script to analyze the data set, and a few bad records are OK. You could add a counter to know how many are invalid and then just splice the correct result.
@jpmc26 BTW your code works the same way. dict.get returns None which is 0.
This solution isn't going to work, as I can't rely upon the strings to be sortable. See the updated description for details.
@lukas Boo. Thanks. My mistake. Fixing.
4

You have 2 different concerns here:

  • Duplicate elimination
  • Ordering

I would do them separately. Duplication elimination is simple enough. Use a set:

>>> a = ['Second', 'Third', 'Fourth']
>>> b = ['First', 'Second', 'Third']
>>> x = set(a)
>>> x
set(['Second', 'Fourth', 'Third'])
>>> x.update(b)
>>> x
set(['Second', 'Fourth', 'Third', 'First'])

Then you'll need to define the ordering somehow. The simplest way to do that might be to map each possible element to a value:

>>> order_dict = {'First': 1, 'Second': 2, 'Third': 3, 'Fourth': 4}
>>> result = sorted(list(x), key=lambda i: order_dict[i])
>>> result
['First', 'Second', 'Third', 'Fourth']

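Alternatively, you could use some kind of compare function with sorted's cmp argument if you can define one for your values. (In Python 3, sorted no longer accepts cmp; wrap the function with functools.cmp_to_key and pass it as key instead.)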

Hope this helps.

2 Comments

very nice. +1 for not making me open a new question
This solution isn't going to work, as I can't rely upon the strings to be sortable. See the updated description for details.
2

If we assume that your two lists are both ordered, and that they are each missing only some elements from the full set, then I can kind of see an algorithm that should work most of the time.

  1. Take the next element of A.
  2. Step through B looking for a match:
    1. If there was a match:
      • Remove everything from the start of B up to and including the match in B, and add it to C
    2. If there was no match:
      • Add the current element of A to C
  3. Repeat
  4. If there's anything left in B, add it to C.

This is the Python code for the algorithm:

a1 = ['Second', 'Third', 'Fourth']
b1 = ['First', 'Second', 'Third']

a2 = ['First', 'Third', 'Fourth']
b2 = ['First', 'Second', 'Third']

a3 = ['First', 'Third', 'Fourth']
b3 = ['First', 'Second', 'Fourth']

def merge(a, b):
    c = []
    b_oldindex = 0
    for a_index in range(len(a)):
        match = False
        for b_index in range(b_oldindex, len(b)):
            if a[a_index] == b[b_index]:
                c.extend(b[b_oldindex:b_index+1])
                b_oldindex = b_index + 1
                match = True
                break
        if not match:
            c.append(a[a_index])
    if b_oldindex < len(b):
        c.extend(b[b_oldindex:])
    return c

print(merge(a1,b1))
print(merge(a2,b2))
print(merge(a3,b3))
print(merge(b1,a1))
print(merge(b2,a2))
print(merge(b3,a3))

Which produces the following output:

['First', 'Second', 'Third', 'Fourth']
['First', 'Second', 'Third', 'Fourth']
['First', 'Third', 'Second', 'Fourth']
['First', 'Second', 'Third', 'Fourth']
['First', 'Second', 'Third', 'Fourth']
['First', 'Second', 'Third', 'Fourth']

Of all the test cases, the only one that fails to produce the correct order is merge(a3,b3).

Solving the problem completely may involve implementing a correct merge algorithm (as used in merge sort), which requires the ability to evaluate the order that elements should be in. You can see a Python implementation of merge sort at Rosetta Code.

UPDATE:

Given that this is actually to sort the installments in a set of books, you can avoid situations you described in your third set of data by taking additional information into account. Namely, use the merge function on lists in the reverse order of copyright or publication date.

For example, in your case:

a3 = ['First', 'Third', 'Fourth']  # Second novel
b3 = ['First', 'Second', 'Fourth'] # Third novel

a3's book would have been published before b3's book. If you can harvest that kind of metadata, then you could avoid this issue.

Copyright date won't differ between different editions of the same book, but publication date might. Therefore, I'd look at copyright date before publication date.
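
As a rough sketch of that idea, the snippet below sorts the lists newest-first by date and passes the later book's list as the first argument to merge. The years are made up for illustration, and merge here is a compact restatement of the function above (using list.index instead of the inner loop), so treat it as an assumption-laden example rather than a general solution:

```python
def merge(a, b):
    """Greedy merge: unmatched items of `a` are placed before pending items of `b`."""
    c = []
    b_start = 0
    for item in a:
        try:
            # Look for the item in the not-yet-consumed part of b.
            b_index = b.index(item, b_start)
        except ValueError:
            c.append(item)
        else:
            c.extend(b[b_start:b_index + 1])
            b_start = b_index + 1
    c.extend(b[b_start:])
    return c


# Hypothetical records: (copyright year, instalment list shown for that book).
books = [
    (2001, ['First', 'Third', 'Fourth']),   # list taken from the second novel
    (2003, ['First', 'Second', 'Fourth']),  # list taken from the third novel
]

# Newest first, so the later book's list becomes the first argument.
newest_first = sorted(books, key=lambda book: book[0], reverse=True)
(_, later), (_, earlier) = newest_first
result = merge(later, earlier)
print(result)  # ['First', 'Second', 'Third', 'Fourth']
```

This only covers the two-list case; whether chaining merge over more than two lists stays correct is not something the test cases above establish.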

1

The set container is defined by having no duplicates in it. You can make a set of both of the lists and then cast it back to list type:

a = ['Second', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']
c = list(set(a + b))
# c is now e.g. ['Second', 'Fourth', 'Third', 'First']
# Note that set will not organize anything; it will just delete the duplicates

1 Comment

Unfortunately the order is important. This code will determine where a relatively huge block of text is going to go, and reordering things by hand is going to be a pretty big problem.
1

I had the same issue, and I have an answer. I found this post because I was searching for more pythonic ways of doing it.

First, a note about the special case:

a=['A','C','D','E']
b=['A','B','D','F']
c=joinListsOrdered(a,b)

In my case I do not have any problem: ['A','B','C','D','E','F'] is as good as ['A','C','B','D','F','E']. The only validation condition I want is that the order of elements in c respects the order in a and in b separately, i.e. [el for el in c if el in a] is element-wise equal to a (and likewise for b). I also think this is the only reasonable stance on this problem without further information about it.
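
That validation condition is easy to turn into a small check. The helper name respects_order is my own, not part of any library; it just wraps the list comprehension quoted above:

```python
def respects_order(merged, source):
    """Return True if the elements of `source` appear in `merged` in the same relative order."""
    members = set(source)  # set lookup instead of scanning the list each time
    return [el for el in merged if el in members] == list(source)


a = ['A', 'C', 'D', 'E']
b = ['A', 'B', 'D', 'F']

# Both candidate results from the text satisfy the condition for a and for b.
for candidate in (['A', 'B', 'C', 'D', 'E', 'F'], ['A', 'C', 'B', 'D', 'F', 'E']):
    print(respects_order(candidate, a), respects_order(candidate, b))  # True True
```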

This translates to saying that the focus is on the common elements (['A', 'D']). If those are in the proper order, everything else can easily be slotted in between. Hence this algorithm:

def joinListsOrdered(a, b):
    # Find the common elements, ordered by their position in a
    order = {}
    for i, e in enumerate(a):
        order[e] = i
    commonElements = sorted(set(a) & set(b), key=lambda el: order[el])
    # Walk both lists up to each common element in turn
    i = 0  # index into a
    j = 0  # index into b
    c = []
    for comEl in commonElements:
        while a[i] != comEl:
            c.append(a[i])
            i += 1
        while b[j] != comEl:
            c.append(b[j])
            j += 1
        c.append(comEl)
        i += 1
        j += 1
    # Add whatever remains after the last common element
    c = c + a[i:] + b[j:]
    return c

Of course it fails to respect the validation condition if the order in a and b for some common element is different, but in that case the problem does not have a solution.
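
If there are more than two lists, one way to meet the same validation condition is a topological sort. This is a different technique from the algorithm above, not a variant of it. The sketch below assumes Python 3.9+ for the standard-library graphlib module; merge_ordered is a name I made up. Each adjacent pair in a list becomes a "comes before" constraint, and conflicting lists raise CycleError instead of silently producing a wrong order:

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def merge_ordered(*lists):
    """Merge any number of ordered lists into one order consistent with all of them."""
    sorter = TopologicalSorter()
    for lst in lists:
        if lst:
            sorter.add(lst[0])  # register the first item even if the list has one element
        for earlier, later in zip(lst, lst[1:]):
            sorter.add(later, earlier)  # `earlier` must come before `later`
    return list(sorter.static_order())


print(merge_ordered(
    ['Second', 'Third', 'Fourth'],
    ['First', 'Third', 'Fourth'],
    ['First', 'Second', 'Fourth'],
))  # ['First', 'Second', 'Third', 'Fourth']

try:
    merge_ordered(['A', 'B'], ['B', 'A'])
except CycleError:
    print('the lists contradict each other')
```

When the constraints do not pin down a single order (as with only the last two lists above), static_order still returns some valid order, but which one is not something to rely on.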

0

In the simplest case, where only one element differs and it is in the same position, just iterate jointly through both lists:

newlist = []
for i in range(len(a)):
  if a[i] == b[i]:
    newlist.append(a[i])
  else:
    newlist.append(a[i])
    newlist.append(b[i])

If your lists are more complicated, turn one of them into a dictionary first and check against the other when merging.
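
One possible reading of the dictionary idea, as a sketch only: map each value in b to its index, then walk a and flush the pending part of b whenever a shared value turns up. The function name merge_with_lookup is hypothetical, and the approach still cannot decide cases where the two lists give no ordering hint:

```python
def merge_with_lookup(a, b):
    """Merge two ordered lists, using a dict of positions in `b` for fast lookups."""
    pos_in_b = {value: i for i, value in enumerate(b)}
    merged = []
    next_b = 0  # first index of b not yet copied into merged
    for value in a:
        j = pos_in_b.get(value)
        if j is None:
            # Only in a: place it at the current position.
            merged.append(value)
        elif j >= next_b:
            # Shared value: copy everything in b up to and including it.
            merged.extend(b[next_b:j + 1])
            next_b = j + 1
        # Otherwise the value was already copied from b, so skip it.
    merged.extend(b[next_b:])
    return merged


print(merge_with_lookup(['Second', 'Third', 'Fourth'], ['First', 'Second', 'Third']))
# ['First', 'Second', 'Third', 'Fourth']
```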

1 Comment

This isn't going to work, even with the test cases I provided above. You're assuming that duplicate elements are going to be in the same slots in the different arrays, and that is simply not the case.
