1

So, I have lists of lists like following:

data = [
['foo', 'bar'],
['one', 'two']
]

And, I want to flatten these lists by alternating between two lists. So, output like

flattened = ['foo', 'one', 'bar', 'two']

I am using the list(chain.from_iterable(zip_longest(*data))) which works fine.

But, I am trying to figure out how to handle scenarios where there are duplicates that I want to get rid of.

data = [
['foo', 'bar'],
['foo', 'two']
]

I want something like

flatted = ['foo', 'two', 'bar'] 

rather than ['foo', 'foo', 'bar', 'two']

How do I do this?

5
  • does order matter? Commented Sep 23, 2019 at 19:19
  • @Marat, well if they want to alternate between lists, then inherently yes. Commented Sep 23, 2019 at 19:21
  • @Marat: yeah... Commented Sep 23, 2019 at 19:22
  • I don't have time at the moment but you can implement an ordereddict from your output list and then convert back or dict in 3.7+ Commented Sep 23, 2019 at 19:33
  • The correct output is ['foo', 'bar', 'bar'] or ['foo', 'two', 'bar']? Commented Sep 23, 2019 at 19:46

4 Answers 4

4

Use a set to keep track of what you've already seen, which is an O(1) membership test.

result = []
seen = set()
for item in chain.from_iterable(zip_longest(*data)):
    if item not in seen:
        seen.add(item)
        result.append(item)
>>> result
['foo', 'bar', 'two']

Note that this question talks about removing duplicates from a list: Removing duplicates in lists

TL;DR

For Python 3.7+ (or Cython 3.6+):

>>> list(dict.fromkeys(chain.from_iterable(zip_longest(*data))))
['foo', 'bar', 'two']
Sign up to request clarification or add additional context in comments.

3 Comments

OP is requesting ['foo', 'two', 'bar'] , not ['foo', 'bar', 'two']
"something like" ['foo', 'two', 'bar']
yes I thought that initially as well but technically only 1 of those lists satisfies the values "alternating between two lists" as OP specified earlier. It's ambiguous at the end of the day so who knows.
0

Hmm, this might be a bit more overhead than you're looking for but this should work and guarantees list-wise order unlike sets:

from itertools import cycle
from collections import Counter

output = []
checker = Counter()

for lst in cycle(data):
    if not data:
        break

    while lst:
        item = lst.pop(0)
        if not checker[item]:
            output.append(item)
            checker[item] += 1
            break            

    if not lst:
        data.remove(lst)
        continue  

output:

['foo', 'two', 'bar']

Comments

0

Try this code:

list(dict.fromkeys(sum(data, [])))

EDIT: As pointed out in the comments, the sum is not the most efficient method to flatten a list, you can use itertools.chain.from_iterable to get the flattened list, then do the following:

list(dict.fromkeys(chain.from_iterable(data)))

In both cases the output is the following:

['foo', 'bar', 'two']

Comparison of execution times

Below they propose a comparison of the execution times of the main proposed solutions:

  1. Benchmark data1:

     data1 = [['foo', 'bar'],['foo', 'two']] * 1000000
    
    1. @Massifox's solution with itertools.chain.from_iterable

      %%timeit
      list(chain.from_iterable(data1))
      # 128 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 
      
    2. @Marat's solution with dedup

      %%timeit
      list(dedup(chain.from_iterable(zip_longest(*data1))))
      # 579 ms ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 
      
    3. @Alexander's solution with zip_longest

      %%timeit
      list(dict.fromkeys(chain.from_iterable(zip_longest(*data1))))
      # 456 ms ± 149 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 
      
  2. Benchmark data2:

    x = 10
    y = 500000
    n_max = 1000
    data2 = [[np.random.randint(1, n_max) for _ in range(0, x)] for _ in range(0, y)]
    
    1. @Massifox's solution with itertools.chain.from_iterable

      %%timeit
      list(chain.from_iterable(data2))
      # 241 ms ± 20 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      
    2. @Marat's solution with dedup

      %%timeit
      list(dedup(chain.from_iterable(zip_longest(*data2))))
      # 706 ms ± 18.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      
    3. @Alexander's solution with zip_longest

      %%timeit
      list(dict.fromkeys(chain.from_iterable(zip_longest(*data2))))
      # 674 ms ± 56.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      

The implementations based on sum and on Counter are decisively less efficient and take tens of seconds already with instances of smaller benchmark of [['foo', 'bar'],['foo', 'two']] * 100k.

With benchmark data1, the solution based on itertools.chain.from_iterable proposed by me seems to be about 4-5 times faster than the others.

With benchmark data2, the solution based on itertools.chain.from_iterable proposed by me seems to be about 2-3 times faster than the others.

9 Comments

This is pretty good but sum is a terribly slow way to flatten a list.
This produces "bar" before it produces "two", OP is requesting that the opposite happens.
@BrianJoseph Probably the author of the post was wrong to expect that output, I'm asking for confirmation in the comments
These timings tests are so wrong... You have four million data points, but only three of which are unique. Furthermore, your timed solution gives the wrong result.
are correct on my machine on that benchmark instance. If you have another instance to offer, write it in the comments that the text and add it to the comparison. Thanks
|
-3

you can create a set and then convert to list again some like this:

l1 = ['foo', 'foo', 'bar', 'two']
l2 = list(set(l1))

it's create a second list without repeated items

if you want keept the order you can do this

ordered = dict.fromkeys(l1).keys()

4 Comments

Will this always maintain order?
@MyNameIsCaleb Sets are unordered but you can probs find a OrderedSet datatype somewhere.
Agree @JoshuaNixon which was my point. Dict does in 3.7+ based on insertion order.
so, you can use same example using ordereddict instance of dict

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.