Search a list of list of strings for a list of strings in python efficiently

Question

I have a list of lists of strings and a list of strings. For example:

L1=[["cat","dog","apple"],["orange","green","red"]]
L2=["cat","red"]

If L1[i] contains any item from L2, I need to produce pairs (for creating edges in a graph). In my example, I need the pairs ("cat","dog"), ("cat","apple"), ("red","orange"), ("red","green").

What approach would be most efficient? (My list L1 is huge.)

Did you try it straight-forward (and maybe less efficient)?

– alex, commented Feb 16, 2012 at 18:35

Rik Poggi · Accepted Answer · 2012-02-16 19:16:47Z

2

Supposing that you might have more than one "control" item in your L1 sublists, I'd do it using set() and itertools.product():

from itertools import product

def generate_edges(iterable, control):
    edges = []
    control_set = set(control)
    for e in iterable:
        e_set = set(e)
        common = e_set & control_set
        to_pair = e_set - common
        edges.extend(product(to_pair, common))
    return edges

Example (the order of the resulting pairs may vary, because sets are unordered):

>>> L1 = [["cat","dog","apple"],
...       ["orange","green","red"],
...       ["hand","cat","red"]]
>>> L2 = ["cat","red"]
>>> generate_edges(L1, L2)
[('apple', 'cat'),
 ('dog', 'cat'),
 ('orange', 'red'),
 ('green', 'red'),
 ('hand', 'red'),
 ('hand', 'cat')]
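If the order of the pairs matters, or the control item should come first as in the question, an order-preserving variant is possible. This is a minimal sketch assuming Python 3; `generate_edges_ordered` is a hypothetical name, not part of the answer above. Like the set-based version, it does not pair two control items with each other.

```python
def generate_edges_ordered(iterable, control):
    """Yield (control_item, other_item) pairs, keeping the original list order."""
    control_set = set(control)  # set gives constant-time membership tests
    for sublist in iterable:
        hits = [s for s in sublist if s in control_set]
        if not hits:
            continue  # this sublist contains no control item
        others = [s for s in sublist if s not in control_set]
        for hit in hits:
            for other in others:
                yield (hit, other)


L1 = [["cat", "dog", "apple"],
      ["orange", "green", "red"],
      ["hand", "cat", "red"]]
L2 = ["cat", "red"]
edges = list(generate_edges_ordered(L1, L2))
# edges == [('cat', 'dog'), ('cat', 'apple'), ('red', 'orange'),
#           ('red', 'green'), ('cat', 'hand'), ('red', 'hand')]
```

Because it is a generator, the pairs can also be consumed one at a time without building the whole list, which may help when L1 is huge.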

answered Feb 16, 2012 at 19:16

Rik Poggi


Bi Rico · Accepted Answer · 2012-02-16 20:08:21Z

2

If L1 is very large, you might want to look into using bisect. It requires that you flatten and sort L1 first. You could do something like:

from bisect import bisect_left, bisect_right
from itertools import chain

L1=[["cat","dog","apple"],["orange","green","red","apple"]]
L2=["apple", "cat","red"]

M1 = [[i]*len(j) for i, j in enumerate(L1)]
M1 = list(chain(*M1))
L1flat = list(chain(*L1))
I = sorted(range(len(L1flat)), key=L1flat.__getitem__)
L1flat = [L1flat[i] for i in I]
M1 = [M1[i] for i in I]

for item in L2:
    s = bisect_left(L1flat, item)
    e = bisect_right(L1flat, item)
    print("%s %s" % (item, M1[s:e]))  # this form runs on both Python 2 and 3

#apple [0, 1]
#cat [0]
#red [1]
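The same idea can be wrapped in reusable functions. This is a sketch assuming Python 3; `build_sorted_index` and `find_sublists` are hypothetical names introduced here. It sorts (string, sublist index) tuples once, then answers each lookup with two binary searches.

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(lists):
    """Return (strings, owners): strings is sorted, and owners[k] is the
    index of the sublist that strings[k] came from."""
    flat = [(s, i) for i, sub in enumerate(lists) for s in sub]
    flat.sort()  # one-time O(n log n) cost
    strings = [s for s, _ in flat]
    owners = [i for _, i in flat]
    return strings, owners


def find_sublists(strings, owners, item):
    """Return the indices of all sublists containing item (O(log n) search)."""
    lo = bisect_left(strings, item)
    hi = bisect_right(strings, item)
    return owners[lo:hi]


L1 = [["cat", "dog", "apple"], ["orange", "green", "red", "apple"]]
L2 = ["apple", "cat", "red"]
strings, owners = build_sorted_index(L1)
result = {item: find_sublists(strings, owners, item) for item in L2}
# result == {'apple': [0, 1], 'cat': [0], 'red': [1]}
```

The sort is paid once, so this layout is most useful when the same L1 is queried many times.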

answered Feb 16, 2012 at 20:08

Bi Rico


1 Comment

SoftwareDveloper Over a year ago

Thank you for the code; this is something similar to what I'm trying to do. I just have a question: how do I get a list of all the L1 strings that have L2 strings as substrings? In my case, L2 strings are substrings of L1 strings.
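For the substring case raised in this comment, binary search on a sorted list does not directly apply, since sorted order only helps with exact matches. One straightforward sketch, assuming Python 3, uses the `in` operator on strings. The function name `strings_containing` and the sample data are made up for illustration.

```python
def strings_containing(lists, needles):
    """Map each needle to the strings in lists that contain it as a substring."""
    matches = {needle: [] for needle in needles}
    for sublist in lists:
        for s in sublist:
            for needle in needles:
                if needle in s:  # substring test, not equality
                    matches[needle].append(s)
    return matches


# Illustrative data only; not taken from the question.
L1 = [["catfish", "dog", "pineapple"], ["orange", "redwood", "apple"]]
L2 = ["cat", "red", "apple"]
found = strings_containing(L1, L2)
# found == {'cat': ['catfish'], 'red': ['redwood'],
#           'apple': ['pineapple', 'apple']}
```

This checks every needle against every string, so it is a baseline rather than an optimized solution.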

Amber · Accepted Answer · 2012-02-16 18:36:24Z

1

I'd suggest transforming them all to sets and using set operations (intersection) to figure out which terms from L2 are in each L1 item. You can then use set subtraction to get the items you need to pair.

edges = []
L2set = set(L2)
for L1item in L1:
    L1set = set(L1item)
    items_in_L1item = L1set & L2set
    for item in items_in_L1item:
        items_to_pair = L1set - set([item])
        edges.extend((item, i) for i in items_to_pair)
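As a usage sketch, the loop above can be wrapped in a function and run on the question's data. This assumes Python 3, and `edges_by_set_ops` is a hypothetical name. The result is sorted only to make the output reproducible, since set iteration order is not guaranteed.

```python
def edges_by_set_ops(lists, control):
    """Pair every control item found in a sublist with all other members."""
    control_set = set(control)
    edges = []
    for sublist in lists:
        members = set(sublist)
        for item in members & control_set:  # control items present here
            edges.extend((item, other) for other in members - {item})
    return edges


L1 = [["cat", "dog", "apple"], ["orange", "green", "red"]]
L2 = ["cat", "red"]
edges = sorted(edges_by_set_ops(L1, L2))
# edges == [('cat', 'apple'), ('cat', 'dog'), ('red', 'green'), ('red', 'orange')]
```

Note that when a sublist holds two control items, this approach also pairs them with each other, for example ('cat', 'red') and ('red', 'cat').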

answered Feb 16, 2012 at 18:36

Amber


juliomalegria · Accepted Answer · 2012-02-16 18:42:19Z

1

To keep this code efficient even if L1 and L2 are huge, use izip, which returns a lazy iterator instead of building a huge list of tuples. If you're working in Python 3, just use zip.

from itertools import izip

pairs = []
for my_list, elem in izip(L1, L2):
    if elem in my_list:
        pairs += [(elem, e) for e in my_list if e!=elem]
print pairs

The code is very comprehensible; it's almost pure English! First, you loop over each list and its corresponding element. Then you check whether the element is in the list. If it is, you collect all pairs except the pair (x, x).

Output:

[('cat', 'dog'), ('cat', 'apple'), ('red', 'orange'), ('red', 'green')]

edited Feb 16, 2012 at 18:42

answered Feb 16, 2012 at 18:37

juliomalegria


2 Comments

Amber Over a year ago

Your code does not work for the general case the OP is looking for.

Amber Over a year ago

Because any of the elements from L2 may be in any of the lists from L1, and L1 may not have the same number of lists in it as L2 has elements.
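The limitation described in these comments can be seen by adding a third sublist. The sketch below assumes Python 3 (so `zip` instead of `izip`); the names `zipped_pairs` and `all_pairs` are introduced here for illustration. The zipped loop only compares L1[0] with L2[0] and L1[1] with L2[1], while the second version checks every sublist against all control items.

```python
L1 = [["cat", "dog", "apple"],
      ["orange", "green", "red"],
      ["hand", "cat", "red"]]
L2 = ["cat", "red"]

# Positional pairing: zip stops after len(L2) sublists, so L1[2] is never checked.
zipped_pairs = []
for my_list, elem in zip(L1, L2):
    if elem in my_list:
        zipped_pairs += [(elem, e) for e in my_list if e != elem]

# General case: test every element of every sublist against the control set.
control = set(L2)
all_pairs = [(elem, e)
             for my_list in L1
             for elem in my_list if elem in control
             for e in my_list if e != elem]

# zipped_pairs has 4 pairs and misses ('cat', 'hand');
# all_pairs has 8 pairs, including those from the third sublist.
```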
