4

I have a list of list of strings and a list of strings. for example:

L1=[["cat","dog","apple"],["orange","green","red"]]
L2=["cat","red"]

if L1[i] contains any item from L2 I need to put the pairs (for creating edges in a graph) like, in my example, I need the pairs ("cat","dog"),("cat,apple"),("red,orange"),("red","green")

What approach should I use to make it most efficient. (My list L1 is huge)

1
  • Did you try it straight-forward (and maybe less efficient)? Commented Feb 16, 2012 at 18:35

4 Answers 4

2

Supposing that you might have, more than one "control" item in your L1 sublists.

I'd do it using set() and itertools.product():

from itertools import product

def generate_edges(iterable, control):
    edges = []
    control_set = set(control)
    for e in iterable:
        e_set = set(e)
        common = e_set & control_set
        to_pair = e_set - common
        edges.extend(product(to_pair, common))
    return edges

Example:

>>> L1 = [["cat","dog","apple"],
...       ["orange","green","red"],
...       ["hand","cat","red"]]
>>> L2 = ["cat","red"]
>>> generate_edges(L1, L2)
[('apple', 'cat'),
 ('dog', 'cat'),
 ('orange', 'red'),
 ('green', 'red'),
 ('hand', 'red'),
 ('hand', 'cat')]
Sign up to request clarification or add additional context in comments.

Comments

2

If L1 is very large you might want to look into using bisect. It requires that yo flatten and sort L1 first. You could do something like:

from bisect import bisect_left, bisect_right
from itertools import chain

L1=[["cat","dog","apple"],["orange","green","red","apple"]]
L2=["apple", "cat","red"]

M1 = [[i]*len(j) for i, j in enumerate(L1)]
M1 = list(chain(*M1))
L1flat = list(chain(*L1))
I = sorted(range(len(L1flat)), key=L1flat.__getitem__)
L1flat = [L1flat[i] for i in I]
M1 = [M1[i] for i in I]

for item in L2:
    s = bisect_left(L1flat, item)
    e = bisect_right(L1flat, item)
    print item, M1[s:e]

#apple [0, 1]
#cat [0]
#red [1]

1 Comment

Thank you for the code; this is something similar to what I'm trying to do. I just have a question: how do I get a list of all the L1 strings that have L2 strings as substrings. In my case, L2 strings are substrings of L1 strings.
1

I'd suggest transforming them all to sets and using set operations (intersection) to figure out what terms from L2 are in each L1 item. You can then use set subtraction to get the list of items you need to pair.

edges = []
L2set = set(L2)
for L1item in L1:
    L1set = set(L1item)
    items_in_L1item = L1set & L2set
    for item in items_in_L1item:
        items_to_pair = L1set - set([item])
        edges.extend((item, i) for i in items_to_pair)

Comments

1

To make this code optimal even if L1 and L2 are huge, use izip that produces a generator instead of creating a huge list of tuples. If you're working in Python3, just use zip.

from itertools import izip

pairs = []
for my_list, elem in izip(L1, L2):
    if elem in my_list:
        pairs += [(elem, e) for e in my_list if e!=elem]
print pairs

The code is very comprehesible, it's almost pure english! First, you're looping over each list and its corresponding element, then you're asking if the element is inside the list, if it is, print all pairs except the pair (x, x).

Output:

[('cat', 'dog'), ('cat', 'apple'), ('red', 'orange'), ('red', 'green')]

2 Comments

Your code does not work for the general case the OP is looking for.
Because any of the elements from L2 may be in any of the lists from L1, and L1 may not have the same number of lists in it as L2 has elements.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.