
I'm following the post "Search a list of list of strings for a list of strings in python efficiently" and trying to search a list of lists of strings for a list of substrings. The code in that post finds the index of each list of strings that matches the search strings. In my code, I take the substring after the colon of each L1 string and flatten the result so it can be matched against the L2 strings. How do I get all the L1 lists that contain the L2 strings as substrings? Right now, I'm only getting the indices of the L1 lists that match each L2 string.

This is how far I got, adapting the code from that post:

from bisect import bisect_left, bisect_right
from itertools import chain
    
L1=[["animal:cat","pet:dog","fruit:apple"],["fruit:orange","color:green","color:red","fruit:apple"]]
L2=["apple", "cat","red"]

M1 = [[i]*len(j) for i, j in enumerate(L1)]
M1 = list(chain(*M1))

L1flat = list(chain(*L1))

I = sorted(range(len(L1flat)), key=L1flat.__getitem__)
L1flat = sorted([L1flat[i].split(':')[1] for i in I])
print(L1flat)
M1 = [M1[i] for i in I]

for item in L2:
    s = bisect_left(L1flat, item)
    e = bisect_right(L1flat, item)    
    print(item, M1[s:e])
    #print(L1flat[s:e])
    sub = M1[s:e]
    for y in sub:
        print('%s found in %s' % (item, str(L1[y])))

Edit: I just realized I'm getting errors in my search for the second and third items.

I changed three things:

  1. I created the M1 by enumerating split elements of L1

    L1Splitted = [i[0].split(':')[1] for i in L1]

    M1 = [[i]*len(j) for i, j in enumerate(L1Splitted)]

  2. I reversed the elements in L1flat and split the elements

    L1flatReversed = []

    # reverseString is a helper that is not shown here
    for x in L1flat:
        L1flatReversed.append(reverseString(x, ':'))
    
  3. Then I made another list of reversed strings split

    L1flatReversedSplit = [L1flatReversed[i].split(':')[0] for i in I]

Now s and e are computed by bisecting on L1flatReversedSplit.

  • What do you want as output here? Commented Jan 17, 2024 at 17:11
  • I'm looking for the list in L1 that each L2 element falls into. Commented Jan 17, 2024 at 17:15
  • So for L2 --> "cat" do you want L1 --> ["animal:cat", "pet:dog", "fruit:apple"] or L1 --> "animal:cat" or something else? Commented Jan 17, 2024 at 17:18
  • The first one: L1 --> ["animal:cat", "pet:dog", "fruit:apple"] Commented Jan 17, 2024 at 17:19

2 Answers


I would map L1 into something we can search against, given that we know where in each L1 value the search terms can appear (after the colon). It is then just a matter of intersecting each row's set of words with the search terms:

L1 = [
    ["animal:cat", "pet:dog", "fruit:apple"],
    ["fruit:orange", "color:green", "color:red", "fruit:apple"]
]
L2 = ["apple", "cat", "red"]

## ------------------
## Create a reshaping of "L1" based on where we know we can find
## strings to match
## ------------------
L1_lookup = [
    set(cell.split(":")[1] for cell in row)
    for row in L1
]
## ------------------

## ------------------
## match the set of words against the search words
## ------------------
results = [
    (L1[index], intersection)
    for index, value
    in enumerate(L1_lookup)
    if (intersection := value.intersection(L2))
]
## ------------------

for row, intersection in results:
    print(row, f"{intersection=}")

Should give you:

['animal:cat', 'pet:dog', 'fruit:apple'] intersection={'apple', 'cat'}
['fruit:orange', 'color:green', 'color:red', 'fruit:apple'] intersection={'red', 'apple'}
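
If the lookup is needed per search term instead (for "cat", return the whole row that contains it), the same idea can be turned around into an inverted index. This is only a sketch under the assumption that every cell has the form "key:value"; the names index and rows_by_term are made up for illustration.

```python
from collections import defaultdict

L1 = [
    ["animal:cat", "pet:dog", "fruit:apple"],
    ["fruit:orange", "color:green", "color:red", "fruit:apple"],
]
L2 = ["apple", "cat", "red"]

# Inverted index: value after the colon -> indices of the rows containing it.
# Assumes every cell contains a colon.
index = defaultdict(list)
for row_number, row in enumerate(L1):
    for cell in row:
        value = cell.split(":", 1)[1]
        # Record each row at most once per value.
        if row_number not in index[value]:
            index[value].append(row_number)

# rows_by_term (hypothetical name): search term -> matching rows of L1.
rows_by_term = {term: [L1[i] for i in index.get(term, [])] for term in L2}

for term, rows in rows_by_term.items():
    print(term, rows)
```

Building the index costs one pass over L1, after which each term is a dictionary lookup, so this may suit the case where L2 is long or is searched repeatedly.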

1 Comment

I'm trying to find a match for each element of L2 in a pseudo database structure L1. Your solution gives, more or less, all the L1 elements that match one or more L2 elements. I could use it to formulate an SQL query and send it to the database.

I changed three things:

  1. I created the M1 by enumerating split elements of L1

    L1Splitted = [i[0].split(':')[1] for i in L1]

    M1 = [[i]*len(j) for i, j in enumerate(L1Splitted)]

  2. I reversed the elements in L1flat and split the elements

    L1flatReversed = []

    # reverseString is a helper that is not shown here
    for x in L1flat:
        L1flatReversed.append(reverseString(x, ':'))
    
  3. Then I made another list of reversed strings split

    L1flatReversedSplit = [L1flatReversed[i].split(':')[0] for i in I]

Now s and e are computed by bisecting on L1flatReversedSplit.
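
For comparison, the bisect approach can also be sketched without the reversing step. This is not the code described in the steps above: it assumes the goal is only to keep the sorted values and their row indices aligned, and it does so by sorting (value, row index) pairs together. The names pairs, values, row_numbers and matches are hypothetical.

```python
from bisect import bisect_left, bisect_right

L1 = [
    ["animal:cat", "pet:dog", "fruit:apple"],
    ["fruit:orange", "color:green", "color:red", "fruit:apple"],
]
L2 = ["apple", "cat", "red"]

# Pair each value after the colon with the index of the row it came from,
# then sort the pairs so values and row indices stay aligned.
# Assumes every cell has the form "key:value".
pairs = sorted(
    (cell.split(":", 1)[1], row_number)
    for row_number, row in enumerate(L1)
    for cell in row
)
values = [value for value, _ in pairs]
row_numbers = [row_number for _, row_number in pairs]

matches = {}
for item in L2:
    # All entries equal to item sit in the half-open range [s, e).
    s = bisect_left(values, item)
    e = bisect_right(values, item)
    matches[item] = sorted(set(row_numbers[s:e]))
    for y in matches[item]:
        print('%s found in %s' % (item, L1[y]))
```

Sorting the pairs as a unit avoids sorting the values a second time on their own, which is one way the values and the row indices could drift out of step.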

2 Comments

If anyone has a better way of doing this, please let me know. This works for now because my list of lists of strings has only a few elements.
I added this info to the main question and you can probably delete this.
