6

I'm trying to find the location of a substring within a string that contains wildcards. For example:

substring = 'ABCDEF'
large_string = 'QQQQQABC.EFQQQQQ'

start = string.find(substring, large_string)
print(start)

5

Thank you in advance.

1

4 Answers

2

The idea is to convert what you are looking for, ABCDEF in this case, into the following regular expression:

([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)

Each character is placed in [] in case it turns out to be a regex special character. The only complication is if one of the search characters is ^, as in ABCDEF^. As the first character inside [], ^ negates the character class, so it cannot simply be bracketed; it is escaped with a backslash instead and therefore handled in a separate step.

Then you search the string for that pattern using re.search:

import re

substring = 'ABCDEF'
large_string = 'QQQQQABC.EF^QQQQQ'

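# Wrap every character except ^ as ([c]|\.): the character itself or a literal dot
new_substring = re.sub(r'([^^])', r'([\1]|\\.)', substring)
# ^ would negate a character class, so escape it instead of bracketing it
new_substring = re.sub(r'\^', r'(\\^|\\.)', new_substring)
print(new_substring)
regex = re.compile(new_substring)
m = regex.search(large_string)
if m:
    print(m.span())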

Prints:

([A]|\.)([B]|\.)([C]|\.)([D]|\.)([E]|\.)([F]|\.)
(5, 11)
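
A variation on the same idea, as a rough sketch rather than part of the original answer: re.escape can escape each character, so ^ (and a backslash in the search string) needs no special case. The function name find_with_wildcards is made up for illustration, and returning -1 when nothing matches is an assumption that mirrors str.find.

```python
import re

def find_with_wildcards(substring, large_string):
    """Return the index of the first match, or -1 if there is none.

    Hypothetical helper: each character of `substring` may appear in
    `large_string` either as itself or as a literal '.' wildcard.
    """
    # re.escape makes every character safe to use as a literal in a pattern
    pattern = "".join(f"(?:{re.escape(ch)}|\\.)" for ch in substring)
    match = re.search(pattern, large_string)
    return match.start() if match else -1

print(find_with_wildcards('ABCDEF', 'QQQQQABC.EFQQQQQ'))
```

Non-capturing groups (?:...) are used here because only the match position is needed, not the individual characters.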

0

Not sure if there is a regex operation for this, but you can generate a list of regex patterns that will work.

substring = "ABCDE"
patterns = []
for i in range(len(substring)):
    patterns.append(substring[:i] + '.?' + substring[i:])

This gives you the following output in our example:

.?ABCDE
A.?BCDE
AB.?CDE
ABC.?DE
ABCD.?E

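With this list you can now find the index:

import re

large_string = 'QQQQQABC.EFQQQQQ'  # the string from the question

for pattern in patterns:
    try:
        print("Index is", re.search(pattern, large_string).start())
        break
    except AttributeError:  # re.search returned None: no match for this pattern
        pass
else:
    print("Not found")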

0

My attempt:

from itertools import combinations

def gen_wild_cards(string):
    list_ = []
    start_indexes = range(len(string))
    # Replace every combination of 1 to len(string) - 1 positions with "."
    for i in range(1, len(string)):
        for c in combinations(start_indexes, i):
            new_string = list(string)
            for index in c:
                new_string[index] = "."
            list_.append("".join(new_string))
    return list_

large_string = 'QQQQQABC.EFQQQQQ'
basic_string = "ABCDEF"
list_ = gen_wild_cards(basic_string)
for wildcard in list_:
    print(large_string.find(wildcard))

Basically, I am generating all the wildcard variants and searching for each of them in large_string. The wildcards generated:

.BCDEF
A.CDEF
AB.DEF
ABC.EF
ABCD.F
ABCDE.
..CDEF
.B.DEF
.BC.EF
.BCD.F
.BCDE.
A..DEF
A.C.EF
A.CD.F
A.CDE.
AB..EF
AB.D.F
AB.DE.
ABC..F
ABC.E.
ABCD..
...DEF
..C.EF
..CD.F
..CDE.
.B..EF
.B.D.F
.B.DE.
.BC..F
.BC.E.
.BCD..
A...EF
A..D.F
A..DE.
A.C..F
A.C.E.
A.CD..
AB...F
AB..E.
AB.D..
ABC...
....EF
...D.F
...DE.
..C..F
..C.E.
..CD..
.B...F
.B..E.
.B.D..
.BC...
A....F
A...E.
A..D..
A.C...
AB....
.....F
....E.
...D..
..C...
.B....
A.....

If you are interested only in the first match, you could use a lazy approach with a generator instead of generating all the wildcards in one shot.
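
A minimal sketch of that lazy approach, assuming the same variant order as gen_wild_cards. The names iter_wild_cards and find_first are made up for illustration. Note that "first" here means the first variant that matches in generation order, which is not necessarily the leftmost position in large_string.

```python
from itertools import combinations

def iter_wild_cards(string):
    """Lazily yield `string` with 1 to len(string) - 1 positions replaced by '.'."""
    for count in range(1, len(string)):
        for positions in combinations(range(len(string)), count):
            chars = list(string)
            for index in positions:
                chars[index] = "."
            yield "".join(chars)

def find_first(large_string, basic_string):
    """Return the position of the first variant found, or -1 if none is found."""
    for candidate in iter_wild_cards(basic_string):
        position = large_string.find(candidate)
        if position != -1:
            # Stop here: the remaining variants are never generated
            return position
    return -1

print(find_first('QQQQQABC.EFQQQQQ', 'ABCDEF'))
```

With the strings from the question, the search stops at the fourth variant (ABC.EF) instead of building all 62.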

-1

You can use str.index() or the .start() method of the match returned by re.search():

index = large_string.index(substring)
print(index)
index = re.search(substring, large_string).start()
print(index)
