Python how to match substring in slightly different string

Question

I have the following problem and just can't find a solution.

I have to find the exact start and end position of the following substring:

"hello world is a good idea for a T-shirt"

in any other string, such as:

"This is some string, that includes commas, and other punctuations. It also includes hello world, is a, good, idea for a T-shirt and other."

Because of the punctuation (commas), find() won't give me a result. I tried a regex such as r"(Hello)[\W+] (world) [\W+]..." but that does not work either. Any good ideas?

Edit:

Here is my code:

import re
text = "This is some string, that includes commas, and other punctuations. It also includes hello world, is a, good, idea for a T-shirt and other."
match = re.search(r"[\W+](hello)[\W+](world)[\W+](is)[\W+](a)[\W+](good)[\W+](idea)[\W+](for)[\W+](a)[\W+](T-shirt)", text)
print (match)

Please share the full code. Your approach should work: replace each space with a \W+ pattern.
– Wiktor Stribiżew, Commented Aug 1, 2017 at 14:05
Hello Wiktor, thank you very much for your very quick reply! It works, as long as there are no commas.
– SirTobi, Commented Aug 1, 2017 at 14:14
@Mohamed perfect! Thank you :D Now it works! Thanks, everyone! You are awesome.
– SirTobi, Commented Aug 1, 2017 at 14:22

Wiktor Stribiżew · Accepted Answer · 2017-08-01 14:38:19Z

1

When you use [\W+], you create a character class that matches a single character, either a non-word char (any char that is not a letter, digit or _) OR a literal + symbol.
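
To see the difference in isolation, here is a minimal sketch (the sample string and variable names are mine, not from the question). It assumes only the standard re module:

```python
import re

sample = "hello, world"  # a comma AND a space sit between the two words

# [\W+] is a character class: it consumes exactly ONE character,
# which must be a non-word character (the "+" inside is just a literal plus).
class_match = re.search(r"hello[\W+]world", sample)

# \W+ is \W with a quantifier: it consumes ONE OR MORE non-word characters.
quant_match = re.search(r"hello\W+world", sample)

print(class_match)          # no match: ", " is two characters
print(quant_match.group())  # the whole "hello, world" run is matched
```

The character class version only works when a single separator character sits between the words, which is why it appeared to work "as long as there are no commas".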

Use \W+ instead of spaces:

import re
text = "This is some string, that includes commas, and other punctuations. It also includes hello world, is a, good, idea for a T-shirt and other."
match = re.search(r"hello\W+world\W+is\W+a\W+good\W+idea\W+for\W+a\W+T-shirt", text)
if match:
    print("YES!")


The \W matches any char that is not a letter, digit or _ char and + makes the regex engine match 1 or more occurrences of these chars.
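
Since the question asks for the exact start and end position, it may help to read them off the match object. A small sketch, reusing the text and pattern from this answer:

```python
import re

text = "This is some string, that includes commas, and other punctuations. It also includes hello world, is a, good, idea for a T-shirt and other."
pattern = r"hello\W+world\W+is\W+a\W+good\W+idea\W+for\W+a\W+T-shirt"

match = re.search(pattern, text)
if match:
    # span() returns (start, end); end is exclusive, so it works as a slice bound
    start, end = match.span()
    print(start, end)
    print(text[start:end])
```

match.start() and match.end() return the same two values separately, if that reads better in your code.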

To make the code more generic, you can split the initial string on whitespace and then join the pieces with the \W+ pattern, which matches any run of spaces, commas, dots or other non-word chars.

import re
key = "hello world is a good idea for a T-shirt"
pat = r"\W+".join([re.escape(x) for x in key.split()])
# print(pat) # => hello\W+world\W+is\W+a\W+good\W+idea\W+for\W+a\W+T\-shirt
text = "This is some string, that includes commas, and other punctuations. It also includes hello world, is a, good, idea for a T-shirt and other."
match = re.search(pat, text)
if match:
    print("YES!")
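
If you need this in more than one place, the same idea can be wrapped in a helper. This is only a sketch: find_phrase and its ignore_case parameter are hypothetical names I made up, and the case-insensitive option is an assumption prompted by the "Hello" in the question's first regex.

```python
import re

def find_phrase(key, text, ignore_case=False):
    """Return (start, end) of the first occurrence of key in text,
    allowing any run of non-word chars between words; None if not found."""
    words = key.split()
    if not words:
        return None  # an empty key has nothing to search for
    # Escape each word so chars like "." or "(" are matched literally
    pat = r"\W+".join(re.escape(w) for w in words)
    flags = re.IGNORECASE if ignore_case else 0
    m = re.search(pat, text, flags)
    return m.span() if m else None

text = "It also includes hello world, is a, good, idea for a T-shirt and other."
span = find_phrase("hello world is a good idea for a T-shirt", text)
if span:
    print(span, text[span[0]:span[1]])
```

Note that this sketch does not anchor on word boundaries, so a key such as "hello" would also be found inside a longer word; adding \b at both ends of the pattern is one way to tighten that when the key starts and ends with word chars.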


edited Aug 1, 2017 at 14:38

answered Aug 1, 2017 at 14:19

Wiktor Stribiżew


4 Comments

SirTobi Over a year ago

ah right okay I think I understand! Awesome, thank you very much! This works perfectly. I am amazed by how fast you guys are. Thumbs up!

Błotosmętek Over a year ago

…and if the phrase being searched is not known a priori but rather comes from user input, it might be a good idea to construct the regexp dynamically, like: regexp = r'\W+'.join(search_string.split())

SirTobi Over a year ago

@Błotosmętek Ah perfect, yes that's what I was gonna do, but your solution is much more elegant than what I had in mind :D. Thank you very much!

Wiktor Stribiżew Over a year ago

Well, then I'd also re.escape each chunk.
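
To illustrate why escaping each chunk matters, here is a rough sketch with a made-up search string (search_string and both sample texts are my own, not from the thread). Without re.escape, the dot in "1.5" is a regex wildcard and can produce a false positive:

```python
import re

search_string = "1.5 GB"            # hypothetical user input
text_wrong = "Disk usage: 125 GB"   # should NOT match
text_right = "Disk usage: 1.5, GB"  # should match despite the comma

# Joined as-is: "." stays a wildcard that matches any character
naive = r"\W+".join(search_string.split())
# Each chunk escaped first: "." is matched literally
escaped = r"\W+".join(re.escape(x) for x in search_string.split())

print(bool(re.search(naive, text_wrong)))    # True: "125 GB" slips through
print(bool(re.search(escaped, text_wrong)))  # False
print(bool(re.search(escaped, text_right)))  # True
```

Other metacharacters in user input, such as an unbalanced parenthesis, would not just mismatch but make the unescaped pattern fail to compile.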

0xMH · Accepted Answer · 2017-08-01 14:13:15Z

1

Try this:

 r'\bhello.*T-shirt\b'

answered Aug 1, 2017 at 14:13

0xMH


1 Comment

SirTobi Over a year ago

Hi Mohamed, awesome, thank you very much for your fast reply, this works indeed very well. However, I expressed myself a bit ambiguously, what I am trying to match are the same words. So in your example, it would also match a completely different sentence. But it is cool nonetheless :)
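
The point raised in this comment can be checked with a short sketch (the sentence in other is invented for the demonstration). The .* in the pattern accepts anything between "hello" and "T-shirt", whereas a word-by-word \W+ pattern does not:

```python
import re

loose = r"\bhello.*T-shirt\b"
strict = r"hello\W+world\W+is\W+a\W+good\W+idea\W+for\W+a\W+T-shirt"

# A completely different sentence that merely starts and ends the same way
other = "hello there, I spilled coffee on my T-shirt"

loose_hit = re.search(loose, other)
strict_hit = re.search(strict, other)

print(loose_hit is not None)  # True: ".*" swallows the unrelated words
print(strict_hit)             # None: the words in between must be the same
```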
