I have the following problem and just can't find a solution.

I have to find the exact start and end position of the following substring:

"hello world is a good idea for a T-shirt"

in any other string, such as:

"This is some string, that includes commas, and other punctuations. It also includes hello world, is a, good, idea for a T-shirt and other."

Because of the punctuation (commas), find() won't give me a result. I tried a regex such as r"(Hello)[\W+] (world) [\W+]..." but it does not work either. Any good ideas?

Edit:

Here is my code:

import re
text = "This is some string, that includes commas, and other punctuations. It also includes hello world, is a, good, idea for a T-shirt and other."
match = re.search(r"[\W+](hello)[\W+](world)[\W+](is)[\W+](a)[\W+](good)[\W+](idea)[\W+](for)[\W+](a)[\W+](T-shirt)", text)
print(match)
  • Please share the full code. Your approach should work: replace each space with the \W+ pattern. Commented Aug 1, 2017 at 14:05
  • Hello Wiktor, thank you very much for your very quick reply! It works, as long as there are no commas. Commented Aug 1, 2017 at 14:14
  • @SirTobi try my answer; it will match anything, even commas. Commented Aug 1, 2017 at 14:15
  • @Mohamed I said \W+, not [\W+] Commented Aug 1, 2017 at 14:19
  • @Mohamed perfect! Thank you :D Now it works! Thanks, everyone! You are awesome. Commented Aug 1, 2017 at 14:22

2 Answers

When you use [\W+], you create a character class that matches a single character, either a non-word char (any char that is not a letter, digit or _) OR a literal + symbol.
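
The difference is easy to check on a small fragment. The following is only a minimal sketch; the `sample` string and the variable names are made up for illustration:

```python
import re

sample = "world, is"

# Character class: exactly ONE character (a non-word char or a literal "+").
single = re.search(r"world[\W+]is", sample)
# Quantified \W: one or more non-word characters in a row.
multi = re.search(r"world\W+is", sample)

print(single)          # None, because ", " is two characters
print(multi.group())   # world, is
```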

Use \W+ instead of spaces:

import re
text = "This is some string, that includes commas, and other punctuations. It also includes hello world, is a, good, idea for a T-shirt and other."
match = re.search(r"hello\W+world\W+is\W+a\W+good\W+idea\W+for\W+a\W+T-shirt", text)
if match:
    print("YES!")

See the Python demo

The \W matches any char that is not a letter, digit or _ char and + makes the regex engine match 1 or more occurrences of these chars.

To make the code more generic, you can split the search phrase on whitespace, re.escape each word, and then join the words with \W+, which matches any run of spaces, commas, dots, or other non-word characters.

import re
key = "hello world is a good idea for a T-shirt"
pat = r"\W+".join([re.escape(x) for x in key.split()])
# print(pat) # => hello\W+world\W+is\W+a\W+good\W+idea\W+for\W+a\W+T\-shirt
text = "This is some string, that includes commas, and other punctuations. It also includes hello world, is a, good, idea for a T-shirt and other."
match = re.search(pat, text)
if match:
    print("YES!")

See another Python demo
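
Since the question asks for the exact start and end position, the match object can supply them through match.span() (or match.start() and match.end()). Below is a small sketch along those lines; the helper name find_phrase is hypothetical and not part of the re module, and it assumes the words in the text have the same case as in the search phrase:

```python
import re

def find_phrase(key, text):
    """Hypothetical helper: return (start, end) of key in text, or None.

    Words of key may be separated in text by any run of non-word characters.
    """
    # Escape each word so characters like "-" or "+" are taken literally.
    pat = r"\W+".join(re.escape(word) for word in key.split())
    match = re.search(pat, text)
    return match.span() if match else None

key = "hello world is a good idea for a T-shirt"
text = ("This is some string, that includes commas, and other punctuations. "
        "It also includes hello world, is a, good, idea for a T-shirt and other.")

span = find_phrase(key, text)
if span:
    start, end = span
    # end is exclusive, so text[start:end] is exactly the matched substring.
    print(start, end)
    print(text[start:end])
```

If the case may differ (the question's pattern uses "Hello"), passing flags=re.IGNORECASE to re.search would be one way to handle it.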

4 Comments

ah right okay I think I understand! Awesome, thank you very much! This works perfectly. I am amazed by how fast you guys are. Thumbs up!
…and if the phrase being searched for is not known a priori but rather comes from user input, it might be a good idea to construct the regexp dynamically, like: regexp = r'\W+'.join(search_string.split())
@Błotosmętek Ah perfect, yes that's what I was gonna do, but your solution is much more elegant than what I had in mind :D. Thank you very much!
Well, then I'd also re.escape each chunk.

Try this:

 r'\bhello.*T-shirt\b'
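
Note that .* accepts any characters between "hello" and "T-shirt", so this pattern only anchors the first and last word. A quick sketch of the consequence, using a made-up sentence (the `other` string is purely illustrative):

```python
import re

loose = r"\bhello.*T-shirt\b"
strict = r"hello\W+world\W+is\W+a\W+good\W+idea\W+for\W+a\W+T-shirt"

# An unrelated sentence that merely starts with "hello" and ends with "T-shirt".
other = "hello, I spilled coffee on my T-shirt"

loose_match = re.search(loose, other)
strict_match = re.search(strict, other)

print(loose_match.group())  # the whole unrelated sentence is matched
print(strict_match)         # None: the words in between are different
```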

1 Comment

Hi Mohamed, awesome, thank you very much for your fast reply, this does work very well. However, I expressed myself a bit ambiguously: what I am trying to match is exactly the same words. Your pattern would also match a completely different sentence. But it is cool nonetheless :)
