I was asked to count the occurrences of a substring in a given string, matching case-insensitively and with or without surrounding punctuation. Some examples:
count_occurrences("Text with", "This is an example text with more than +100 lines") # Should return 1
count_occurrences("'example text'", "This is an 'example text' with more than +100 lines") # Should return 1
count_occurrences("more than", "This is an example 'text' with (more than) +100 lines") # Should return 1
count_occurrences("clock", "its 3o'clock in the morning") # Should return 0
I chose regex over .count() as I needed an exact match, and ended up with:
import re

def count_occurrences(word, text):
    pattern = f"(?<![a-z])((?<!')|(?<='')){word}(?![a-z])((?!')|(?=''))"
    return len(re.findall(pattern, text, re.IGNORECASE))
This returns the correct count in every case, but my code took 0.10 s while the expected time is 0.025 s. Am I missing something? Is there a better (performance-optimised) way to do this?
text.lower().count(word.lower()) is much faster. Do you need another regex? Or, you could find messy but more specifically optimized code.
With .count, let's say txt = "texts texts texts": count will return 3 if I search for text, and I don't want that (it needs to return a match only for the exact word).
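To make the difference between the two approaches concrete, here is a minimal sketch that puts the substring count next to a simplified regex. The names count_exact, count_substring and _compile are hypothetical, and the single character class on each side is an assumption: it rejects a match that touches a letter or an apostrophe, which covers the four examples in the question but drops the original pattern's special case for two consecutive apostrophes. Whether it reaches the 0.025 s target is not something this sketch shows; it would need measuring (for example with timeit) on the real input.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def _compile(word):
    # re.escape keeps characters such as "+" or "(" in the search term literal.
    # Assumption: one lookaround per side is enough, i.e. a match must not
    # touch a letter or an apostrophe. The original '' special case is dropped.
    return re.compile(rf"(?<![a-z']){re.escape(word)}(?![a-z'])", re.IGNORECASE)


def count_exact(word, text):
    # Hypothetical helper: counts matches lazily instead of building a list.
    return sum(1 for _ in _compile(word).finditer(text))


def count_substring(word, text):
    # The plain substring approach from the comment, for comparison.
    return text.lower().count(word.lower())


print(count_substring("text", "texts texts texts"))  # 3
print(count_exact("text", "texts texts texts"))      # 0
```

The cached compile only helps when the same search term is used many times; for a single call the cost is dominated by scanning the text.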