The asymptotic time complexity of your implementation is:
$$O\big(\,(s-t) \cdot t \log t\,\big)$$

Because you sort and compare \$s-t\$ substrings of length \$t\$. (Python's built-in sort, Timsort, is \$O(n \log n)\$.)

Here is a solution with \$O\big(t \cdot (s-t+\log t)\big)\$ time complexity.
It uses a sorted sliding window of length \$t\$. As the window slides through `s`, the outgoing character is removed and the incoming one is inserted at exactly the position that keeps the window sorted.
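A minimal sketch of a single slide step, using the standard-library `bisect` module (the variable names here are illustrative, not from the implementation below):

```python
from bisect import bisect_left, insort

# Sorted window over the first three characters of some string "cab...".
window = sorted("cab")                 # ['a', 'b', 'c']

# Slide by one: remove the outgoing 'c', insert the incoming 'd' in order.
window.pop(bisect_left(window, 'c'))   # ['a', 'b']
insort(window, 'd')                    # ['a', 'b', 'd']
print(window)
```

`bisect_left` finds the character to remove in \$O(\log t)\$; the `pop` and `insort` each cost \$O(t)\$ because list elements must shift.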
- Initial sorting of string `t` is \$O(t \log t)\$
- Initial sorting of the sliding window is \$O(t \log t)\$
- Initial comparison of `sorted(t)` with the window is \$O(t)\$
- For each character in `s[len(t):]`, a total of \$s-t\$ items, there is:
  - one bisection of the window for the character to be removed: \$O(\log t)\$
  - one removal of a character: \$O(t)\$
  - one bisection of the window for the character to be inserted: \$O(\log t)\$
  - one insertion of a character: \$O(t)\$
  - one comparison of `sorted(t)` with the window: \$O(t)\$
So the asymptotic complexity is \$O\big(t \log t + (s-t) \cdot t\big)\$, or simply \$O\big(t \cdot (s-t+\log t)\big)\$.
```python
from bisect import bisect_left, bisect_right

def question1_fast(t, s):
    if len(s) < len(t):
        return False
    t = sorted(t)
    window = sorted(s[:len(t)])
    if window == t:
        return True
    for i, c_new in enumerate(s[len(t):]):
        c_old = s[i]
        if c_old < c_new:
            # c_old sits left of c_new's insertion point, so deleting
            # it first shifts that point one position to the left.
            idx_old = bisect_right(window, c_old) - 1
            idx_new = bisect_left(window, c_new) - 1
        elif c_new < c_old:
            idx_new = bisect_right(window, c_new)
            idx_old = bisect_left(window, c_old)
        else:
            continue  # same character: window contents unchanged
        del window[idx_old]
        window.insert(idx_new, c_new)
        if window == t:
            return True
    return False
```
A little comparison of efficiency on large `s` and `t`:
```python
>>> from timeit import timeit
>>> from random import randint, shuffle
>>>
>>> # `s` is a string of 10,000 random characters
>>> s = ''.join(chr(randint(32, 127)) for _ in range(10000))
>>>
>>> # `t` is a random permutation of `s[5000:6000]`
>>> t_list = list(s[5000:6000])
>>> shuffle(t_list)
>>> t = ''.join(t_list)
>>>
>>> timeit('Question1_original(t,s)', globals=locals(), number=10)
25.469552749997092
>>> timeit('Question1_Caridorc(t,s)', globals=locals(), number=10)
12.726046560999748
>>> timeit('question1_fast(t,s)', globals=locals(), number=10)
0.10736406500291196
```
With small strings, the differences are not so significant:
```python
>>> timeit('Question1_original("ad", "udacity")', globals=locals())
4.730070723002427
>>> timeit('Question1_Caridorc("ad", "udacity")', globals=locals())
6.0496803390014975
>>> timeit('question1_fast("ad", "udacity")', globals=locals())
3.89388234800208
```
**EDIT**
Scratch that. I just realized there's a more efficient way to do it, one that is \$O(s+t)\$ in time complexity.

My original idea was to have a window which is a sorted list, and to keep it sorted the whole time, all for the sake of the comparison `window == t`. If `window` and `t` weren't both sorted, the comparison obviously wouldn't work as needed.

But to find out whether one string is an anagram of another, we don't have to sort them; we just need to know whether they contain the same characters. Sorting and comparing is one way. A more efficient way is to count the occurrences of each distinct character, producing a dictionary that maps each character to its number of occurrences. Then you compare the dictionary of `str1` with the dictionary of `str2`, and if they are equal, the strings are anagrams of each other.
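For instance, a count-based anagram test can be written with `collections.Counter` from the standard library (`is_anagram` is just an illustrative name, not part of the solution below):

```python
from collections import Counter

def is_anagram(a, b):
    # Equal character multisets <=> anagrams; no sorting needed.
    return Counter(a) == Counter(b)

print(is_anagram("listen", "silent"))  # True
print(is_anagram("aab", "abb"))        # False
```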
The comparison of dictionaries is not much more efficient than comparing sorted lists; if anything, I suspect it might be even slower. The difference lies in the insertions and deletions, which are amortized \$O(1)\$ for a dictionary but \$O(t)\$ for a sorted list.

Since neither `t` nor the window has to be sorted, let alone kept sorted, the time complexity drops drastically:
- Creation of the dictionary for `t`: \$O(t)\$
- Creation of the dictionary for the window: \$O(t)\$
- For each character in `s[len(t):]` (repeated \$s-t\$ times):
  - one decrement of a counter: \$O(1)\$
  - one increment of a counter: \$O(1)\$
  - one comparison of the dictionaries: \$O(1)\$, since they hold at most one key per distinct character of the (fixed-size) alphabet

That adds up to \$O\big(t+(s-t)\big)\$, or just \$O(s+t)\$.
```python
def make_counter_dict(keys, t):
    # Map every character of `keys` to its number of occurrences in `t`.
    counter_dict = dict.fromkeys(keys, 0)
    for c in t:
        counter_dict[c] += 1
    return counter_dict

def question1_even_faster(t, s):
    if len(s) < len(t):
        return False
    # Build both dictionaries over the same key set (all characters of
    # `s` and `t`), so that they compare equal exactly when the counts
    # match, and counting `t` cannot raise a KeyError.
    t_dict = make_counter_dict(s + t, t)
    window = make_counter_dict(s + t, s[:len(t)])
    for c_old, c_new in zip(s, s[len(t):]):
        if window == t_dict:
            return True
        window[c_new] += 1
        window[c_old] -= 1
    return window == t_dict
```
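The same counting idea can also be written with `collections.Counter`; this `contains_anagram` sketch is my naming, not part of the answer's code, and it deletes keys that drop to zero so the equality test stays reliable on every Python version:

```python
from collections import Counter

def contains_anagram(t, s):
    # Does any substring of `s` of length len(t) use exactly the same
    # characters as `t`?  Same sliding-window counting as above.
    if len(s) < len(t):
        return False
    t_count = Counter(t)
    window = Counter(s[:len(t)])
    for c_old, c_new in zip(s, s[len(t):]):
        if window == t_count:
            return True
        window[c_new] += 1
        window[c_old] -= 1
        if window[c_old] == 0:
            del window[c_old]  # drop zero counts so == compares cleanly
    return window == t_count
```

On CPython this may run a bit slower than the plain-dict version above because of `Counter`'s method-call overhead, but it reads more clearly and handles characters of `t` that never occur in `s` without any special-casing.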
Performance (measured with another random `s` and `t` with the same parameters as before):
```python
>>> timeit('question1_fast(t,s)', globals=locals(), number=1000)
10.559728353000537
>>> timeit('question1_even_faster(t,s)', globals=locals(), number=1000)
2.1034470079976018
```
This version is slower on small strings though:
```python
>>> timeit('question1_fast("ad", "udacity")', globals=locals())
3.886769873002777
>>> timeit('question1_even_faster("ad", "udacity")', globals=locals())
4.691746060998412
```