0

I am trying to analysis time complexity of below function. This function is used to check if a string is made of other strings.

set<string> s; // s has been initialized and stores all the strings
bool fun(string word) {
    int len = word.size();

    // something else that can also return true or false with O(1) complexity

    for (int i=1; i<=len; ++i) {
       string prefix = word.substr(0,i);
       string suffix = word.substr(i);
       if (prefix in s && fun(suffix))
           return true;
       else
           return false;
    }
}

I think the time complexity is O(n) where n is the length of word (am I right?). But as the recursion is inside the loop, I don't know how to prove it.

Edit:

This code is not a correct C++ code (e.g., prefix in s). I just show the idea of this function, and want to know how to analysis its time complexity

4
  • I doubt that it's O(n). The outer loop executes n times, and each one calls the function again. I'm thinking something bad, like O(2^n). And could you explain what you mean by a string being made of other strings? Commented Aug 25, 2013 at 17:22
  • @Teepeemm like word1 = abcdef, word2 = abc, word3 = def. word1 is made of word2 and word3 Commented Aug 25, 2013 at 17:25
  • This code looks buggy. Shouldn't the loop start at i=1? As it is, unless s contains the empty string, this will always return false immediately, and if s does contain the empty string, it will recurse infintely. Also, unless s is initialized with something, prefix in s will always evaluate to false. That would also make the function return false immediately. Commented Aug 25, 2013 at 17:34
  • @TedHopp Yes, this code is not correct. I just want to show the idea of this function and decide its complexity Commented Aug 25, 2013 at 17:56

2 Answers 2

4

The way to analyze this is by developing a recursion relationship based on the length of the input and the (unknown) probability that a prefix is in s. Let's assume that the probability of a prefix being in s is given by some function pr(L) of the length L of the prefix. Let the complexity (number of operations) be given by T(len).

If len == 0 (word is the empty string), then T = 1. (The function is missing a final return statement after the loop, but we're assuming that the actual code is only a sketch of the idea, not what's actually executing).

For each loop iteration, denote the loop body complexity by T(len; i). If the prefix is not in s, then the body has constant complexity (T(len; i) = 1). This event has probability 1 - pr(i).

If the prefix is in s, then the function returns true or false according to the recursive call to fun(suffix), which has complexity T(len - i). This event has probability pr(i).

So for each value of i, the loop body complexity is:

T(len; i) = 1 * (1 - pr(i)) + T(len - i) * pr(i)

Finally (and this depends on the intended logic, not the posted code), we have

T(len) = sum i=1...len(T(len; i))

For simplicity, let's treat pr(i) as a constant function with value 0.5. Then the recursive relationship for T(len) is (up to a constant factor, which is unimportant for O() calculations):

T(len) = sum i=1...len(1 + T(len - i)) = len + sum i=0...len-1(T(i))

As noted above, the boundary condition is T(0) = 1. This can be solved by standard recursive function methods. Let's look at the first few terms:

len   T(len)
0     1
1     1 + 1 = 2
2     2 + 2 + 1 = 5
3     3 + (4 + 2 + 1) = 11
4     4 + (11 + 5 + 2 + 1) = 23
5     5 + (23 + 11 + 5 + 2 + 1) = 47

The pattern is clearly T(len) = 2 * T(len - 1) + 1. This corresponds to exponential complexity:

T(n) = O(2n)

Of course, this result depends on the assumption we made about pr(i). (For instance, if pr(i) = 0 for all i, then T(n) = O(1). There would also be non-exponential growth if pr(i) had a maximum prefix length—pr(i) = 0 for all i > M for some M.) The assumption that pr(i) is independent of i is probably unrealistic, but this really depends on how s is populated.

Sign up to request clarification or add additional context in comments.

Comments

0

Assuming that you've fixed the bugs others have noted, then the i values are the places that the string is being split (each i is the leftmost splitpoint, and then you recurse on everything to the right of i). This means that if you were to unwind the recursion, you are looking at up to n-1 different split points, and asking if each substring is a valid word. Things are ok if the beginning of word doesn't have a lot of elements from your set, since then you can skip the recursion. But in the worst case, prefix in s is always true, and you try every possible subset of the n-1 split points. This gives 2^{n-1} different splitting sets, multiplied by the length of each such set.

2 Comments

could you explain why it is 2^{n-1} different splitting in the worst case? THX!
As things stand, the loop executes with i=1 and then returns. This means that fun is O(n) in the worst case. Assuming that the loop should execute with all i and then return, it may be easier to think of a subset. Then this subset corresponds to i=subset[0], with the rest of the subset depending on the recursion on suffix.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.