6

What will be the worst complexity for sorting n strings having n characters each? Will it be just n times its avg. case O(n log n) or something else...?

1
  • It's not clear what you're asking. Commented Feb 26, 2012 at 0:32

3 Answers 3

8

When you are talking about O notation with two things with different lengths, typically you want to use different variables, like M and N.

So, if your merge sort is O(N log N), where N is the number of strings... and comparing two strings is O(M) where M scales with the length of the string, then you'll be left with:

O(N log N) * O(M)

or

O(M N log N)

where M is the string length and N is the number of strings. You want to use different labels because they don't mean the same thing.

In the strange case where the average string length scales with the number of strings, like if you had a matrix stored in strings or something like that, you could argue that M = N, and then you'd have O(N^2 log N)

Sign up to request clarification or add additional context in comments.

7 Comments

Don't you mean "O(M) where M..." instead of "O(N) where N..."? And while that is worst case performance, as requested, it should be noted that average case performance for comparing two strings is O(1) since it becomes geometrically less and less likely you'll need to visit each additional character in the strings.
Sure, I meant them separate but I changed it to use M to be more clear. He's asking for "worst complexity", but giving an "average" sting size... so it's still O(N), right?
Yes, the question is a bit unclear with its mixing of worst and average. I think your answer would be stronger to cover both.
Yea-- usually the effort I put into the answer is proportional to the effort put into the question :)
but now i think it would be O(n^2) only
|
3

As @orangeoctopus, using standard ranking algorithm on a collection of n strings of size n will result in O(n^2 * logn) computation.

However - note that you can do it in O(n^2), with variations on radix sort.

The simplest way to do it [in my opinion] - is

  1. build a trie, and populate it with all your strings. Entering each string is O(n) and you do it n times - total of O(n^2)
  2. do a DFS on the trie, each time you encounter the mark for end for string - add it to the sorted collection. The order of the strings added this way is lexicographically, so your list will be sorted lexicographically when you are done.

It is easy to see you cannot do it any better then O(n^2), since only reading the data is O(n^2), thus this solution is optimal in terms of big O notation of time complexity.

3 Comments

I think instead of saying "DFS", saying "pre-order traversal" would be more clear.
Can O(n^2) achieved without using trie also ?
@Kshitij Yes, do a radix sort on the string, the trie is just a suggestion - a standard radix sort will work here - using characters (or their bit representation) each iteration to achieve the current partial order, until you exhaust all bits/characters. This will take O(n^2) as well.
0

Sorting n items with MergeSort requires O(N LogN) comparisons. If the time to compare two items is O(1) then the total running time will be O(N logN). However, comparing two strings of length N requires O(N) time, so a naive implementation might stuck with O(N*N logN) time.

This seems wasteful because we are not taking advantage of the fact that there are only N strings around to make comparisons. We might somehow preprocess the strings so that comparisons take less time on average.

Here is an idea. Create a Trie structure and put N strings there. The trie will have O(N*N) nodes and require O(N*N) time to build. Traverse the tree and put an integer "ranking" to each node at the tree; If R(N1)<R(N2) then the string associated with Node1 comes before the string associated with Node2 in a dictionary.

Now proceed with Mergesort, do the comparisons in O(1) time by looking up the Trie. The total running time will be O(N*N + N*logN) = O(N*N)

Edit: My answer is very similar to that of @amit. However I proceed with mergesort where he proceeds with radixsort after the trie building step.

2 Comments

Do you also keep an index mapping words to the trie nodes so that you can access those rankings during the merge sort? clarification please. Also, I think you should also include the cost of traversing. So the complexity should be O(NN + NN + NlogN). If this is true, then the radix sort approach seems better since it is O(NN + N*N).
@CERGD: Big O notation deals only on the asymptotic growth with respect to input size; it does not deal with constant factors, O(2*NN + NlogN) = O(NN). Revisiting the question after a few months, it is clear that amit's answer is simpler and faster. Still, I disagree with your argument: the only way to measure the actual performance is to use a chronometer, not to look at the constant factors in the O-notation. There are even cases where an algorithm with a larger O() function beats the other in practical situations.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.