String sorting using Merge Sort

Question

What will be the worst complexity for sorting n strings having n characters each? Will it be just n times its avg. case O(n log n) or something else...?

It's not clear what you're asking.

Oliver Charlesworth
– Oliver Charlesworth

2012-02-26 00:32:05 +00:00
Commented Feb 26, 2012 at 0:32 — Oliver Charlesworth
– Oliver Charlesworth, Commented Feb 26, 2012 at 0:32

Donald Miner · Accepted Answer · 2012-02-26 01:30:40Z

8

When you are talking about O notation with two things with different lengths, typically you want to use different variables, like M and N.

So, if your merge sort is O(N log N), where N is the number of strings... and comparing two strings is O(M) where M scales with the length of the string, then you'll be left with:

O(N log N) * O(M)

or

O(M N log N)

where M is the string length and N is the number of strings. You want to use different labels because they don't mean the same thing.

In the strange case where the average string length scales with the number of strings, like if you had a matrix stored in strings or something like that, you could argue that M = N, and then you'd have O(N^2 log N)

edited Feb 26, 2012 at 1:30

answered Feb 26, 2012 at 0:34

Donald Miner

40.1k10 gold badges99 silver badges118 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

xan Over a year ago

Don't you mean "O(M) where M..." instead of "O(N) where N..."? And while that is worst case performance, as requested, it should be noted that average case performance for comparing two strings is O(1) since it becomes geometrically less and less likely you'll need to visit each additional character in the strings.

Donald Miner Over a year ago

Sure, I meant them separate but I changed it to use M to be more clear. He's asking for "worst complexity", but giving an "average" sting size... so it's still O(N), right?

xan Over a year ago

Yes, the question is a bit unclear with its mixing of worst and average. I think your answer would be stronger to cover both.

Donald Miner Over a year ago

Yea-- usually the effort I put into the answer is proportional to the effort put into the question :)

Abhishek Over a year ago

but now i think it would be O(n^2) only

|

amit · Accepted Answer · 2012-02-26 06:55:58Z

3

As @orangeoctopus, using standard ranking algorithm on a collection of n strings of size n will result in O(n^2 * logn) computation.

However - note that you can do it in O(n^2), with variations on radix sort.

The simplest way to do it [in my opinion] - is

build a trie, and populate it with all your strings. Entering each string is O(n) and you do it n times - total of O(n^2)
do a DFS on the trie, each time you encounter the mark for end for string - add it to the sorted collection. The order of the strings added this way is lexicographically, so your list will be sorted lexicographically when you are done.

It is easy to see you cannot do it any better then O(n^2), since only reading the data is O(n^2), thus this solution is optimal in terms of big O notation of time complexity.

answered Feb 26, 2012 at 6:55

amit

179k27 gold badges245 silver badges348 bronze badges

3 Comments

CEGRD Over a year ago

I think instead of saying "DFS", saying "pre-order traversal" would be more clear.

Kshitij Over a year ago

Can O(n^2) achieved without using trie also ?

amit Over a year ago

@Kshitij Yes, do a radix sort on the string, the trie is just a suggestion - a standard radix sort will work here - using characters (or their bit representation) each iteration to achieve the current partial order, until you exhaust all bits/characters. This will take O(n^2) as well.

Ali Ferhat · Accepted Answer · 2012-02-26 09:56:43Z

0

Sorting n items with MergeSort requires O(N LogN) comparisons. If the time to compare two items is O(1) then the total running time will be O(N logN). However, comparing two strings of length N requires O(N) time, so a naive implementation might stuck with O(N*N logN) time.

This seems wasteful because we are not taking advantage of the fact that there are only N strings around to make comparisons. We might somehow preprocess the strings so that comparisons take less time on average.

Here is an idea. Create a Trie structure and put N strings there. The trie will have O(N*N) nodes and require O(N*N) time to build. Traverse the tree and put an integer "ranking" to each node at the tree; If R(N1)<R(N2) then the string associated with Node1 comes before the string associated with Node2 in a dictionary.

Now proceed with Mergesort, do the comparisons in O(1) time by looking up the Trie. The total running time will be O(N*N + N*logN) = O(N*N)

Edit: My answer is very similar to that of @amit. However I proceed with mergesort where he proceeds with radixsort after the trie building step.

edited Feb 26, 2012 at 9:56

answered Feb 26, 2012 at 9:11

Ali Ferhat

2,57918 silver badges24 bronze badges

2 Comments

CEGRD Over a year ago

Do you also keep an index mapping words to the trie nodes so that you can access those rankings during the merge sort? clarification please. Also, I think you should also include the cost of traversing. So the complexity should be O(NN + NN + NlogN). If this is true, then the radix sort approach seems better since it is O(NN + N*N).

Ali Ferhat Over a year ago

@CERGD: Big O notation deals only on the asymptotic growth with respect to input size; it does not deal with constant factors, O(2*NN + NlogN) = O(NN). Revisiting the question after a few months, it is clear that amit's answer is simpler and faster. Still, I disagree with your argument: the only way to measure the actual performance is to use a chronometer, not to look at the constant factors in the O-notation. There are even cases where an algorithm with a larger O() function beats the other in practical situations.

Collectives™ on Stack Overflow

String sorting using Merge Sort

3 Answers 3

7 Comments

3 Comments

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

7 Comments

3 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related