
Apart from the obvious "It's faster when there are many elements": when is it more appropriate to use a simple sorting algorithm (O(N^2)) rather than an advanced one (O(N log N))?

I've read quite a bit about, for example, insertion sort being preferred when you've got a small array that's nearly sorted, because you get the best-case O(N) behaviour. Why is it not good to use, say, quicksort when you've got 20 elements? Not just insertion versus quick specifically, but rather: when and why is a simpler algorithm useful compared to an advanced one?

EDIT: If we're working with, for example, an array, does it matter what kind of data it holds? Such as objects versus primitive types (Integer vs. int).

  • With 20 elements you are not going to notice a difference in efficiency, so you can use whatever algorithm you wish. Insertion sort is better for almost-sorted lists because it will finish in N operations, whereas quicksort will still take N log N. Commented Feb 19, 2018 at 19:34
  • If it's something small like your example, a simple sort is just fine. But some server hosting contracts are based on processing hours and saving seconds off of larger queries/sorts can save $$$ in the long run. Commented Feb 19, 2018 at 19:34
  • Sometimes it is simpler to implement an inefficient O(n^2) sort and a little harder to implement an O(n log n) one. If you don't need that much efficiency because you have small inputs, you may prefer code that is simpler to maintain. Commented Feb 19, 2018 at 19:37
  • stackoverflow.com/questions/736920/… Commented Feb 19, 2018 at 19:38
  • @aeliton I disbelieve. Can you name me a decent programming environment where there is not an efficient sort routine already implemented for your convenience? Commented Feb 19, 2018 at 19:58

4 Answers


Big-O notation captures the runtime cost of an algorithm for large values of N. It is much less effective at predicting the runtime for small values.

The actual transition point from one algorithm to another is not a trivial thing. For large N, the growth with N dominates everything else. For small N, more complicated effects become very important. For example, some algorithms have better cache behaviour. Others work best when you know something about the data (like your example of insertion sort when the data is nearly sorted).

The balance also changes over time. In the past, CPU speeds and memory speeds were closer together. Cache coherency issues were less of an issue. In modern times, CPU speeds have generally left memory busses behind, so cache coherency is more important.
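This transition is, in fact, how many production sorts behave: a hybrid algorithm switches to a simple sort below a small-size cutoff. A minimal sketch in Python (the cutoff value and function names are illustrative, not from the answer; real libraries tune the threshold empirically):

```python
CUTOFF = 16  # illustrative threshold; real libraries tune this empirically

def insertion_sort_range(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place with insertion sort."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort that falls back to insertion sort on small subarrays."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        if hi - lo < CUTOFF:
            insertion_sort_range(a, lo, hi)   # small subarray: simple sort wins
            return
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:                          # Hoare-style partition
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        hybrid_quicksort(a, lo, j)             # recurse on the left part
        lo = i                                 # loop on the right part
```

The cutoff exists precisely because, below some size, the constant factors of the simple algorithm beat the asymptotics of the advanced one.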

So there's no clear-cut answer to when you should use one algorithm over another. The only reliable answer is to profile your code and see.
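As one way to do that profiling, here is a hedged Python sketch timing a hand-written insertion sort against the built-in `sorted` on a small array (the numbers you get will vary by machine; the point is to measure rather than assume):

```python
import random
import timeit

def insertion_sort(a):
    """Simple O(n^2) insertion sort; returns a new sorted list."""
    a = list(a)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

small = [random.randint(0, 100) for _ in range(20)]
t_ins = timeit.timeit(lambda: insertion_sort(small), number=1000)
t_std = timeit.timeit(lambda: sorted(small), number=1000)
print(f"insertion sort: {t_ins:.4f}s, built-in sort: {t_std:.4f}s")
```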

For amusement: I was looking at the dynamic disjoint forest problem a few years back. I came across a state-of-the-art paper that permitted some operations to be done in something silly like O(log log N / log^4N). They did some truly brilliant math to get there, but there was a catch. The operations were so expensive that, for my graphs of 50-100 nodes, it was far slower than the O(n log n) solution that I eventually used. The paper's solution was far more important for people operating on graphs of 500,000+ nodes.




When choosing a sorting algorithm, you have to weigh the work of implementing it against its actual speed. For large inputs, the time spent implementing an advanced algorithm is outweighed by the time saved sorting. For small inputs, such as 20-100 items, the difference is minimal, so taking the simpler route is much better.

1 Comment

Calling a built-in sort routine is less effort than implementing your own. You should always do the simplest efficient thing unless there is a very good, specific reason not to.

First of all, O-notation usually describes the worst-case scenario. So when the array is nearly sorted, insertion sort's execution time can be close to linear, which makes it better than quicksort, for example. When n is small enough, we also have to take other factors into consideration. Algorithms such as quicksort can be slower because of the overhead of all the recursive calls; how the runtime handles those calls can end up costing more than the simple arithmetic operations insertion sort performs. And that's not to mention the additional stack space recursive algorithms require.
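To make the nearly-sorted case concrete, here is an illustrative Python sketch (not from the answer) that counts the element shifts insertion sort performs, as a proxy for its work: on an already-sorted array it does zero shifts, i.e. a single linear pass, while a reversed array triggers the full quadratic behaviour.

```python
def insertion_sort_with_shifts(a):
    """Insertion sort that also counts element shifts (a proxy for work done)."""
    a = list(a)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]   # shift one element to the right
            j -= 1
            shifts += 1
        a[j + 1] = key
    return a, shifts

_, s_sorted = insertion_sort_with_shifts(list(range(100)))           # best case
_, s_reversed = insertion_sort_with_shifts(list(range(100, 0, -1)))  # worst case
print(s_sorted, s_reversed)  # 0 shifts vs. n(n-1)/2 = 4950 shifts
```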



More than 99% of the time, you should not be implementing a sorting algorithm at all.

Instead, use a standard sorting routine from your language's standard library. In one line of code you get a tested and optimized O(n log n) implementation. It likely uses tricks you wouldn't have thought of.
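For instance, in Python (one possible language; the same point holds for Java's `Arrays.sort` or C++'s `std::sort`), the library call is a single line and handles custom ordering via a key function:

```python
# Sort a list of (name, age) records by the age field, in one line.
records = [("carol", 35), ("alice", 30), ("bob", 25)]
by_age = sorted(records, key=lambda r: r[1])
print(by_age)  # [('bob', 25), ('alice', 30), ('carol', 35)]
```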

For external sorts, I've used the Unix sort utility from time to time. Aside from the non-intuitive LC_ALL=C environment variable that I need to get it to behave, it is very useful.

In any other case where you actually need to implement your own sorting algorithm, what you implement will be driven by your precise needs. I've had to do this exactly once for production code in two decades of programming. (That was because, for a complex series of reasons, I needed to sort compressed data on a machine that literally did not have enough disk space to store said data uncompressed. I used a merge sort.)
