
Apart from the obvious "It's faster when there are many elements": when is it more appropriate to use a simple sorting algorithm (O(N^2)) rather than an advanced one (O(N log N))?

I've read quite a bit about, for example, insertion sort being preferred when you've got a small array that's nearly sorted, because you get the best-case O(N) behaviour. Why is it not good to use, say, quicksort when you've got 20 elements? Not just insertion versus quick specifically, but rather: when and why is a simpler algorithm useful compared to an advanced one?

EDIT: If we're working with, for example, an array, does it matter what kind of data it holds? Such as objects versus primitive types (Integer vs. int).

  • With 20 elements you are not going to notice a difference in efficiency, so you can use whatever algorithm you wish. Insertion sort is better for almost-sorted lists because it will finish in N operations, whereas quicksort will still take N log N. Commented Feb 19, 2018 at 19:34
  • If it's something small like your example, a simple sort is just fine. But some server hosting contracts are based on processing hours and saving seconds off of larger queries/sorts can save $$$ in the long run. Commented Feb 19, 2018 at 19:34
  • Sometimes it is simpler to implement an inefficient O(n^2) sort and a little harder to implement an O(n log n) one. If you don't need that much efficiency because you have small inputs, you may prefer code that is simpler to maintain. Commented Feb 19, 2018 at 19:37
  • stackoverflow.com/questions/736920/… Commented Feb 19, 2018 at 19:38
  • @aeliton I disbelieve. Can you name me a decent programming environment where there is not an efficient sort routine already implemented for your convenience? Commented Feb 19, 2018 at 19:58

4 Answers


Big-O notation captures the runtime cost of an algorithm for large values of N. It is much less effective at predicting the runtime for small values.

The actual transition point from one algorithm to another is not a trivial thing. For large N, the growth with N dominates everything else. For small N, more complicated effects become very important. For example, some algorithms have better cache behaviour. Others work best when you know something about the data (like your example of insertion sort when the data is nearly sorted).

The balance also changes over time. In the past, CPU speeds and memory speeds were closer together. Cache coherency issues were less of an issue. In modern times, CPU speeds have generally left memory busses behind, so cache coherency is more important.
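This transition is, in fact, how many production sorts behave: a hybrid algorithm switches to a simple sort below a small-size cutoff. A minimal sketch in Python (the cutoff value and function names are illustrative, not from the answer; real libraries tune the threshold empirically):

```python
CUTOFF = 16  # illustrative threshold; real libraries tune this empirically

def insertion_sort_range(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place with insertion sort."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort that falls back to insertion sort on small subarrays."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        if hi - lo < CUTOFF:
            insertion_sort_range(a, lo, hi)   # small subarray: simple sort wins
            return
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:                          # Hoare-style partition
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        hybrid_quicksort(a, lo, j)             # recurse on the left part
        lo = i                                 # loop on the right part
```

The cutoff exists precisely because, below some size, the constant factors of the simple algorithm beat the asymptotics of the advanced one.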

So there's no clear-cut answer to when you should use one algorithm over another. The only reliable answer is to profile your code and see.
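As one way to do that profiling, here is a hedged Python sketch timing a hand-written insertion sort against the built-in `sorted` on a small array (the numbers you get will vary by machine; the point is to measure rather than assume):

```python
import random
import timeit

def insertion_sort(a):
    """Simple O(n^2) insertion sort; returns a new sorted list."""
    a = list(a)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

small = [random.randint(0, 100) for _ in range(20)]
t_ins = timeit.timeit(lambda: insertion_sort(small), number=1000)
t_std = timeit.timeit(lambda: sorted(small), number=1000)
print(f"insertion sort: {t_ins:.4f}s, built-in sort: {t_std:.4f}s")
```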

For amusement: I was looking at the dynamic disjoint forest problem a few years back. I came across a state-of-the-art paper that permitted some operations to be done in something silly like O(log log N / log^4N). They did some truly brilliant math to get there, but there was a catch. The operations were so expensive that, for my graphs of 50-100 nodes, it was far slower than the O(n log n) solution that I eventually used. The paper's solution was far more important for people operating on graphs of 500,000+ nodes.




When choosing a sorting algorithm, you have to weigh the work of implementing it against its actual speed. For large inputs, the time spent implementing an advanced algorithm is outweighed by the time saved sorting. For small inputs, such as 20-100 items, the difference is minimal, so taking the simpler route is much better.

1 Comment

Calling a built-in sort routine is less effort than implementing your own. You should always do the simplest efficient thing unless there is a very good, specific reason not to.

First of all, O-notation usually describes the worst-case scenario. So when the array is nearly sorted, insertion sort's execution time can be close to linear, which makes it better than quicksort, for example. When n is small enough, we also have to take other factors into consideration. Algorithms such as quicksort can be slower because of the overhead of all the recursive calls; how the runtime handles those calls can end up costing more than the simple arithmetic operations insertion sort performs. And that's not to mention the additional stack space recursive algorithms require.
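To make the nearly-sorted case concrete, here is an illustrative Python sketch (not from the answer) that counts the element shifts insertion sort performs, as a proxy for its work: on an already-sorted array it does zero shifts, i.e. a single linear pass, while a reversed array triggers the full quadratic behaviour.

```python
def insertion_sort_with_shifts(a):
    """Insertion sort that also counts element shifts (a proxy for work done)."""
    a = list(a)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]   # shift one element to the right
            j -= 1
            shifts += 1
        a[j + 1] = key
    return a, shifts

_, s_sorted = insertion_sort_with_shifts(list(range(100)))           # best case
_, s_reversed = insertion_sort_with_shifts(list(range(100, 0, -1)))  # worst case
print(s_sorted, s_reversed)  # 0 shifts vs. n(n-1)/2 = 4950 shifts
```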



More than 99% of the time, you should not be implementing a sorting algorithm at all.

Instead, use a standard sorting routine from your language's standard library. In one line of code you get a tested and optimized O(n log n) implementation. It likely uses tricks you wouldn't have thought of.
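For instance, in Python (one possible language; the same point holds for Java's `Arrays.sort` or C++'s `std::sort`), the library call is a single line and handles custom ordering via a key function:

```python
# Sort a list of (name, age) records by the age field, in one line.
records = [("carol", 35), ("alice", 30), ("bob", 25)]
by_age = sorted(records, key=lambda r: r[1])
print(by_age)  # [('bob', 25), ('alice', 30), ('carol', 35)]
```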

For external sorts, I've used the Unix sort utility from time to time. Aside from the non-intuitive LC_ALL=C environment variable that I need to get it to behave, it is very useful.

In any other case where you actually need to implement your own sorting algorithm, what you implement will be driven by your precise needs. I've had to do this exactly once for production code in two decades of programming. (That was because, for a complex series of reasons, I needed to sort compressed data on a machine that literally did not have enough disk space to store said data uncompressed. I used a merge sort.)
