Profiling sorting algorithms against partially sorted data

Question

We know that several sorts, such as insertion sort, are great on arrays that are 'mostly-sorted' and not so great on random data.

Suppose we wanted to profile the performance improvement/degradation of such an algorithm relative to how 'sorted' the input data is. What would be a good way to generate an 'increasingly sorted' or 'increasingly random' array of elements? How might we measure the 'sortedness' of the input?

Matthieu M. · Accepted Answer · 2011-02-25 09:18:44Z

10

Number of Inversion is a usual measure of how much sorted an array is.

A pair of elements (pi,pj) in permutation p is called an inversion in a permutation if i<j and pi >pj. For example, in the permutation (3,1,2,5,4) contains the 3 inversions (3,1), (3,2) and (5,4).

A sorted array got 0 inversion and reverse sorted array got n*(n-1)/2.

edited Feb 25, 2011 at 9:18

Matthieu M.

303k58 gold badges496 silver badges754 bronze badges

answered Feb 25, 2011 at 3:19

Zimbabao

8,2803 gold badges32 silver badges36 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Lee Taylor · Accepted Answer · 2013-04-22 13:55:31Z

2

You could generate a "partially sorted" dataset by interrupting a modern Fisher-Yates shuffle run on an already ordered dataset.

Also, if you only need a few essentially fixed sets of partially sorted data, then you could generate a column graph of position vs value for each and just eye-ball them. That would let you quickly see the general random-ness of a set, as well things like how much localised order there is.

edited Apr 22, 2013 at 13:55

Lee Taylor

7,99416 gold badges38 silver badges53 bronze badges

answered Feb 25, 2011 at 3:26

staticsan

30.6k5 gold badges64 silver badges72 bronze badges

2 Comments

int3 Over a year ago

It seems that interrupting the Fisher-Yates shuffle would produce an array made up of one chunk of sorted data and one chunk of random data. What if we wanted to generate a sequence with numerous short runs of sorted and random data?

staticsan Over a year ago

Hmm. Hadn't thought of that. Different sorting algorithms have different sorting patterns; several create short runs of sorted data partway through their execution. Sedgewick's Algorithms book (for example) would help you pick one that does that. The trick would be figuring out when to stop it...

Jim Mischel · Accepted Answer · 2013-04-22 14:23:44Z

0

Also look into creating a binary heap, and then using the array representation as your starting point. A binary heap implemented in an array is not sorted, but it is ordered. I think it would be considered "partially sorted."

answered Apr 22, 2013 at 14:23

Jim Mischel

135k25 gold badges197 silver badges377 bronze badges

Collectives™ on Stack Overflow

Profiling sorting algorithms against partially sorted data

3 Answers 3

Comments

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related