0

I would like to generate data to test sorting algorithms with. This accomplishes two things:

  1. Find bugs. The output could easily be checked if it was in fact sorted correctly
  2. Profile the code and find which situations take longer for which parts.

I asked the question How do you test speed of sorting algorithm? awhile ago, but this question focuses particularly on generating the data.

I am thinking of

  1. sorted
  2. reverse sorted
  3. random
  4. sorted but then make n inversions in randomly selected elements and see how changing n affects the run time

Any suggestions? Do any frameworks exist that would make this easier? I'm thinking JUnit could be useful.

In this question on comp sci se an answer makes it sound like adding inversions and counting them doesn't mean much:

The number of inversions might work for some cases, but is sometimes insufficient. An example given in [3] is the sequence

$$\langle \lfloor n/2 \rfloor + 1, \lfloor n/2 \rfloor + 2, \ldots, n, 1, \ldots, \lfloor n/2 \rfloor \rangle$$

that has a quadratic number of inversions, but only consists of two ascending runs. It is nearly sorted, but this is not captured by inversions.

I'm not particularly strong in math and don't understand how the example illustrates what's wrong with counting the number of inversions? Is it just academic? How does it make sense to say "quadratic number of inversions"?

3
  • 1
    My unscientific observations: Real data that needs to be sorted is usually either (listed here in no particular order): a) effectively random (because the order that the data was generated from is based on a different key), b) already sorted, c) already sorted in reverse order, or d) mostly sorted with a (relatively) few items out of place. And invariably, real data that needs to be sorted has lots of duplicate keys - N can outnumber K by many orders of magnitude. Commented Oct 22, 2016 at 1:16
  • @500-InternalServerError you seem to have experience, may I ask where from? Is there any online sources of free sample data? Commented Oct 22, 2016 at 5:56
  • 1
    Also try reverse sorted with a few inversions. To model adding data to an existing sorted file try, say, 90% sorted with 10% random records at the end. Commented Nov 2, 2016 at 12:16

1 Answer 1

0

Using integer math, the $$...$$ sequence can represent an array:

    1      2      n/2              n    indices
n/2+1, n/2+2, ...,  n, 1, 2, ... n/2    array values

So as stated, just two ascending sequences.

By the definition of inversion, two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j. This means that all of the first n/2 elements of a, a[1 to n/2] are greater than all of the second n/2 elements of a, a[(n/2)+1 to n]. So that's (n/2)^2 = n^2/4 inversions which is quadratic.

The relationship between inversion count and sort time complexity depends on the sort algorithm. Using bubble sort on the example array would have time complexity O(n^2). Using generic merge sort on the array would be O(n log(n)), with an near best case comparison count. Using natural merge sort would find the two sorted runs and do a single merge pass for time complexity of O(n).

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.