@mjolka was able to identify the root problem before I was ;-) There are two issues in addition to what he has pointed out.
The first is that it is very bad form to create a new Random instance on each partitioning. Random instances synchronize internally (seeding goes through a shared AtomicLong updated with compare-and-set), so constructing one on every call adds a lot of overhead. You should instead choose the pivot with a mechanism that does not allocate on every partition.
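If you really want a random pivot, a minimal sketch of one such mechanism (the method name here is my own, not from your code) is to use ThreadLocalRandom, which avoids both the per-call allocation and the contended seed update:

    import java.util.concurrent.ThreadLocalRandom;

    // Illustrative only: pick a pivot index in [first, last] without
    // constructing a new Random on every call.
    static int randomPivotIndex(int first, int last) {
        return ThreadLocalRandom.current().nextInt(first, last + 1);
    }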
Additionally, your naming is horrible:
- your class name starts with a lower-case letter
- your variables are one letter long, and some are upper-case (A, p, q, r, ...).
So, I did a few tests with your code, except I changed the data generator to:
    public static int[] data(int n) {
        Random random = new Random(seed); // Random seed stays same for all sorts
        int[] list = new int[n];
        for (int i = 0; i < list.length; i++) {
            // values range up to 10x the array size, so duplicates are rare
            list[i] = random.nextInt(list.length * 10);
        }
        return list;
    }
When I ran your code with this, I got these results:
    n         Insertion   Merge       Heap        Quick
    10        0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:0:34
    1000      0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:2:480
    100000    0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:21:987
    1000000   0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:139:194
    10000000  0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:1:468:183
As a result, you don't run into the duplicate-data issue nearly as much, and the sort is fast.
Then I removed the new Random() from the random partition and just used the middle value, and the results are:
    n         Insertion   Merge       Heap        Quick
    10        0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:0:17
    1000      0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:0:708
    100000    0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:18:462
    1000000   0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:103:790
    10000000  0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:1:182:385
Finally, I implemented my own sort using the equal-value grouping, switched the data generator back to limiting values to 1000 (so there are many duplicates), and the results are:
    n         Insertion   Merge       Heap        Quick           Monkey
    10        0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:0:16      0:0:0:0:9
    1000      0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:0:562     0:0:0:0:684
    100000    0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:16:675    0:0:0:18:119
    1000000   0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:532:437   0:0:0:65:608
    10000000  0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:48:533:774  0:0:0:694:184
With more random data (fewer duplicate values) the monkey sort slows down, and the quick-sort speeds up:
    n         Insertion   Merge       Heap        Quick           Monkey
    10        0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:0:16      0:0:0:0:8
    1000      0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:0:631     0:0:0:0:819
    100000    0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:16:834    0:0:0:22:353
    1000000   0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:0:103:699   0:0:0:127:421
    10000000  0:0:0:0:0   0:0:0:0:0   0:0:0:0:0   0:0:1:223:673   0:0:1:372:614
The bottom line is that, even with good data, 20% of your time goes to creating new Random instances. I have been using this randomizedPartition method:
    public static int randomizedPartition(int[] data, int first, int last) {
        // Random random = new Random();
        // int i = random.nextInt((last - first) + 1) + first;
        swap(data, last, first + (last - first) / 2);
        return partition(data, first, last);
    }
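For completeness, here is a minimal sketch of the swap and partition helpers the method above relies on. I am assuming a Lomuto-style partition with the pivot at data[last]; your own versions may differ.

    // Illustrative helpers, not taken from your code.
    private static void swap(int[] data, int a, int b) {
        int t = data[a];
        data[a] = data[b];
        data[b] = t;
    }

    // Lomuto partition: pivot is data[last], returns its final index.
    private static int partition(int[] data, int first, int last) {
        int pivot = data[last];
        int store = first;
        for (int i = first; i < last; i++) {
            if (data[i] <= pivot) {
                swap(data, i, store++);
            }
        }
        swap(data, store, last);
        return store;
    }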
Then, if you are interested in the monkey sort:
    public static long monkey(int[] list) {
        long startTime = System.nanoTime();
        monkeySort(list, 0, list.length - 1);
        return System.nanoTime() - startTime;
    }
    private static void monkeySort(final int[] data, final int left, final int right) {
        if (left >= right) {
            return;
        }
        // Partition in this method because we need two outputs, the 'same' and the 'lft'.
        // Swap values equal to the pivot to the end as we go.
        final int val = data[right];
        int lft = left;
        int rht = right - 1;
        int same = right;
        while (lft <= rht) {
            if (data[lft] > val) {
                // larger than the pivot: push it to the right side.
                swap(data, lft, rht);
                rht--;
            } else if (data[rht] == val) {
                // equal to the pivot: park it in the 'same' block at the end.
                same--;
                swap(data, rht, same);
                rht--;
            } else {
                lft++;
            }
        }
        // Move all the 'same' values in to the pivot point.
        int ntop = lft - 1;
        while (same <= right) {
            swap(data, lft++, same++);
        }
        // Recurse on both sides, skipping the whole block of pivot-equal values.
        monkeySort(data, left, ntop);
        monkeySort(data, lft, right);
    }
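As a quick sanity check (illustrative only, assuming these methods live in the same class), you can see that values equal to the pivot get grouped in a single pass and excluded from both recursive calls:

    public static void main(String[] args) {
        int[] data = {5, 3, 5, 1, 5, 2, 5, 4};
        monkeySort(data, 0, data.length - 1);
        System.out.println(java.util.Arrays.toString(data)); // [1, 2, 3, 4, 5, 5, 5, 5]
    }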