4

How to calculate the standard deviation of an array in Java? As you can see, I have already calculated the mean, and I know that at the end I will have to divide by the sample size minus 1 (n-1) and take the square root of that number. The problem I'm having is how to take every number, calculate how far it is from the mean, and then square that distance. I know I could handle every number in the data set separately, but there has to be an easier way. Any help would be appreciated. Here's my code.

public class CalculateArray
{

    public static void main(String[] args)
    {
        int [] numbers = new int[]{1,2,3,4,5,6,7,8,9};

        int sum = 0;
        int max = 0;
        int min = numbers[0];
        double sd = 0;
    
        for(int i=0; i<numbers.length; i++)
        {
            sum = sum + numbers[i];
        }

        double average = sum / numbers.length;

        System.out.println("Average value is : " + average);

        for(int i=0; i<numbers.length; i++)
        {
            if(numbers[i] > max)
            {
                max = numbers[i];
            }
        }

        System.out.println("max number is : " + max);

        for(int i=0; i<numbers.length; i++)
        {
            if(numbers[i] < min)
            {
                min = numbers[i];
            }
        }

        System.out.println("min number is : " + min);

        for (int i=0; i<numbers.length;i++)
        {
           //this is where im having problems
           sd = ???
        }

        double standardDeviation = Math.sqrt(sd/(numbers.length-1));

        System.out.println("The standard deviation is : " + standardDeviation);
    }
}
1
  • What about std += pow(average - numbers[i], 2);? Commented Feb 12, 2013 at 18:20

3 Answers

8

To calculate how far a number is from the mean you use the - operator. For calculating the square you can use Math.pow. So, given that you have already calculated the average earlier in the program:

for (int i=0; i<numbers.length;i++)
{
    sd = sd + Math.pow(numbers[i] - average, 2);
}

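By the way, the way you calculate the mean is currently broken: sum and numbers.length are both int, so sum / numbers.length performs integer division and discards the fractional part before the result is assigned to average. You should define sum as double, not as int.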
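
Putting the loop and the double fix together, a minimal self-contained sketch might look like the following. The class name StdDevExample and the method name sampleStdDev are made up for illustration, and the code assumes the array has at least two elements, since the sample formula divides by n - 1.

```java
// Hypothetical helper class; the names are illustrative, not from the question.
class StdDevExample {

    // Two-pass sample standard deviation (divides by n - 1).
    // Assumes numbers.length >= 2; otherwise the divisor would be zero.
    static double sampleStdDev(int[] numbers) {
        // First pass: accumulate the sum as a double to avoid integer division.
        double sum = 0;
        for (int value : numbers) {
            sum += value;
        }
        double average = sum / numbers.length;

        // Second pass: accumulate the squared distances from the mean.
        double squaredDiffs = 0;
        for (int value : numbers) {
            double diff = value - average;
            squaredDiffs += diff * diff;
        }
        return Math.sqrt(squaredDiffs / (numbers.length - 1));
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println("The standard deviation is : " + sampleStdDev(numbers));
    }
}
```

For the array in the question, the mean is 5 and the squared deviations sum to 60, so the result is the square root of 60 / 8 = 7.5, roughly 2.74.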


8 Comments

You should use Math.abs(numbers[i] - average); otherwise it generates a negative number when numbers[i] < average.
@Quoi, you don't need to calculate the absolute value. a² = |a|².
And at the end you'll want to divide the sum of the squared deviations by the number of datapoints. I think that is a root mean square; in the terms above, rms = sd / numbers.length
@Quoi Math.pow(ANYTHING, 2) is never negative, so using Math.abs is redundant.
1

In addition to the two-pass algorithm described by others (calculate the mean in the first pass and the standard deviation in the second), please see this link for an example of how it can be done in a single pass. The algorithm is as follows:

double std_dev2(double a[], int n) {
    if(n == 0)
        return 0.0;
    double sum = 0;
    double sq_sum = 0;
    for(int i = 0; i < n; ++i) {
       sum += a[i];
       sq_sum += a[i] * a[i];
    }
    double mean = sum / n;
    double variance = sq_sum / n - mean * mean;
    return sqrt(variance);
}

UPDATE
Don't do this. As Joni explains in his comment below, there is a high risk of error when implementing this in a computer program. For a stable online algorithm, Joni directs us to this Wikipedia article, which, as mentioned, has been thoroughly analyzed.
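
For reference, one well-known stable single-pass method is Welford's online algorithm. The sketch below is an illustration written for this page, not code taken from the linked article; the class name RunningStats and its method names are hypothetical, and returning 0.0 when fewer than two values have been added is an assumption made to keep the example short.

```java
// Hypothetical class; a sketch of Welford's online algorithm.
class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean

    // Fold one more value into the running mean and squared-deviation sum.
    void add(double x) {
        count++;
        double delta = x - mean;   // distance from the old mean
        mean += delta / count;     // update the mean
        m2 += delta * (x - mean);  // uses both the old and the new mean
    }

    double mean() {
        return mean;
    }

    // Sample variance (divides by n - 1); 0.0 for fewer than two values (an assumption).
    double sampleVariance() {
        return count > 1 ? m2 / (count - 1) : 0.0;
    }

    double sampleStdDev() {
        return Math.sqrt(sampleVariance());
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (int i = 1; i <= 9; i++) {
            stats.add(i);
        }
        System.out.println("mean = " + stats.mean());
        System.out.println("sample standard deviation = " + stats.sampleStdDev());
    }
}
```

Because each update only subtracts the current mean from the new value, it never forms the difference of two huge sums, which is what makes the one-pass formula above fragile.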

6 Comments

A horrible algorithm, should never be used. For why it's bad and what alternatives exist see johndcook.com/blog/2008/09/26/…
@Joni thanks for the input. Is this basically an overflow error?
The problem is mainly with sq_sum / n - mean * mean: it calculates the difference of two large, nearly equal numbers. The cancellation of digits means that the relative error goes through the roof; you may even end up with a negative variance. Wikipedia has a pretty decent page about this type of numerical problem: en.wikipedia.org/wiki/Loss_of_significance
Thanks @Joni, I need to learn not to believe everything I read on the internet. I am updating my answer with your comment.
Well, to the credit of your source, they do warn you: "Unfortunately, the result will be inaccurate when the array contains large numbers". There is a better behaved 1-pass algorithm though: en.wikipedia.org/wiki/…
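
To make the cancellation concrete, here is a small sketch comparing sq_sum / n - mean * mean with the two-pass calculation. The class name, method names, and data values are chosen purely for illustration. The four values have an exact population variance of 22.5, but their squares are near 10^18, where adjacent doubles are 128 apart, so the one-pass subtraction cannot produce 22.5.

```java
// Hypothetical demo class; names and data are illustrative only.
class CancellationDemo {

    // One-pass formula from the answer: mean of squares minus square of mean.
    static double naiveVariance(double[] a) {
        double sum = 0;
        double sqSum = 0;
        for (double x : a) {
            sum += x;
            sqSum += x * x;
        }
        double mean = sum / a.length;
        return sqSum / a.length - mean * mean;
    }

    // Two-pass population variance: subtract the mean before squaring.
    static double twoPassVariance(double[] a) {
        double sum = 0;
        for (double x : a) {
            sum += x;
        }
        double mean = sum / a.length;
        double squaredDiffs = 0;
        for (double x : a) {
            squaredDiffs += (x - mean) * (x - mean);
        }
        return squaredDiffs / a.length;
    }

    public static void main(String[] args) {
        // Small spread on top of a large offset: the worst case for the one-pass formula.
        double[] data = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
        System.out.println("one-pass: " + naiveVariance(data));
        System.out.println("two-pass: " + twoPassVariance(data));
    }
}
```

With small inputs such as 1 through 9, both methods agree closely; the trouble only shows up when the values are large relative to their spread.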
0
k + k = 2k

to get the average, divide 2k by the number of terms you have. So the average of the terms is k. :D
