
I frequently need to calculate the mean and standard deviation of numeric arrays, so I've written a small protocol and extensions for numeric types that seem to work. I'd just like feedback on whether there is anything wrong with how I have done this. Specifically, I'm wondering if there is a better way to check whether the type can be cast as a Double, to avoid the need for the asDouble variable and the init(_: Double) initializer.

I know there are issues with protocols that allow arithmetic, but this seems to work OK and saves me from putting the standard deviation function into every class that needs it.

protocol Numeric {
    var asDouble: Double { get }
    init(_: Double)
}

extension Int: Numeric { var asDouble: Double { return Double(self) } }
extension Float: Numeric { var asDouble: Double { return Double(self) } }
extension Double: Numeric { var asDouble: Double { return Double(self) } }
extension CGFloat: Numeric { var asDouble: Double { return Double(self) } }

extension Array where Element: Numeric {

    var mean : Element { get { return Element(self.reduce(0, combine: {$0.asDouble + $1.asDouble}) / Double(self.count))}}

    var sd : Element { get {
        let mu = self.reduce(0, combine: {$0.asDouble + $1.asDouble}) / Double(self.count)
        let variances = self.map{pow(($0.asDouble - mu), 2)}
        return Element(sqrt(variances.mean))
    }}
}

edit: I know it's kind of pointless to get mean and sd on [Int], but I might use Numeric elsewhere, so it's for consistency.

edit: As @Severin Pappadeux pointed out, the variance can be expressed in a manner that avoids the triple pass over the array (mean, then map, then mean). Here is the final standard deviation extension:

extension Array where Element: Numeric {

    var sd : Element { get {
        let sss = self.reduce((0.0, 0.0)){ return ($0.0 + $1.asDouble, $0.1 + ($1.asDouble * $1.asDouble))}
        let n = Double(self.count)
        return Element(sqrt(sss.1/n - (sss.0/n * sss.0/n)))
    }}
}
  • Int is generally the same size as Int64 on newer devices (>= iPhone 5S, which introduced the 64-bit processor), so unless you're working with really large numbers this shouldn't be an issue: but just know that init(_: Double) can lead to an integer overflow (runtime exception) in cases where the Element = Int type cannot store the integer representation of a given (huge) Double value. Possibly not an issue if you just use your Swift apps yourself, but in case you ship to customers, this might be good to bear in mind. Commented Jul 17, 2016 at 14:40
  • Ok interesting thanks. It's unlikely I will use it with integers, and the values I'm working with are physiologically constrained to < 500 with this app. So should be ok. Commented Jul 17, 2016 at 14:55
  • @dfri very useful comment! I presume that there is no way to "catch" this kind of overflow? Commented Jul 17, 2016 at 16:37
  • @matt Thanks! I guess we could include a static min and max property in Numeric and check the double representation (under the assumption that all numeric values can be seen as "kind of" a subset of the range of valid Double values; i.e., always convertible to Double without any risk of overflow on that part, but I guess in the worst case we get Double.infinity) of this property vs the Double-valued sum from the reduce operation above. E.g. something along these lines. Commented Jul 17, 2016 at 16:58
  • @dfri I may be wrong, but from reading over the new FloatingPoint protocol in Swift 3, I think it might save you some work in that gist. — It's funny, though, how you can "catch" overflow when adding two Ints (by using a special operator) but you can't "catch" it when coercing to a Double. Commented Jul 17, 2016 at 17:14
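The failable conversion the comments are circling around can be sketched with the standard library's Int(exactly:) initializer, which returns nil instead of trapping. (This sketch is mine, not from the thread; note that Int(exactly:) also returns nil for non-integral values such as 2.5, hence the rounding step.)

```swift
/// A non-trapping alternative to the Int(_: Double) initializer
/// discussed above. Returns nil on overflow or non-finite input.
func safeInt(from value: Double) -> Int? {
    guard value.isFinite else { return nil }
    return Int(exactly: value.rounded())
}

safeInt(from: 499.7)   // Optional(500)
safeInt(from: 1e300)   // nil — the value would overflow Int
```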

6 Answers


Swift 4 Array extension with FloatingPoint elements:

extension Array where Element: FloatingPoint {

    func sum() -> Element {
        return self.reduce(0, +)
    }

    func avg() -> Element {
        return self.sum() / Element(self.count)
    }

    func std() -> Element {
        let mean = self.avg()
        let v = self.reduce(0, { $0 + ($1-mean)*($1-mean) })
        return sqrt(v / (Element(self.count) - 1))
    }

}
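A quick usage sketch of the extension above (the sample values are mine):

```swift
let values: [Double] = [1.0, 2.0, 3.0, 4.0]
values.sum()   // 10.0
values.avg()   // 2.5
values.std()   // ≈ 1.29099 — sample standard deviation, i.e. (n - 1) divisor
```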

1 Comment

Note that this is making use of Bessel's Correction: en.wikipedia.org/wiki/Bessel%27s_correction. See also stackoverflow.com/questions/27600207/…

There's actually a class that provides this functionality already - called NSExpression. You could reduce your code size and complexity by using this instead. There's quite a bit of stuff to this class, but a simple implementation of what you want is as follows.

let expression = NSExpression(forFunction: "stddev:", arguments: [NSExpression(forConstantValue: [1,2,3,4,5])])
let standardDeviation = expression.expressionValueWithObject(nil, context: nil)

You can calculate mean too, and much more. Info here: http://nshipster.com/nsexpression/
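For example, the mean comes from NSExpression's built-in "average:" function; a sketch using the current Swift spelling of the API (expressionValue(with:context:)):

```swift
import Foundation

let mean = NSExpression(forFunction: "average:",
                        arguments: [NSExpression(forConstantValue: [1, 2, 3, 4, 5])])
let result = mean.expressionValue(with: nil, context: nil)  // NSNumber 3
```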

1 Comment

Be careful if you intend to port to Linux - NSExpression is not implemented there.

In Swift 3 you might (or might not) be able to save yourself some duplication with the FloatingPoint protocol, but otherwise what you're doing is exactly right.

Comments


To follow up on Matt's observation, I'd do the main algorithm on FloatingPoint, taking care of Double, Float, CGFloat, etc. But then I'd do another permutation of this on BinaryInteger, to take care of all of the integer types.

E.g. on FloatingPoint:

extension Array where Element: FloatingPoint {
    
    /// The mean average of the items in the collection.
    
    var mean: Element { return reduce(Element(0), +) / Element(count) }
    
    /// The unbiased sample standard deviation. Is `nil` if there are insufficient number of items in the collection.
    
    var stdev: Element? {
        guard count > 1 else { return nil }
        
        return sqrt(sumSquaredDeviations() / Element(count - 1))
    }
    
    /// The population standard deviation. Is `nil` if there are insufficient number of items in the collection.
    
    var stdevp: Element? {
        guard count > 0 else { return nil }
        
        return sqrt(sumSquaredDeviations() / Element(count))
    }
    
    /// Calculate the sum of the squares of the differences of the values from the mean
    ///
    /// A calculation common for both sample and population standard deviations.
    ///
    /// - calculate mean
    /// - calculate deviation of each value from that mean
    /// - square that
    /// - sum all of those squares
    
    private func sumSquaredDeviations() -> Element {
        let average = mean
        return map {
            let difference = $0 - average
            return difference * difference
        }.reduce(Element(0), +)
    }
}

But then on BinaryInteger:

extension Array where Element: BinaryInteger {
    var mean: Double { return map { Double(exactly: $0)! }.mean }
    var stdev: Double? { return map { Double(exactly: $0)! }.stdev }
    var stdevp: Double? { return map { Double(exactly: $0)! }.stdevp }
}

Note, in my scenario, even when dealing with integer input data, I generally want floating point mean and standard deviations, so I arbitrarily chose Double. And you might want to do safer unwrapping of Double(exactly:). You can handle this scenario any way you want. But it illustrates the idea.
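A usage sketch of the BinaryInteger overlay (the data is the textbook population-stdev example, not from the answer):

```swift
let counts = [2, 4, 4, 4, 5, 5, 7, 9]
counts.mean    // 5.0
counts.stdevp  // Optional(2.0) — population standard deviation
counts.stdev   // ≈ 2.138 — sample standard deviation, (n - 1) divisor
```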

Comments


Not that I know Swift, but from a numerics POV you're doing it a bit inefficiently.

Basically, you're doing two passes (actually, three) over the array to compute two values, where one pass should be enough. Variance might be expressed as E(X²) - E(X)², so in some pseudo-code:

tuple<float,float> get_mean_sd(data) {
    float s  = 0.0f;
    float s2 = 0.0f;
    for(float v: data) {
        s  += v;
        s2 += v*v;
    }
    s  /= count;
    s2 /= count;

    s2 -= s*s;
    return tuple(s, sqrt(s2 > 0.0 ? s2 : 0.0));
}
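A direct Swift translation of this pseudocode (my sketch, written against the asker's Numeric protocol from the question):

```swift
extension Array where Element: Numeric {
    /// Single-pass mean and (population) standard deviation:
    /// accumulate the sum and the sum of squares in one reduce.
    var meanAndSD: (mean: Double, sd: Double) {
        let n = Double(count)
        let (s, s2) = reduce((0.0, 0.0)) { acc, x in
            (acc.0 + x.asDouble, acc.1 + x.asDouble * x.asDouble)
        }
        let mean = s / n
        // Clamp at zero: rounding can push E(X²) - E(X)² slightly negative.
        let variance = Swift.max(s2 / n - mean * mean, 0)
        return (mean, variance.squareRoot())
    }
}
```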

3 Comments

You're right. Thank you, this does avoid the triple pass.
@twiz_ you're welcome, though I'm curious if it could be expressed via reduce()
Got it: let s = self.reduce((0.0, 0.0)){ return ($0.0 + $1.asDouble, $0.1 + ($1.asDouble * $1.asDouble))} then s.1/n - s.0/n * s.0/n. Sorry for the horrible formatting. New to this.

Just a heads-up, but when I tested the code outlined by Severin Pappadeux the result was a "population standard deviation" rather than a "sample standard deviation". You would use the first in an instance where 100% of the relevant data is available to you, such as when you are computing the variance around an average grade for all 20 students in a class. You would use the second if you did not have universal access to all the relevant data and had to estimate the variance from a much smaller sample, such as estimating the average height of all males in a large country.

The population standard deviation is often denoted as StDevP. The Swift 5.0 code I used is shown below. Note that this is not suitable for very large arrays, due to loss of the "small value" bits as the summations get large. Especially when the variance is close to zero you might run into run-time errors. For such serious work you might have to introduce an algorithm called compensated summation.

import Foundation

extension Array where Element: FloatingPoint
{

    var sum: Element {
        return self.reduce( 0, + )
    }
    
    var average: Element {
        return self.sum / Element( count )
    }
    
    /**
     (for a floating point array) returns a tuple containing the average and the "standard deviation for populations"
     */
    var averageAndStandardDeviationP: ( average: Element, stDevP: Element ) {
        
        let sumsTuple = sumAndSumSquared
        
        let populationSize = Element( count )
        let average = sumsTuple.sum / populationSize
        
        let expectedXSquared = sumsTuple.sumSquared / populationSize
        let variance = expectedXSquared - (average * average )
        
        return ( average, sqrt( variance ) )
    }
    
    /**
     (for a floating point array) returns a tuple containing the sum of all the values and the sum of all the values-squared
     */
    private var sumAndSumSquared: ( sum: Element, sumSquared: Element ) {
        return self.reduce( (Element(0), Element(0) ) )
        {
            ( arg0, x) in
            let (sumOfX, sumOfSquaredX) = arg0
            return ( sumOfX + x, sumOfSquaredX + ( x * x ) )
        }
    }
}
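The compensated (Kahan) summation mentioned above could be sketched like this (my addition, not part of the answer):

```swift
extension Array where Element: FloatingPoint {
    /// Kahan (compensated) summation: carries a running error term
    /// so small addends are not lost once the partial sum grows large.
    var compensatedSum: Element {
        var sum = Element(0)
        var c = Element(0)          // compensation for lost low-order bits
        for value in self {
            let y = value - c       // correct the addend by the previous error
            let t = sum + y         // big + small: low-order bits of y may drop
            c = (t - sum) - y       // algebraically zero; recovers what was lost
            sum = t
        }
        return sum
    }
}
```

Dropping this in for the plain reduce in sumAndSumSquared would tighten the result when the variance is near zero.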

Comments
