
I'm used to using Excel for this kind of problem but I'm trying my hand at Python for now.

I have two arrays: one holds constants, and the other holds values produced by a user-defined function.

This is the function, simple enough.

import scipy.stats as sp

def calculate_probability(spread, std_dev):
    return sp.norm.sf(0.5, spread, std_dev)
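As a quick sanity check of what this function computes (a sketch, not part of the original code): sp.norm.sf(x, loc, scale) is the survival function, i.e. 1 - cdf, so calculate_probability should return the probability that a normal variable with mean spread and standard deviation std_dev exceeds 0.5. The std_dev of 12.0 below is just an illustrative value.

```python
import numpy as np
import scipy.stats as sp

def calculate_probability(spread, std_dev):
    # P(X > 0.5) for X ~ Normal(mean=spread, sd=std_dev)
    return sp.norm.sf(0.5, spread, std_dev)

# Scalar input: the survival function is the complement of the CDF.
p = calculate_probability(10.5, 12.0)
same_p = 1.0 - sp.norm.cdf(0.5, loc=10.5, scale=12.0)

# Array input: the call is vectorised, giving one probability per spread.
probs = calculate_probability(np.array([10.5, 9.5, 10, 8.5]), 12.0)
print(p, same_p, probs)
```

Because the call is vectorised, a whole array of spreads can be passed in at once, which is what calculate_mse relies on.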

I have two arrays of data, one with entries that run through the calculate_probability function (these are the spreads), and the other a set of constants called expected_probabilities.

spreads = [10.5, 9.5, 10, 8.5]

expected_probabilities = [0.8091, 0.7785, 0.7708, 0.7692]

The below function is what I am seeking to optimise.

import numpy as np
def calculate_mse(std_dev):
    spread_inputs = np.array(spreads)
    model_probabilities = calculate_probability(spread_inputs,std_dev)
    subtracted_vector = np.subtract(model_probabilities,expected_probabilities)
    vector_powered = np.power(subtracted_vector,2)
    mse_sum = np.sum(vector_powered)
    return mse_sum/len(spreads)

I would like to find the value of std_dev for which calculate_mse returns a value as close to zero as possible. This is very easy in Excel using Solver, but I am not sure how to do it in Python. What is the best way?

EDIT: I've changed my calculate_mse function so that it takes only a standard deviation as the parameter to be optimised. I've tried to expose the result from Andrew's answer through an API using Flask, but I've run into some issues:

class Minimize(Resource):

    std_dev_guess = 12.0  # might have a better guess than zeros
    result = minimize(calculate_mse, std_dev_guess)

    def get(self):
        return {'data': result},200

api.add_resource(Minimize,'/minimize')

This is the error:

NameError: name 'result' is not defined

I guess something is wrong with the input?

2 Comments

You need to change {'data': result},200 to {'data': self.result},200. Code-wise, I have a couple of suggestions; see my edit. Commented Jul 10, 2020 at 15:41
Thanks so much. I've managed to get a successful optimization with that! I'll look over your code so I can clear up the boilerplate, but thanks so much again. Commented Jul 10, 2020 at 16:33
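For reference, the NameError can be reproduced without Flask. Names assigned in a class body become class attributes and are not in scope inside methods, so they have to be reached through self (or the class). A minimal sketch, using a hypothetical Demo class that is not part of the original code:

```python
class Demo:
    result = 42  # class attribute, evaluated once when the class is defined

    def broken(self):
        # Class-body names are not visible here, so this raises NameError.
        return result

    def fixed(self):
        # Attribute lookup on the instance falls back to the class attribute.
        return self.result

demo = Demo()
try:
    demo.broken()
    broken_error = None
except NameError as exc:
    broken_error = str(exc)

fixed_value = demo.fixed()
print(broken_error)
print(fixed_value)
```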

1 Answer

I'd suggest using SciPy's optimization library, scipy.optimize. From there you have a couple of options; the easiest from your current setup is the minimize function. minimize supports a large number of methods, from the Nelder-Mead simplex method to BFGS (the default for unconstrained problems without bounds) and COBYLA. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

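from scipy.optimize import minimize

# calculate_mse takes a single parameter (std_dev), so the initial guess is a single value.
# Start from a positive number: a standard deviation of zero is not a valid scale.
std_dev_guess = 10.0
result = minimize(calculate_mse, std_dev_guess)
print(result.x)  # the fitted std_dev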

Give it a shot and if you have extra questions I can edit the answer and elaborate as needed.
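Since only one parameter is being optimised, scipy.optimize.minimize_scalar may be a simpler fit than minimize. The sketch below restates the question's MSE function in compact form; the search range (1, 50) is my assumption about where a sensible standard deviation lies, not something given in the question.

```python
import numpy as np
import scipy.stats as sp
from scipy.optimize import minimize_scalar

spreads = np.array([10.5, 9.5, 10, 8.5])
expected_probabilities = np.array([0.8091, 0.7785, 0.7708, 0.7692])

def calculate_mse(std_dev):
    # Mean squared error between the model and expected probabilities.
    model_probabilities = sp.norm.sf(0.5, spreads, std_dev)
    return np.mean((model_probabilities - expected_probabilities) ** 2)

# Bounded one-dimensional search; the bounds are an assumed range.
result = minimize_scalar(calculate_mse, bounds=(1.0, 50.0), method='bounded')
best_std_dev = result.x
best_mse = result.fun
print(best_std_dev, best_mse)
```

The bounded method needs no initial guess, only a bracket, which is roughly how Excel's Solver is often used with variable limits.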

Here are a couple of suggestions to clean up your code.

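import numpy as np
import scipy.stats as sp
from scipy.optimize import minimize
from flask_restful import Resource  # assumption: Resource is Flask-RESTful's base class

class Minimize(Resource):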

    def _calculate_probability(self, spread, std_dev):
        return sp.norm.sf(0.5, spread, scale=std_dev)
  
    def _calculate_mse(self, std_dev):
        spread_inputs = np.array(self.spreads)
        model_probabilities = self._calculate_probability(spread_inputs, std_dev)
        mse = np.sum((model_probabilities - self.expected_probabilities)**2) / len(spread_inputs)
        print(mse)
        return mse

    def __init__(self, expected_probabilities, spreads, std_dev_guess):
        self.std_dev_guess = std_dev_guess
        self.spreads = spreads
        self.expected_probabilities = expected_probabilities
        self.result = None

    def solve(self):
        self.result = minimize(self._calculate_mse, self.std_dev_guess, method='BFGS')

    def get(self):
        return {'data': self.result}, 200

# run something like
spreads = [10.5, 9.5, 10, 8.5]
expected_probabilities = [0.8091, 0.7785, 0.7708, 0.7692]
minimizer = Minimize(expected_probabilities, spreads, 10.)
print(minimizer.get())  # prints ({'data': None}, 200) because solve() hasn't run yet; up to you how to handle this
minimizer.solve()
print(minimizer.get())
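One thing to watch when returning this from an API: minimize returns an OptimizeResult containing NumPy values, which generally can't be serialised to JSON as-is. A minimal sketch of one way around that, using a hypothetical helper result_to_dict (my name, not part of SciPy or Flask) that copies the fields of interest into plain Python types:

```python
import json
import numpy as np
import scipy.stats as sp
from scipy.optimize import minimize

spreads = np.array([10.5, 9.5, 10, 8.5])
expected_probabilities = np.array([0.8091, 0.7785, 0.7708, 0.7692])

def calculate_mse(std_dev):
    model_probabilities = sp.norm.sf(0.5, spreads, std_dev)
    return np.mean((model_probabilities - expected_probabilities) ** 2)

def result_to_dict(result):
    # Hypothetical helper: keep only plain Python types so json can encode them.
    return {
        'std_dev': float(result.x[0]),
        'mse': float(result.fun),
        'success': bool(result.success),
        'message': str(result.message),
    }

result = minimize(calculate_mse, 10.0, method='BFGS')
payload = result_to_dict(result)
json_text = json.dumps(payload)
print(json_text)
```

In the class above, get() could return something like result_to_dict(self.result) instead of the raw result, after checking that solve() has been run.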


3 Comments

Hey @AndrewHolmgren, thanks for pointing me toward scipy.optimize - I've edited my mse function and the input to just be the value I'm trying to optimise but it still doesn't work.
One question @AndrewHolmgren: I've tried to access the result via an API, so I've added api.add_resource(Minimize, '/minimize') at the end of your code. I get a TypeError: __init__() missing 3 required positional arguments: 'expected_probabilities', 'spreads', and 'std_dev_guess'. How do I expose the results to an API endpoint?
@clattenburgcake You should be able to pass in parameters like this stackoverflow.com/a/33740849/8056248 or this stackoverflow.com/a/39418645/8056248
