
I have a dataframe with 3 columns: Y, X1, X2. I want to find the parameter estimates b1 and b2 by minimizing the sum of squares according to:

Objective function: minimize the sum of squares (Y - (b1*X1 + b2*X2))^2
Constraints: 0 < b1 < 2, 0 < b2 < 1
Initial guesses: b1=b2=0.5
Technique: Newton-Raphson

I know that I can use

scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)

but I can't see how to pass in the columns from the dataframe, since none of the examples I found while searching use dataframe columns.

I would be very grateful for any help.

  • scipy isn't pandas-aware, so you'd extract the columns yourself, e.g., scipy.optimize.minimize(fun, mydf['numeric_column'], args=()) Commented Feb 26, 2019 at 17:06
  • Thanks very much for this, but where you've got "mydf['numeric_column']" corresponds to where I should input the initial guess(es), i.e. b1=b2=0.5 which are not in the dataframe. Commented Feb 26, 2019 at 17:18

1 Answer


This could be a starting point for you. As long as your objective function returns a scalar, there should be no problem. Pass the dataframe through the args keyword as a tuple. See the documentation of the minimize function to decide which method you want to use.

EDIT: I changed the code based on the description in your comment.

import numpy as np
import scipy.optimize as opt
import pandas as pd

def main(df):
    x0 = [0.5, 0.5]  # initial guesses for b1 and b2
    # L-BFGS-B honours bounds; plain BFGS cannot handle them and ignores them with a warning
    res = opt.minimize(fun=obj, x0=np.array(x0), args=(df,), method="L-BFGS-B", bounds=[(0, 2), (0, 1)])
    return res

def obj(x, df):
    # the dataframe arrives through args; x[0] is b1 and x[1] is b2
    sumSquares = np.sum((df["Y"] - (x[0]*df["X1"] + x[1]*df["X2"]))**2)
    return sumSquares

# random demo data; replace with your own dataframe
df = pd.DataFrame({"Y":np.random.rand(100),
                   "X1":np.random.rand(100),
                   "X2":np.random.rand(100)})
print(main(df))
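Since the model is linear in b1 and b2, a dedicated bounded linear least-squares solver may be a simpler fit than a general-purpose minimizer. The following is only a sketch under some assumptions: it swaps in scipy.optimize.lsq_linear (so no initial guess is needed, and it is not Newton-Raphson), the helper name fit_bounded is made up for illustration, and the demo data is synthetic with known coefficients 1.2 and 0.4.

```python
import numpy as np
import pandas as pd
from scipy.optimize import lsq_linear

def fit_bounded(df):
    """Hypothetical helper: fit Y ~ b1*X1 + b2*X2 with 0 <= b1 <= 2 and 0 <= b2 <= 1."""
    A = df[["X1", "X2"]].to_numpy()  # design matrix, one column per coefficient
    y = df["Y"].to_numpy()           # target vector
    # bounds are given as (lower bounds, upper bounds), one entry per coefficient
    return lsq_linear(A, y, bounds=([0.0, 0.0], [2.0, 1.0]))

# synthetic demo data (an assumption, not the asker's data)
rng = np.random.default_rng(0)
demo = pd.DataFrame({"X1": rng.random(200), "X2": rng.random(200)})
demo["Y"] = 1.2 * demo["X1"] + 0.4 * demo["X2"] + 0.01 * rng.standard_normal(200)

res = fit_bounded(demo)
print(res.x)  # estimated [b1, b2]
```

Note that the bounds here are inclusive, so a solution can land exactly on 0 or on the upper limit; if the strict inequalities matter, that case needs to be checked separately.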

5 Comments

Thanks very much for this, but it corresponds to what I've found in my searches to date and I can't relate the obj(x) function to my dataframe described above. The dataframe just has the 3 columns described above with each one containing numerical values.
Thanks very very much. That makes sense and works perfect on my dataframe. Much appreciated.
Happy to help! If the answer solves your problem, feel free to accept it:)
Thanks @f.wue for sharing this example. I tried it with my data set, but I am getting this message in the results: 'Desired error not necessarily achieved due to precision loss.' Any idea how to handle this?
No, but maybe check this question out: stackoverflow.com/questions/24767191/…
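One thing that sometimes helps with the 'precision loss' message, though it is not guaranteed to be the cause in your case, is to give the optimizer an exact gradient instead of letting it approximate one by finite differences. Below is a rough sketch of that idea for the sum-of-squares objective from the question. The function name sse_and_grad and the synthetic data are made up for illustration, and L-BFGS-B is used as an assumption because it accepts bounds.

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize

def sse_and_grad(b, A, y):
    """Return the sum of squared residuals and its gradient with respect to b."""
    r = y - A @ b                      # residuals Y - (b1*X1 + b2*X2)
    return r @ r, -2.0 * (A.T @ r)     # objective value, analytic gradient

# synthetic demo data (an assumption, not real data)
rng = np.random.default_rng(1)
df = pd.DataFrame({"X1": rng.random(100), "X2": rng.random(100)})
df["Y"] = 0.8 * df["X1"] + 0.3 * df["X2"] + 0.01 * rng.standard_normal(100)

# plain NumPy arrays are passed through args, the initial guess through x0
A = df[["X1", "X2"]].to_numpy()
y = df["Y"].to_numpy()

# jac=True tells minimize that the function returns (value, gradient)
res = minimize(sse_and_grad, x0=np.array([0.5, 0.5]), args=(A, y),
               jac=True, method="L-BFGS-B", bounds=[(0, 2), (0, 1)])
print(res.x, res.message)
```

Rescaling columns that have very different magnitudes is another thing that may be worth trying if the message persists.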
