Multivariate multiple linear regression using Sklearn

Question

I want to train a linear model Y = M_1*X_1 + M_2*X_2 using sklearn with multidimensional input and output samples (e.g. vectors). I tried the following code:

from sklearn import linear_model
from pandas import DataFrame 

x1 = [[1,2],[2,3],[3,4]]
x2 = [[1,1],[3,2],[3,5]]
y = [[1,0],[1,2],[2,3]]
model = {
    'vec1': x1,
    'vec2': x2,
    'compound_vec': y}

df = DataFrame(model, columns=['vec1','vec2','compound_vec'])
x = df[['vec1','vec2']].astype(object)
y = df['compound_vec'].astype(object)
regr = linear_model.LinearRegression()
regr.fit(x,y)

But I get the following error:

regr.fit(x,y)
 ...
array = array.astype(np.float64)
ValueError: setting an array element with a sequence.

Does anyone know what is wrong with the code, and whether this is the right way to train Y = M_1*X_1 + M_2*X_2?

Is your goal, in the end, to also learn and predict multiple output values at once, as your first sentence may still suggest (so is Y multidimensional in the formula)? Or is it only reformatting the data (as done in the accepted answer)?
– Marcus V., Commented Aug 24, 2018 at 12:08
@MarcusV. I need to train the model so that, given two multidimensional inputs such as vectors, it predicts an output in the same space (a vector), so M_1 and M_2 are matrices. With one independent variable it works well, but I am confused about how to handle two independent variables.
– Mila, Commented Aug 24, 2018 at 12:15
@Shimil: There is nothing to be confused about here. In Y = M_1*X_1 + M_2*X_2, for a given value of X_1 and a given value of X_2, you will have a corresponding Y value. So if you have 6 pairs of X_1 and X_2 values, as you have in your data, you will have 6 output values of Y.
– Sheldore, Commented Aug 24, 2018 at 13:07
@Bazingaa it may still be that Shimil actually wants multiple outputs/dependent variables, but then linear regression won't work out of the box. It may work using the MultiOutputRegressor wrapper (sklearn.multioutput.MultiOutputRegressor), with the assumption that both y can be predicted independently (as it fits one model per output).
– Marcus V., Commented Aug 24, 2018 at 13:52
Hmm, you are right. It could be. I just took the equation Shimil provided and tried to find out why the code was complaining.
– Sheldore, Commented Aug 24, 2018 at 13:54

Sheldore · Accepted Answer · 2018-08-24 11:54:53Z

3

Just flatten your x1, x2 and y lists and you are good to go. One way to do that is using arrays as follows:

import numpy as np

# flatten() always returns a 1-D copy of the data
x1 = np.array(x1).flatten()
x2 = np.array(x2).flatten()
y = np.array(y).flatten()

A second way is to use ravel:

# ravel() returns a 1-D view when possible, otherwise a copy
x1 = np.array(x1).ravel()
x2 = np.array(x2).ravel()
y = np.array(y).ravel()

A third way, without NumPy, is a list comprehension:

x1 = [j for i in x1 for j in i]
x2 = [j for i in x2 for j in i]
y = [j for i in y for j in i]

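There might be more ways, but you get what the problem was: each cell of the DataFrame held a list instead of a number, so the column could not be converted to float64. For more ways, you can have a look here.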

Output

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
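Putting the flattening step together with the question's code, a minimal end-to-end sketch might look like the following. The variable names `X` and `target` are my own, not from the question. Note that flattening treats every vector component as a separate scalar sample, so this fits a scalar model y = m_1*x_1 + m_2*x_2 + b on 6 rows rather than matrix-valued M_1 and M_2.

```python
import numpy as np
from pandas import DataFrame
from sklearn import linear_model

x1 = [[1, 2], [2, 3], [3, 4]]
x2 = [[1, 1], [3, 2], [3, 5]]
y = [[1, 0], [1, 2], [2, 3]]

# Flatten each nested list so every DataFrame cell holds a single number
df = DataFrame({
    'vec1': np.array(x1).ravel(),
    'vec2': np.array(x2).ravel(),
    'compound_vec': np.array(y).ravel(),
})

X = df[['vec1', 'vec2']]       # 6 samples, 2 scalar features
target = df['compound_vec']    # 6 scalar targets

regr = linear_model.LinearRegression()
regr.fit(X, target)

# One scalar coefficient per feature column, plus a scalar intercept
print(regr.coef_, regr.intercept_)
```

The fit now succeeds because every column has a numeric dtype instead of `object`.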

edited Aug 24, 2018 at 11:54

answered Aug 24, 2018 at 11:44

Sheldore
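If the goal is instead the multi-output reading raised in the comments (vector in, vector out, with M_1 and M_2 as matrices), one possible sketch is to stack X_1 and X_2 side by side and pass a 2-D target. As far as I know, scikit-learn's LinearRegression accepts a target of shape (n_samples, n_targets) directly and then exposes `coef_` with shape (n_targets, n_features). The names `M_1`, `M_2`, `Y_hat` and `wrapped` are mine, and `fit_intercept=False` is an assumption chosen to match the intercept-free formula in the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

x1 = np.array([[1, 2], [2, 3], [3, 4]])
x2 = np.array([[1, 1], [3, 2], [3, 5]])
Y = np.array([[1, 0], [1, 2], [2, 3]])

# Each row is one sample: [x1_a, x1_b, x2_a, x2_b]
X = np.hstack([x1, x2])  # shape (3, 4)

# Assumption: no intercept, to mirror Y = M_1*X_1 + M_2*X_2
model = LinearRegression(fit_intercept=False)
model.fit(X, Y)

# coef_ has shape (n_targets, n_features) = (2, 4); split it per input block
M_1 = model.coef_[:, :2]
M_2 = model.coef_[:, 2:]

# Row-wise version of Y = M_1 @ x1 + M_2 @ x2
Y_hat = x1 @ M_1.T + x2 @ M_2.T

# The wrapper from the comments fits one independent model per output column
wrapped = MultiOutputRegressor(LinearRegression(fit_intercept=False)).fit(X, Y)
```

With only three samples and four input features the problem is underdetermined, so the recovered matrices are not unique; this only illustrates the shapes involved. With real data you would want many more samples than features.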
