
I want to train a linear model Y = M_1*X_1 + M_2*X_2 using sklearn with multidimensional input and output samples (e.g. vectors). I tried the following code:

from sklearn import linear_model
from pandas import DataFrame 

x1 = [[1,2],[2,3],[3,4]]
x2 = [[1,1],[3,2],[3,5]]
y = [[1,0],[1,2],[2,3]]
model = {
    'vec1': x1,
    'vec2': x2,
    'compound_vec': y}

df = DataFrame(model, columns=['vec1','vec2','compound_vec'])
x = df[['vec1','vec2']].astype(object)
y = df['compound_vec'].astype(object)
regr = linear_model.LinearRegression()
regr.fit(x,y)

But I get the following error:

regr.fit(x,y)
 ...
array = array.astype(np.float64)
ValueError: setting an array element with a sequence.

Does anyone know what is wrong with the code? And is this the right way to train Y = M_1*X_1 + M_2*X_2?
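The traceback can be reproduced without sklearn. The sketch below assumes that the cast shown in the traceback receives the DataFrame's contents as a 3x2 object array whose cells are Python lists. The loop only rebuilds that array by hand:

```python
import numpy as np

vec1 = [[1, 2], [2, 3], [3, 4]]
vec2 = [[1, 1], [3, 2], [3, 5]]

# Assumption: this mimics the data handed to the float cast, i.e. a 3x2
# object array where every cell is a whole list (a vector), not a number.
x = np.empty((3, 2), dtype=object)
for i in range(3):
    x[i, 0] = vec1[i]
    x[i, 1] = vec2[i]

error = None
try:
    x.astype(np.float64)  # the same conversion that fails in the traceback
except ValueError as err:
    error = err
print(error)
```

A single float64 slot cannot hold a list, so the cast fails. Any fix has to make every cell a number, either by splitting the vectors into scalar columns or by flattening them.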

  • Is your goal, in the end, to also learn and predict multiple output values at once, as your first sentence may still suggest (so is Y multidimensional in the formula)? Or is it only reformatting the data (as done in the accepted answer)? Commented Aug 24, 2018 at 12:08
  • @MarcusV. I need to train the model so that, given two multidimensional inputs (vectors), it predicts an output in the same vector space, so M_1 and M_2 are matrices. With one independent variable it works well, but I am confused about how to handle two independent variables. Commented Aug 24, 2018 at 12:15
  • @Shimil: There is nothing to be confused about here. In Y = M_1*X_1 + M_2*X_2, for a given value of X_1 and a given value of X_2, you will have a corresponding Y value. So if you have 6 pairs of X_1 and X_2 values, as you have in your data, you will have 6 output values of Y. Commented Aug 24, 2018 at 13:07
  • @Bazingaa it may still be that Shimil actually wants multiple outputs/dependent variables, but then linear regression won't work out of the box. It may work using the MultiOutputRegressor (sklearn.multioutput.MultiOutputRegressor) wrapper, under the assumption that each output in y can be predicted independently (it fits one model per output). Commented Aug 24, 2018 at 13:52
  • Hmm, you are right, it could be. I just took the equation Shimil provided and tried to find out why the code was complaining. Commented Aug 24, 2018 at 13:54
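If Y is meant to be a vector, as the formula in the question suggests, scikit-learn's LinearRegression also accepts a 2-D y of shape (n_samples, n_outputs). This sketch makes a few assumptions. It concatenates x1 and x2 side by side into one feature row per sample. It sets fit_intercept=False to match the formula, which has no bias term. The names X, M1 and M2 are illustrative only. The sketch also compares the result against the MultiOutputRegressor wrapper mentioned above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

x1 = np.array([[1, 2], [2, 3], [3, 4]])
x2 = np.array([[1, 1], [3, 2], [3, 5]])
y = np.array([[1, 0], [1, 2], [2, 3]])

# One row per sample: [x1 | x2], shape (3, 4)
X = np.hstack([x1, x2])

# With a 2-D y, LinearRegression fits all output columns at once.
# fit_intercept=False matches Y = M_1*X_1 + M_2*X_2 (no bias term).
regr = LinearRegression(fit_intercept=False).fit(X, y)

# coef_ has shape (n_outputs, n_features) = (2, 4); split it into M_1 and M_2.
M1 = regr.coef_[:, :2]
M2 = regr.coef_[:, 2:]

# Row-vector form of y_i = M1 @ x1_i + M2 @ x2_i for every sample i.
pred = x1 @ M1.T + x2 @ M2.T
print(np.allclose(pred, regr.predict(X)))  # True

# MultiOutputRegressor fits one independent LinearRegression per output column.
# For ordinary least squares, the outputs decouple, so the predictions agree.
multi = MultiOutputRegressor(LinearRegression(fit_intercept=False)).fit(X, y)
print(np.allclose(multi.predict(X), regr.predict(X)))  # True
```

With only three samples and four unknowns per output row, this toy system is underdetermined. The least-squares solver therefore returns one of many exact fits. You need more samples than features before M_1 and M_2 are uniquely determined.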

1 Answer


Just flatten your x1, x2 and y lists and you are good to go. One way to do that is with NumPy arrays:

import numpy as np
x1 = np.array(x1).flatten()
x2 = np.array(x2).flatten()
y = np.array(y).flatten()

A second way is to use ravel:

x1 = np.array(x1).ravel()
x2 = np.array(x2).ravel()
y = np.array(y).ravel()

A third way, without NumPy, is a nested list comprehension:

x1 = [j for i in x1 for j in i]
x2 = [j for i in x2 for j in i]
y = [j for i in y for j in i]
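All three variants yield the same flat sequence. One difference is worth knowing: ndarray.flatten always returns a copy, while ravel returns a view of the original data whenever it can. Here the array is contiguous, so ravel does return a view. A quick check, reusing the question's x1:

```python
import numpy as np

x1 = [[1, 2], [2, 3], [3, 4]]
arr = np.array(x1)

a = arr.flatten()               # always a new copy
b = arr.ravel()                 # a view here, because arr is contiguous
c = [j for i in x1 for j in i]  # plain Python list, no NumPy

print(a.tolist() == b.tolist() == c)  # True: [1, 2, 2, 3, 3, 4]
print(np.shares_memory(arr, a))       # False
print(np.shares_memory(arr, b))       # True
```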

There are other ways as well, but the underlying problem is that each DataFrame cell held a whole list, which NumPy cannot convert to a single float. For more ways, you can have a look here

Output of regr.fit(x, y):

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
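The sketch below puts the answer's steps together and runs the full fit on the flattened data. It reuses the question's variable names. y_flat is a new, hypothetical name that avoids shadowing the list y. Flattening means each vector component becomes its own sample. As a result, M_1 and M_2 are learned as single scalars shared by all components, plus an intercept (LinearRegression fits one by default), rather than as matrices:

```python
import numpy as np
from pandas import DataFrame
from sklearn import linear_model

x1 = [[1, 2], [2, 3], [3, 4]]
x2 = [[1, 1], [3, 2], [3, 5]]
y = [[1, 0], [1, 2], [2, 3]]

# Flatten each list of vectors so that every DataFrame cell holds one number.
model = {
    'vec1': np.array(x1).ravel(),
    'vec2': np.array(x2).ravel(),
    'compound_vec': np.array(y).ravel(),
}
df = DataFrame(model, columns=['vec1', 'vec2', 'compound_vec'])

x = df[['vec1', 'vec2']]     # shape (6, 2): six (X_1, X_2) pairs
y_flat = df['compound_vec']  # shape (6,)

regr = linear_model.LinearRegression()
regr.fit(x, y_flat)

# M_1 and M_2 are two scalars shared by every vector component.
m1, m2 = regr.coef_
print(m1, m2, regr.intercept_)
```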