I have a ".dat" file containing X and Y values (so the loaded array has shape (n, 2), where n is the number of rows).

import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate as interp
from sklearn import linear_model

# path is the location of the ".dat" file
text = np.loadtxt(path)  # loadtxt opens and closes the file itself
x = text[:, 0]  # first column, shape (n,)
y = text[:, 1]  # second column, shape (n,)

I created an instance of linear_model.LinearRegression(), but when I call its .fit(x, y) method I get

IndexError: tuple index out of range

regr = linear_model.LinearRegression()
regr.fit(x,y)

What did I do wrong?

  • Sorry I completely misread your question :( I've deleted the answer, if I can get a fix then I'll un-delete the edited answer. But can you provide more information? Such as your full code? Commented Nov 24, 2014 at 14:44
  • This is the code you need, there is nothing else important. Commented Nov 24, 2014 at 14:46
  • Really? What's linear_model? How did you get it? Commented Nov 24, 2014 at 14:46
  • That's really all now, thanks for the help. Commented Nov 24, 2014 at 14:48
  • Are x and Y of the same length? Commented Nov 24, 2014 at 14:49

1 Answer

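LinearRegression expects X to be a two-dimensional array of shape (n_samples, n_features), and internally it reads X.shape[1] to initialize an np.ones array. Your x is one-dimensional, with shape (n,), so X.shape[1] does not exist and the lookup raises the IndexError. Converting x to an n×1 array does the trick. So, replace: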

regr.fit(x,y)

by:

regr.fit(x[:,np.newaxis],y)
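
As a minimal, self-contained sketch of the same fix: the arrays below are synthetic stand-ins for the two columns of the ".dat" file (an assumption, since the real data is not shown). It also uses x.reshape(-1, 1), which produces the same n×1 array as x[:, np.newaxis]. Note that newer scikit-learn releases appear to report the one-dimensional case as a ValueError about expecting a 2D array rather than as an IndexError, but the remedy is the same.

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-in for the two columns of the ".dat" file (assumption).
x = np.arange(10, dtype=float)   # shape (10,), one-dimensional
y = 2.0 * x + 1.0                # exact line: slope 2, intercept 1

# Turn the 1-D feature vector into a column: shape (10, 1).
# x[:, np.newaxis] gives the same result.
X = x.reshape(-1, 1)

regr = linear_model.LinearRegression()
regr.fit(X, y)                   # X is 2-D, so the fit succeeds
```

Only the features need the extra dimension; y can stay one-dimensional.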

This will fix the problem. Demo:

>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn import linear_model
>>> clf = linear_model.LinearRegression()
>>> iris=datasets.load_iris()
>>> X=iris.data[:,3]
>>> Y=iris.target
>>> clf.fit(X,Y)  # This will throw an error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/sklearn/linear_model/base.py", line 363, in fit
    X, y, self.fit_intercept, self.normalize, self.copy_X)
  File "/usr/lib/python2.7/dist-packages/sklearn/linear_model/base.py", line 103, in center_data
    X_std = np.ones(X.shape[1])
IndexError: tuple index out of range
>>> clf.fit(X[:,np.newaxis],Y)  # This will work properly
LinearRegression(copy_X=True, fit_intercept=True, normalize=False)

To plot the regression line, use the code below:

>>> from matplotlib import pyplot as plt
>>> plt.scatter(X, Y, color='red')
<matplotlib.collections.PathCollection object at 0x7f76640e97d0>
>>> plt.plot(X, clf.predict(X[:,np.newaxis]), color='blue')
<matplotlib.lines.Line2D object at 0x7f7663f9eb90>
>>> plt.show()


2 Comments

Thank you very much for the help! Another question: is it normal that I now get only one coefficient from the linear regression? How can I plot its line?
@JackLametta, it's absolutely normal. With a single feature there is one coefficient (the slope), which together with the intercept is used to predict the Y value for a given X value. I've added code to the answer to plot the line.
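
To illustrate the single coefficient, here is a small sketch on made-up data (the arrays are an assumption, not the asker's file). It reads the slope from coef_ and the intercept from intercept_, then checks that slope * x + intercept reproduces what predict returns.

```python
import numpy as np
from sklearn import linear_model

# Made-up data lying exactly on y = 3x - 2 (assumption, for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x - 2.0

regr = linear_model.LinearRegression().fit(x[:, np.newaxis], y)

slope = regr.coef_[0]        # one feature, so coef_ holds one value
intercept = regr.intercept_  # stored separately from coef_

# The fitted line, computed by hand and by the model.
manual = slope * x + intercept
predicted = regr.predict(x[:, np.newaxis])
```

Passing x and either manual or predicted to plt.plot draws the same regression line.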
