Linear regression using scikit-learn

Question

Here is my scenario.

data = [[25593.14, 39426.66],
        [98411.00, 81869.75],
        [71498.80, 62495.80],
        [38068.00, 54774.00],
        [58188.00, 43453.65],
        [10220.00, 18465.25]]

The above is my data set.

The x-coordinate is "Salary" and the y-coordinate is "Expenses".

I want to predict the expenses for a given "Salary", i.e., the x-coordinate.

Here is my sample code. Please help me out.

from sklearn.linear_model import LinearRegression

data = [[25593.14, 39426.66],
        [98411.00, 81869.75],
        [71498.80, 62495.80],
        [38068.00, 54774.00],
        [58188.00, 43453.65],
        [10220.00, 18465.25]]

salary=[]
expenses=[]

for dataset in data:
    # import pdb; pdb.set_trace()
    salary.append(dataset[0])
    expenses.append(dataset[1])

model = LinearRegression()
model.fit(salary, expenses)
prediction = model.predict([10200.00])
print(prediction)

This is the error I got:

ValueError: Expected 2D array, got 1D array instead:
array=[ 25593.14  98411.    71498.8   38068.    58188.    10220.  ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample


The error tells you what the problem is and what to do about it (and there are probably five other questions here about this error). I highly recommend reading sklearn's docs to see what shapes are expected. After that, read some numpy docs so you can avoid that list-append stuff!
– sascha, Commented Feb 20, 2018 at 17:33
To add to the above: the error occurs at the first argument of the line model.fit(salary, expenses). It expects a matrix of training data for the first argument, "X". This may help.
– fenrisulfr, Commented Feb 20, 2018 at 17:36
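
To make the shape point from the comments concrete, here is a minimal sketch using only NumPy. The variable names flat, column and row are just for illustration; the salary values are the ones from the question.

```python
import numpy as np

# Salaries from the question, as a plain Python list.
salary = [25593.14, 98411.00, 71498.80, 38068.00, 58188.00, 10220.00]

flat = np.array(salary)        # 1D array: what fit() rejected as X
column = flat.reshape(-1, 1)   # 2D: six samples, one feature each
row = flat.reshape(1, -1)      # 2D: one sample with six features

print(flat.shape, column.shape, row.shape)
```

For this problem each salary is one sample with a single feature, so the column form is the one the error message is pointing at.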

sjw · Accepted Answer · 2018-02-20 17:39:27Z

4

As suggested in the comments, something like this is a better way to prepare data you want to feed into a scikit-learn model. Another example can be seen here.

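from sklearn.linear_model import LinearRegression
import numpy as np

# Transpose so that row 0 holds the salaries and row 1 the expenses.
data = np.array(
    [[25593.14, 39426.66],
     [98411.00, 81869.75],
     [71498.80, 62495.80],
     [38068.00, 54774.00],
     [58188.00, 43453.65],
     [10220.00, 18465.25]]
).T

# X must be 2D: one row per sample, one column per feature.
salary = data[0].reshape(-1, 1)
# y can stay 1D: one target value per sample.
expenses = data[1]

model = LinearRegression()
model.fit(salary, expenses)
# predict() also expects a 2D array, even for a single salary.
prediction = model.predict(np.array([10200.00]).reshape(-1, 1))
print(prediction)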
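
As a possible variation on the code above, the sketch below slices the columns directly instead of transposing, and reads back the fitted line so a prediction can be checked by hand. The names X, y, new_salaries and manual are illustrative, and the second salary (50000.00) is a made-up input, not part of the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

data = np.array(
    [[25593.14, 39426.66],
     [98411.00, 81869.75],
     [71498.80, 62495.80],
     [38068.00, 54774.00],
     [58188.00, 43453.65],
     [10220.00, 18465.25]]
)

X = data[:, [0]]   # list index keeps the result 2D, shape (6, 1)
y = data[:, 1]     # 1D targets, shape (6,)

model = LinearRegression().fit(X, y)
slope = model.coef_[0]         # change in expenses per unit of salary
intercept = model.intercept_   # predicted expenses at salary 0

# One row per salary to predict; 50000.00 is a hypothetical input.
new_salaries = np.array([[10200.00], [50000.00]])
predictions = model.predict(new_salaries)

# The model is a straight line, so this should match predict().
manual = slope * new_salaries[:, 0] + intercept
print(slope, intercept)
print(predictions)
```

Indexing with [0] inside a list (data[:, [0]]) is what keeps X two-dimensional; data[:, 0] would give a 1D array and bring back the original error.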



1 Comment

Krishna Barri Over a year ago

Thanks for your help

Eliethesaiyan · Accepted Answer · 2018-02-20 17:43:54Z

0

Quick fix: replace the model.fit line with this one (it needs import numpy as np):

model.fit(np.array(salary).reshape(-1, 1), np.array(expenses))

X is expected to be an array of arrays, array([arr1, arr2, arr3, ...]), with each of arr1, arr2, ... being an array of at least one feature. y should be a flat array of target values, array([label1, label2, label3, ...]). The same holds for predict, which needs [[10200.00]] rather than [10200.00].
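
One way to get X into the array-of-samples shape described above, starting from the question's plain lists, might look like the sketch below. It keeps the question's variable names; X and y are names introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

data = [[25593.14, 39426.66],
        [98411.00, 81869.75],
        [71498.80, 62495.80],
        [38068.00, 54774.00],
        [58188.00, 43453.65],
        [10220.00, 18465.25]]

# Split the pairs into two lists, as the question does with its loop.
salary = [row[0] for row in data]
expenses = [row[1] for row in data]

X = np.array(salary).reshape(-1, 1)   # shape (6, 1): one feature per sample
y = np.array(expenses)                # shape (6,): one target per sample

model = LinearRegression()
model.fit(X, y)

# A single sample still has to be wrapped as a 2D structure.
prediction = model.predict([[10200.00]])
print(prediction)
```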

edited Feb 20, 2018 at 17:43

answered Feb 20, 2018 at 17:38


1 Comment

Krishna Barri Over a year ago

Thanks for your help.
