Sklearn.linear_model : ValueError: Found input variables with inconsistent numbers of samples: [1, 20]

Question

I am trying to implement linear regression, but when I run the code I get this error: ValueError: Found input variables with inconsistent numbers of samples: [1, 20] on the line linear.fit(x_train1,y_train1). Both x_train1 (like x) and y_train1 are Series.

When I change x to x=dataset.iloc[:,:-1], x and x_train1 become DataFrames (y is still a Series) and the code works correctly.

So why does it only work when x is a DataFrame, even though y is still a Series?

import pandas as pd
import numpy as np
import matplotlib.pyplot

dataset=pd.read_csv('Salary_Data.csv')

x=dataset.iloc[:,0]

y=dataset.iloc[:,1]

from sklearn.model_selection import train_test_split
x_train1,x_test1,y_train1,y_test1=train_test_split(x,y,test_size=1/3,random_state=0)

#implementing simple linear regression
from sklearn.linear_model import LinearRegression

linear=LinearRegression()

linear.fit(x_train1,y_train1)

y_pred=linear.predict(x_test1)

YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
– smit shah, Commented Dec 22, 2017 at 20:13

O.Suleiman · Accepted Answer · 2017-12-22 22:30:45Z

1

Scikit-Learn expects the feature matrix X to be 2-dimensional, with shape (n_samples, n_features); it does not accept a rank-1 (1-dimensional) array for it. If you check the shape attribute of your x:

x.shape

it will return something like (23,), 23 being the number of rows, where it should be (23,1).
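As a small illustration of the shape difference described above, the sketch below builds a DataFrame from the first five rows of the sample data quoted in the question (instead of reading Salary_Data.csv, which is an assumption made only to keep the example self-contained) and compares the two ways of selecting the first column.

```python
import pandas as pd

# First five rows of the sample data quoted in the question,
# built inline so the example does not depend on Salary_Data.csv.
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 1.3, 1.5, 2.0, 2.2],
    "Salary": [39343.0, 46205.0, 37731.0, 43525.0, 39891.0],
})

x_series = dataset.iloc[:, 0]    # single integer column index -> Series (1-D)
x_frame = dataset.iloc[:, :-1]   # column slice -> DataFrame (2-D)

print(type(x_series).__name__, x_series.shape)  # Series (5,)
print(type(x_frame).__name__, x_frame.shape)    # DataFrame (5, 1)
```

The slice keeps the column axis, which is why the iloc[:,:-1] version from the question is accepted as X.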

In order for it to work, try using reshape:

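x = dataset.iloc[:,0]
# a Series has no reshape method in current pandas versions, so reshape the underlying NumPy array
x = x.values.reshape((len(x),1))
...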
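For reference, here is a minimal end-to-end sketch of the fix, assuming a small inline sample of the data from the question rather than the CSV file. It shows two ways to get a 2-D X (a list of column positions in iloc, or reshaping the NumPy array) and then runs the same split, fit and predict steps as the question. The variable names X_frame and X_array are made up for this example.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Nine rows of the sample data from the question, built inline (assumption:
# this stands in for pd.read_csv('Salary_Data.csv')).
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.7],
    "Salary": [39343.0, 46205.0, 37731.0, 43525.0, 39891.0,
               56642.0, 60150.0, 54445.0, 57189.0],
})

# Option 1: a list of column positions keeps the result a 2-D DataFrame.
X_frame = dataset.iloc[:, [0]]

# Option 2: take the 1-D values and reshape them into a single column.
X_array = dataset.iloc[:, 0].to_numpy().reshape(-1, 1)

y = dataset.iloc[:, 1]  # the target can stay a 1-D Series

x_train1, x_test1, y_train1, y_test1 = train_test_split(
    X_frame, y, test_size=1/3, random_state=0
)

linear = LinearRegression()
linear.fit(x_train1, y_train1)      # X is 2-D, so this no longer raises
y_pred = linear.predict(x_test1)

print(X_frame.shape, X_array.shape)  # both have one column
print(y_pred.shape)                  # one prediction per test row
```

Either option should work; keeping a DataFrame preserves the column name, while the reshaped array matches the approach in this answer.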


3 Comments

smit shah Over a year ago

But y_train1 is still a Series and sklearn accepts it! So can you explain why only the first argument needs to be a DataFrame?

O.Suleiman Over a year ago

Y is the attribute you want to predict, whereas X is the set of features you use to predict Y. X must be 2-dimensional (one row per sample, one column per feature) even when there is only one feature, while Y can be 1-dimensional when you are predicting a single attribute, since it holds one value per sample.
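The X versus Y distinction in this comment can be checked directly. The sketch below is only an illustration using plain NumPy arrays and a few values from the question's data. It assumes a reasonably recent scikit-learn, where a 1-D X is rejected with a ValueError (the exact message differs between versions), while a 1-D y is accepted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# A few values from the question's data, as plain 1-D arrays.
years = np.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9])
salary = np.array([39343.0, 46205.0, 37731.0, 43525.0, 39891.0, 56642.0])

model = LinearRegression()

# Passing a 1-D array as X is rejected.
try:
    model.fit(years, salary)
    rejected = False
except ValueError as exc:
    rejected = True
    print("1-D X rejected with", type(exc).__name__)

# Reshaping X into one column works, and y can remain 1-D.
model.fit(years.reshape(-1, 1), salary)
print(model.coef_.shape)  # one coefficient for the single feature
```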

smit shah Over a year ago

Thx bro really appreciate ur help
