How to fix sklearn multiple linear regression ValueError in python (inconsistent numbers of samples: [2, 1])

Question

I had my linear regression working perfectly with a single feature. Ever since trying to use two I get the following error: ValueError: Found input variables with inconsistent numbers of samples: [2, 1]

The first print statement is printing the following: (2, 6497) (1, 6497)

Then the code crashes at the train_test_split phase.

Any ideas?

feat_scores = {}
X = df[['alcohol','density']].values.reshape(2,-1)   
y = df['quality'].values.reshape(1,-1)

print (X.shape, y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print (X_train.shape, y_train.shape)
print (X_test.shape, y_test.shape)

reg = LinearRegression()
reg.fit(X_train, y_train)

reg.predict(y_train)

Venkatachalam · Accepted Answer · 2018-12-31 02:55:55Z

The problem is in these two lines:

X = df[['alcohol','density']].values.reshape(2,-1)   
y = df['quality'].values.reshape(1,-1)

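Don't reshape the data into (2, 6497) and (1, 6497). scikit-learn expects X with shape (n_samples, n_features) and y with shape (n_samples,), so here they should be (6497, 2) and (6497,).

scikit-learn accepts DataFrames and Series directly, so you can simply pass: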
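To see why the reshape goes wrong, here is a minimal sketch using a small made-up array in place of df[['alcohol','density']].values (the numbers are illustrative, not taken from the real dataset). It shows that the values already come out as (n_samples, n_features), and that reshape(2, -1) is not a transpose: it reflows the values row by row, so each row ends up mixing both features.

```python
import numpy as np

# Hypothetical stand-in for df[['alcohol', 'density']].values:
# 4 samples, 2 features (alcohol, density).
values = np.array([
    [9.4, 0.9978],
    [9.8, 0.9968],
    [10.0, 0.9970],
    [9.5, 0.9980],
])

# Already in the layout scikit-learn wants: (n_samples, n_features).
print(values.shape)

# reshape(2, -1) gives 2 rows, which scikit-learn reads as 2 samples.
reshaped = values.reshape(2, -1)
print(reshaped.shape)

# The first row now interleaves alcohol and density values
# instead of holding a single feature.
print(reshaped[0])

# So the reshaped array is not the same thing as the transpose.
print(np.array_equal(reshaped, values.T))
```

That is where the [2, 1] in the error message comes from: train_test_split compares the first dimension of X (2) with the first dimension of y (1) and finds they disagree.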

X = df[['alcohol','density']]
y = df['quality']

Also, predict takes feature values (X), not the targets (y), so call it as:

reg.predict(X_train)

or

reg.predict(X_test)
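Putting the pieces together, the whole flow might look like the sketch below. The original df is not available here, so this builds a small synthetic DataFrame with the same column names; the data, the sample count n, and the relationship between the columns are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine data (values are made up).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "alcohol": rng.uniform(8.0, 14.0, n),
    "density": rng.uniform(0.99, 1.00, n),
})
df["quality"] = 0.5 * df["alcohol"] + rng.normal(0.0, 0.5, n)

# No reshape needed: X is (n_samples, 2), y is (n_samples,).
X = df[["alcohol", "density"]]
y = df["quality"]
print(X.shape, y.shape)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

reg = LinearRegression()
reg.fit(X_train, y_train)

# Predict from features, never from the targets.
y_pred = reg.predict(X_test)
print(y_pred.shape)
```

With the real data, only the block that builds df would change; the X, y, split, fit and predict lines stay the same.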

