Linear Regression not working due to wrong kind of array

Question

I am trying to do my homework. The task is to take this data and perform a linear regression on it.

The code is published here.

I am quite new to programming in Python and to data science. I tried transforming the data as the interpreter suggests, but it didn't work. My first error was that a 2D array was expected but a 1D array was given. Then I took the plain array and put it into an empty list, as suggested by a StackOverflow answer. Now the error is that a scalar array is given but a 2D array is expected.

import pandas as pd
from sklearn.preprocessing import StandardScaler

#Import
data = pd.read_csv('uscrime.txt', sep="\t")
crime = pd.concat([data], axis = 1)
print(crime)

from sklearn.linear_model import LinearRegression
regression = LinearRegression()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(crime.get("M"), crime.get("Crime"), test_size=0.2, random_state=0)

X_train_new = []
X_train_new.append(X_train.values)

y_train_new = []
y_train_new.append(y_train.values)

regression.fit(X_train_new, y_train_new)

Probably the first error was referring to X, and the second to y. X needs to be 2D in sklearn, but y should (in this case at least) be 1D.
– Ben Reiniger ♦, Commented May 28, 2020 at 20:28
Sorry, I don't quite understand why y needs to be 1D here. When y is changed to 1D, this error appears: Found input variables with inconsistent numbers of samples: [1, 37]
– Marvin, Commented May 28, 2020 at 21:08
Okay, I just made both 2D and it works now. I don't quite understand why, but my output appears to be correct now: Out[59]: LinearRegression(). However, regression.predict([[20]]) does not work and fails with the following error: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 37 is different from 1)
– Marvin, Commented May 28, 2020 at 21:11
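
The "[1, 37]" and "size 37 is different from 1" messages in the comments are consistent with the list-wrapping step in the question turning 37 samples into a single sample with 37 features. The following is a minimal sketch of that shape issue, not the asker's actual data: the values are made up, and only the row count (37, taken from the error message) matches the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 37 training rows of the "M" column.
x_train = pd.Series(np.linspace(11.0, 18.0, 37), name="M")

# Appending the whole array to an empty list gives ONE row with 37 columns,
# which scikit-learn reads as 1 sample with 37 features.
wrapped = np.asarray([x_train.values])
print(wrapped.shape)  # (1, 37)

# reshape(-1, 1) gives 37 rows with 1 column: 37 samples, 1 feature.
column = x_train.to_numpy().reshape(-1, 1)
print(column.shape)  # (37, 1)

# The target stays one-dimensional: one value per sample.
y_train = pd.Series(np.arange(37, dtype=float), name="Crime")
target = y_train.to_numpy()
print(target.shape)  # (37,)
```

With X shaped (37, 1), a call such as predict([[20]]) supplies one sample with one feature, which matches what the model was fitted on.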

PalimPalim · Accepted Answer · 2020-06-01 11:17:01Z

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the tab-separated data directly from the website.
data = pd.read_csv("http://www.statsci.org/data/general/uscrime.txt", sep="\t")

# Features: every column except the target, as a 2D array (n_samples, n_features).
x = data.loc[:, data.columns != 'Crime'].to_numpy()
# Target: the 'Crime' column as a 1D array (n_samples,).
y = np.squeeze(data.loc[:, 'Crime'].to_numpy())

regression = LinearRegression()
regression.fit(x, y)

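scikit-learn expects the features as a 2D array of shape (n_samples, n_features) and, for a single target, y as a 1D array of shape (n_samples,). Converting the pandas objects to NumPy arrays makes these shapes explicit (scikit-learn also accepts DataFrames and Series directly, as long as the shapes are right). On top of that, you need to make sure that the array for y has only one dimension, which I achieved via np.squeeze. Bonus: see above how you can load the CSV directly from the website.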
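
For the single-feature case from the question (predicting "Crime" from "M" only), a possible variant is sketched below. It is an illustration under stated assumptions rather than part of the answer: the data frame is a synthetic stand-in with made-up values (47 rows, with "Crime" set to exactly 50 * M + 100 so the result is easy to check), and it relies on scikit-learn accepting pandas objects directly when their shapes are right.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for uscrime.txt; the values are made up for illustration.
rng = np.random.default_rng(0)
m = rng.uniform(11.0, 18.0, size=47)
crime = pd.DataFrame({"M": m, "Crime": 50.0 * m + 100.0})

X = crime[["M"]]    # double brackets keep a 2D DataFrame, shape (47, 1)
y = crime["Crime"]  # single brackets give a 1D Series, shape (47,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

regression = LinearRegression()
regression.fit(X_train, y_train)

# predict also needs 2D input: one row per sample, one column per feature.
new_sample = pd.DataFrame({"M": [20.0]})
prediction = regression.predict(new_sample)
print(prediction.shape)  # (1,)
```

Selecting the column with crime[["M"]] instead of crime.get("M") avoids the manual reshaping step, because the result is already two-dimensional.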
