Linear regression in scikit-learn

Question

I started learning machine learning in Python using pandas and scikit-learn. I tried to use the LinearRegression().fit method:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

house_data = pd.read_csv(r"C:\Users\yassine\Desktop\ml\OC-tp-ML\house_data.csv")
y = house_data[["price"]]
x = house_data[["surface", "arrondissement"]]
X = house_data.iloc[:, 1:3].values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=1)
model = LinearRegression()
model.fit(x_train, y_train)

When I run the code, I get this message:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Can you help me, please?

The error tells you the problem: you have NaN values, infinite values, or extremely large values that scikit-learn can't handle. Check for NaN rows in your data and try to remove them.
– G. Anderson, Commented Dec 13, 2018 at 16:11
I got this:
house_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 827 entries, 0 to 826
Data columns (total 3 columns):
price             827 non-null int64
surface           822 non-null float64
arrondissement    822 non-null float64
dtypes: float64(2), int64(1)
memory usage: 19.5 KB
– Yass Abbah, Commented Dec 13, 2018 at 16:14
Please do not use the comments space for posting code & results; edit & update your post instead.
– desertnaut, Commented Dec 13, 2018 at 16:24

Charles Landau · Accepted Answer · 2018-12-13 16:20:43Z

4

Machine learning models may require you to impute missing data as part of your data cleaning process. Linear regression cares a lot about the predicted values (yhat), so I usually start by imputing the mean. If you aren't comfortable imputing the missing data, you can drop the observations that contain NaN, provided they make up only a small proportion of your observations.
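
To judge whether the proportion of NaN observations is small, it helps to count them first. A minimal sketch, assuming a made-up DataFrame that reuses the column names from the question (the values are invented for illustration and are not the asker's data):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for house_data; the values are illustrative only.
df = pd.DataFrame({
    "price": [1200, 1500, 900, 2000, 1100],
    "surface": [30.0, np.nan, 22.0, 55.0, 28.0],
    "arrondissement": [1.0, 2.0, np.nan, 4.0, 3.0],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Fraction of rows that contain at least one NaN.
fraction_rows_with_nan = df.isna().any(axis=1).mean()

print(missing_per_column)
print(fraction_rows_with_nan)
```

If the fraction is small, dropping those rows loses little; if it is large, imputing is probably the safer choice.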

Imputing the mean can look like this:

df = df.fillna(df.mean())

Imputing to zero can look like this:

df = df.fillna(0)

Imputing to a custom result can look like:

df = df.fillna(my_func(args))
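
Here my_func(args) is a placeholder for whatever produces your fill value. As one possible illustration, fillna also accepts a dict of per-column values. The sketch below assumes made-up data with the question's column names, and the choice of median for surface and most frequent value for arrondissement is my assumption, not a recommendation from the data:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for house_data; the values are illustrative only.
df = pd.DataFrame({
    "price": [1200, 1500, 900, 2000, 1100],
    "surface": [30.0, np.nan, 22.0, 55.0, 28.0],
    "arrondissement": [1.0, 2.0, np.nan, 2.0, 3.0],
})

# fillna accepts a dict mapping column name -> fill value.
fill_values = {
    # Median of the observed surfaces (less sensitive to outliers than the mean).
    "surface": df["surface"].median(),
    # Most frequent district code among the observed rows.
    "arrondissement": df["arrondissement"].mode().iloc[0],
}
df_filled = df.fillna(fill_values)

print(df_filled)
```

A per-column fill can make sense when a column is really a category code, such as a district number, where an average is not a meaningful value.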

Dropping altogether can look like:

df = df.dropna()

To make sure inf values are caught by these methods, convert them to NaN first. Note that replace returns a new DataFrame, so assign the result:

df = df.replace([np.inf, -np.inf], np.nan)
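
Putting these steps together for the setup in the question might look like the sketch below. It assumes synthetic data in place of house_data.csv (the values, the injected NaN and inf entries, and the linear price formula are all invented for illustration) and uses the drop-rows option:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for house_data.csv; values are illustrative only.
rng = np.random.default_rng(0)
n = 40
house_data = pd.DataFrame({
    "surface": rng.uniform(20, 100, size=n),
    "arrondissement": rng.integers(1, 5, size=n).astype(float),
})
house_data["price"] = 30 * house_data["surface"] + 100

# Inject a few problem values like those that trigger the ValueError.
house_data.loc[3, "surface"] = np.nan
house_data.loc[7, "arrondissement"] = np.nan
house_data.loc[11, "surface"] = np.inf

# Turn inf into NaN, then drop every row that contains a NaN.
clean = house_data.replace([np.inf, -np.inf], np.nan).dropna()

X = clean[["surface", "arrondissement"]]
y = clean["price"]
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

model = LinearRegression()
model.fit(x_train, y_train)  # fits without the ValueError on the cleaned rows
print(model.score(x_test, y_test))
```

Cleaning the whole DataFrame before selecting x and y keeps the features and the target aligned, since a dropped row disappears from both.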

