I'm really stuck on this problem. I'm trying to use OneHotEncoder to encode my data into a matrix after using LabelEncoder, but I'm getting this error: Expected 2D array, got 1D array instead.
At the end of the error message (included below) it says to "Reshape your data", which I thought I did, but it's still not working. If I understand reshaping correctly, is that just when you want to literally reshape some data into a different matrix size? For example, if I want to change a 3 x 2 matrix into a 4 x 6?
My code is failing on these 2 lines:
X = X.reshape(-1, 1) # I added this after I saw the error
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
Here is the code I have so far:
# Data Preprocessing
# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)
# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])
# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])
# Transform into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X = X.reshape(-1, 1)
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])
Here is the full error message:
File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 1809, in _transform_selected
X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 2.00000000e+00 7.00000000e+00 3.20000000e+00 2.70000000e+01
2.30000000e+03 1.00000000e+00 6.00000000e+00 3.90000000e+00
2.80000000e+01 2.90000000e+03 3.00000000e+00 4.00000000e+00
4.00000000e+00 3.00000000e+01 2.76700000e+03 2.00000000e+00
8.00000000e+00 3.20000000e+00 2.70000000e+01 2.30000000e+03
3.00000000e+00 0.00000000e+00 4.00000000e+00 3.00000000e+01
2.48522222e+03 5.00000000e+00 9.00000000e+00 3.50000000e+00
2.50000000e+01 2.50000000e+03 5.00000000e+00 1.00000000e+00
3.50000000e+00 2.50000000e+01 2.50000000e+03 0.00000000e+00
2.00000000e+00 3.00000000e+00 2.90000000e+01 2.40000000e+03
4.00000000e+00 3.00000000e+00 3.70000000e+00 2.77777778e+01
2.30000000e+03 0.00000000e+00 5.00000000e+00 3.00000000e+00
2.90000000e+01 2.40000000e+03].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Any help would be great.
You can use numpy.asmatrix(data), where data is the data that you are passing, or you can reshape it. Passing a 1D array has been deprecated in recent versions of sklearn. X = X.reshape(-1, 1) is the right way to reshape the data, but it will only work if your X is a NumPy array and not a list. If it is a list, then make it a list of lists instead. From the error message I can clearly see that array=[ ... ] is 1D, because it has only one pair of opening and closing brackets. After reshaping, please remove the column indexing (X[:, 0]) in fit_transform and just pass X.
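To illustrate the shape issue, here is a minimal sketch. The small array is a made-up stand-in for the label-encoded data (it is not the real Data2.csv), and the variable names are only for illustration. It shows that reshape keeps the number of elements fixed (so a 3 x 2 array can become 6 x 1 or 2 x 3, but not 4 x 6), that indexing with X[:, 0] gives a 1D array while X[:, [0]] or X[:, 0].reshape(-1, 1) gives a 2D column, and one possible way to one-hot encode just that column and put it back next to the other features. Note that calling reshape(-1, 1) on the whole X stacks every feature into one long column, which seems to be why the array printed in the error mixes values from all columns.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# reshape only rearranges existing elements: 3 x 2 has 6 values,
# so it can become 6 x 1 (or 2 x 3), but never 4 x 6.
a = np.arange(6).reshape(3, 2)
a_col = a.reshape(-1, 1)          # shape (6, 1)

# Hypothetical data: column 0 is a label-encoded category,
# column 1 is some numeric feature.
X = np.array([[2, 7.0],
              [1, 6.0],
              [3, 4.0],
              [2, 8.0]])

col_1d = X[:, 0]                  # integer index drops the axis: shape (4,)
col_2d = X[:, [0]]                # list index keeps it: shape (4, 1)
same_2d = X[:, 0].reshape(-1, 1)  # equivalent 2D column

# OneHotEncoder expects a 2D array of shape (n_samples, n_features).
encoder = OneHotEncoder()
onehot = encoder.fit_transform(col_2d).toarray()

# The encoded block has one column per category, so it cannot be assigned
# back into X[:, 0]; stack it next to the remaining features instead.
X_encoded = np.hstack([onehot, X[:, 1:]])

print(col_1d.shape, col_2d.shape, onehot.shape, X_encoded.shape)
```

The key point of this sketch is that only the single column is reshaped, not the whole X, and the result is combined with np.hstack instead of being written back into one column.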