Expected 2D array, got 1D array instead, Reshape Data

Question

I'm really stuck on this problem. I'm trying to use OneHotEncoder to encode my data into a matrix after using LabelEncoder but getting this error: Expected 2D array, got 1D array instead.

At the end of the error message(included below) it said to "Reshape my data" which I thought I did but it's still not working. If I understand Reshaping, is that just when you want to literally reshape some data into a different matrix size? For example, if I want to change a 3 x 2 matrix into a 4 x 6?

My code is failing on these 2 lines:

X = X.reshape(-1, 1) # I added this after I saw the error
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()

Here is the code I have so far:

# Data Preprocessing

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)

# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])


# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])

# Transform into a Matrix

onehotencoder1 = OneHotEncoder(categorical_features = [0])
X = X.reshape(-1, 1)
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()


# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])

Here is the full error message:

 File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 1809, in _transform_selected
    X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))

ValueError: Expected 2D array, got 1D array instead:
array=[  2.00000000e+00   7.00000000e+00   3.20000000e+00   2.70000000e+01
   2.30000000e+03   1.00000000e+00   6.00000000e+00   3.90000000e+00
   2.80000000e+01   2.90000000e+03   3.00000000e+00   4.00000000e+00
   4.00000000e+00   3.00000000e+01   2.76700000e+03   2.00000000e+00
   8.00000000e+00   3.20000000e+00   2.70000000e+01   2.30000000e+03
   3.00000000e+00   0.00000000e+00   4.00000000e+00   3.00000000e+01
   2.48522222e+03   5.00000000e+00   9.00000000e+00   3.50000000e+00
   2.50000000e+01   2.50000000e+03   5.00000000e+00   1.00000000e+00
   3.50000000e+00   2.50000000e+01   2.50000000e+03   0.00000000e+00
   2.00000000e+00   3.00000000e+00   2.90000000e+01   2.40000000e+03
   4.00000000e+00   3.00000000e+00   3.70000000e+00   2.77777778e+01
   2.30000000e+03   0.00000000e+00   5.00000000e+00   3.00000000e+00
   2.90000000e+01   2.40000000e+03].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Any help would be great.

Your array is 1D it has to be 2D .... where ever you are getting the error just add numpy.asmatrix(data) where data is the data that you are passing... or you can reshape ... Passing 1D array has been deprecated in recent versions of sklearn — Jai
– Jai, Commented Dec 25, 2017 at 2:12
Hi @JayShah in my code I added: X = X.reshape(-1, 1). Is this the correct way to reshape data? — wolfbagel
– wolfbagel, Commented Dec 25, 2017 at 2:15
yes X = X.reshape(-1, 1) is the right way is to reshape data but in the error but this will only work if your X is a numpy array and not list... If it is a list than make your array list of list ... from the error message I can clearly see array = [ ] is 1D because it has one opening and clasing brackets and after reshaping please remove X[:, 1] in transform and just put X — Jai
– Jai, Commented Dec 25, 2017 at 2:25

Jai · Accepted Answer · 2017-12-25 02:37:08Z

2

try changing you code to this

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)

# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])


# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])

# Transform into a Matrix

onehotencoder1 = OneHotEncoder(categorical_features = [0])
res_0 = onehotencoder1.fit_transform(X[:, 0].reshape(-1, 1))  # <=== Change
X[:, 0] = res_0.ravel()

# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])

If you are getting error at labelencoder_x1.fit_transform(X[:, 1]) then make it labelencoder_x1.fit_transform(X[:, 1].reshape(-1, 1))

edited Dec 25, 2017 at 2:37

answered Dec 25, 2017 at 2:30

Jai

3,3203 gold badges20 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

wolfbagel Over a year ago

Thanks for the solution! I'm running it line by line but it's failing on this line: "X[:, 0] = res_0.ravel()", saying "ravel not found".

Jai Over a year ago

try np.ravel(res_0)

wolfbagel · Accepted Answer · 2017-12-25 03:18:16Z

Ok I finally got the code to work. Please see the solution below:

# Data Preprocessing

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)

# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])


# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])


# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])


# Transform Name into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X = onehotencoder1.fit_transform(X).toarray()

# Transform University into a Matrix
onehotencoder2 = OneHotEncoder(categorical_features = [6])
X = onehotencoder2.fit_transform(X).toarray()

Syed Muhammad Asad · Accepted Answer · 2019-11-19 16:05:50Z

0

I got same error when I was trying to do same. I was transforming single column of data. Here, how I override this problem

encoding_X = OneHotEncoder(categories = [np.unique(X[:,0]).tolist()])
encoding_X.fit(np.unique(X[:,0]).reshape(-1,1).tolist())
encoding_X.transform(X[:,0].reshape(-1,1).tolist()).toarray()

answered Nov 19, 2019 at 16:05

Syed Muhammad Asad

2281 gold badge2 silver badges13 bronze badges

Collectives™ on Stack Overflow

Expected 2D array, got 1D array instead, Reshape Data

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related