Sklearn train_test_split() error: Found input variables with inconsistent numbers of samples

Question

I am fitting a regression model on two randomly generated features, x1 and x2, with Y being the sum of x1 and x2, but I am getting this error:

ValueError: Found input variables with inconsistent numbers of samples: [2, 10000000]

Note: I am doing this only for learning purposes.

My code:

import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

X = np.random.random_integers(100000000, size=(2, 10000000))
X = (X - (100000000 / 2)) / (100000000 / 2)  # Scaling to [-1, 1]
Y = X[0] + X[1]

regr = linear_model.LinearRegression()
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
regr.fit(X_train, Y_train)

Djib2011 · Accepted Answer · 2019-07-16 18:50:28Z


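scikit-learn expects the training data to have shape (num_samples, num_features), which in your case is (10000000, 2). Your X has shape (2, 10000000), so it is read as 2 samples with 10000000 features each, while Y has 10000000 entries. That mismatch is the [2, 10000000] in the error message. An easy way to fix this is to transpose X before passing it to train_test_split: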

X_train, X_test, y_train, y_test = train_test_split(X.T, Y, test_size=0.2, random_state=0)
regr.fit(X_train, y_train)  # <-- also 'y_train' not 'Y_train' here
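
For reference, here is a minimal self-contained sketch of the same fix. It is not the asker's exact script: the sample count is reduced to 1000 so it runs quickly, and the data is drawn with `np.random.default_rng(...).uniform` instead of `random_integers`; both are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumption: far fewer samples than the question's 10,000,000, to keep it fast.
n_samples = 1000
rng = np.random.default_rng(0)

# Same layout as in the question: one row per feature, shape (2, n_samples).
X = rng.uniform(-1.0, 1.0, size=(2, n_samples))
Y = X[0] + X[1]

# X.T has shape (n_samples, 2), so its first dimension now matches len(Y).
X_train, X_test, y_train, y_test = train_test_split(
    X.T, Y, test_size=0.2, random_state=0
)

regr = LinearRegression()
regr.fit(X_train, y_train)

print(X_train.shape, X_test.shape)   # shapes of the train and test feature arrays
print(regr.coef_, regr.intercept_)   # fitted weights for x1, x2 and the bias
print(regr.score(X_test, y_test))    # R^2 on the held-out split
```

Because Y is exactly x1 + x2, the fitted coefficients should come out very close to 1 for both features, with an intercept near 0. Generating X directly with `size=(n_samples, 2)` and computing `Y = X[:, 0] + X[:, 1]` would avoid the transpose altogether.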
