I'm trying to run a prediction on training data with four features; my code:

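import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()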

X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Train
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Plot the decision boundary
plt.subplot(2, 3, pairidx + 1)

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                     np.arange(y_min, y_max, plot_step))

Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)

plt.xlabel(iris.feature_names[pair[0]])
plt.ylabel(iris.feature_names[pair[1]])
plt.axis("tight")

# Plot the training points
for i, color in zip(range(n_classes), plot_colors):
    idx = np.where(y == i)
    plt.scatter(X[idx, 0], X[idx, 1], c=color, label=iris.target_names[i],
                cmap=plt.cm.Paired)

plt.axis("tight")

plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend()
plt.show()

On my predict line, Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]), I get the following error:

Number of features of the model must match the input. Model n_features is 4 and input n_features is 2

The iris data is a 150x4 data set. How do I get this to work for 4 features?

  • Why are you using ravel()? Are you following a web tutorial? If so, post the link. What are plot_step and pairidx? Commented Mar 22, 2017 at 6:54

1 Answer

  • During training, you gave the model 4 features.
  • When predicting, you pass samples that have only 2 features.
  • predict must receive the same number of features that the model was trained on.
  • If you run print(np.c_[xx.ravel(), yy.ravel()].shape), you will get (30, 2) when plot_step is 1.
  • The NumPy array you pass to predict must have shape (x, 4), where x can be any positive integer; the number of columns must be exactly 4.
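
The points above suggest two ways to make the shapes agree; here is a minimal sketch of both. It rests on assumptions: plot_step = 0.02 and pair = [0, 1] are guesses, since the question does not show their values, and make_grid is a hypothetical helper introduced here. For brevity it fits on the full data rather than on X_train. Option 1 trains on the same two columns the grid covers. Option 2 keeps the 4-feature model and pads the grid with the mean of the features that are not plotted, which is only one of several reasonable fill values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
plot_step = 0.02  # assumed grid resolution; not shown in the question


def make_grid(X, pair, step):
    """Hypothetical helper: mesh over the two feature columns in `pair`."""
    x_min, x_max = X[:, pair[0]].min() - 1, X[:, pair[0]].max() + 1
    y_min, y_max = X[:, pair[1]].min() - 1, X[:, pair[1]].max() + 1
    return np.meshgrid(np.arange(x_min, x_max, step),
                       np.arange(y_min, y_max, step))


pair = [0, 1]  # assumed: the two features being plotted
xx, yy = make_grid(X, pair, plot_step)

# Option 1: fit on the same two columns that the grid covers,
# so the model and the grid both have 2 features.
clf_pair = DecisionTreeClassifier(random_state=0).fit(X[:, pair], y)
Z_pair = clf_pair.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Option 2: keep the 4-feature model and pad the grid to 4 columns,
# holding the features that are not plotted at their mean values.
clf_full = DecisionTreeClassifier(random_state=0).fit(X, y)
grid = np.tile(X.mean(axis=0), (xx.size, 1))  # shape (n_points, 4)
grid[:, pair[0]] = xx.ravel()
grid[:, pair[1]] = yy.ravel()
Z_full = clf_full.predict(grid).reshape(xx.shape)
```

Either Z_pair or Z_full can then be passed to plt.contourf(xx, yy, Z, cmap=plt.cm.Paired) as in the question. With Option 2 the plot shows only one slice through the 4-dimensional decision surface, so it will change if the padded features are held at different values.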