My code is given below:

import numpy as np
from sklearn import tree
from sklearn.model_selection import train_test_split

# The last column is the target; all other columns are features
dataset = np.genfromtxt('train_py.csv', dtype=float, delimiter=",")
X_train, X_test, y_train, y_test = train_test_split(dataset[:, :-1], dataset[:, -1], test_size=0.2, random_state=0)
model = tree.DecisionTreeClassifier(criterion='gini')
#y_train = y_train.tolist()
#X_train = X_train.tolist()
model.fit(X_train, y_train)
model.score(X_train, y_train)
predicted = model.predict(X_test)

I am trying to use the decision tree classifier on a custom dataset imported with NumPy, but I get the ValueError shown below when I try to fit the model. I tried both NumPy arrays and plain Python lists, but I still can't figure out what is causing the error. Any help is appreciated.

Traceback (most recent call last):
  File "tree.py", line 19, in <module>
    model.fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 177, in fit
    check_classification_targets(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 173, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y)

ValueError: Unknown label type: array([[ 252.3352],....<until end of array>

1 Answer

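scikit-learn expects the targets passed to a classifier to be label-like: integers, strings, and so on. Floats are not a typical encoding for a finite set of classes; continuous float targets are used for regression. Your y_train holds float values such as 252.3352, so the classifier rejects it with "Unknown label type".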

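For comparison, the documentation of fit describes the feature matrix X (here X_train) as: "The training input samples. Internally, it will be converted to dtype=np.float32 and if a sparse matrix is provided to a sparse csc_matrix." So float features are fine; only the targets need to be discrete labels.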
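To make the distinction concrete, here is a minimal sketch. The toy arrays are made up and merely stand in for train_py.csv; it assumes a reasonably recent scikit-learn where type_of_target is importable from sklearn.utils.multiclass. It shows that float targets are detected as continuous, that the classifier rejects them, and two possible ways out: cast to integer labels if the values really are class codes, or switch to a regressor if they are genuinely continuous.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.utils.multiclass import type_of_target

# Hypothetical toy data, not the asker's real dataset
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
y_float = np.array([252.3352, 101.5, 252.3352, 101.5])

# Non-integer floats are classified as a 'continuous' target type
target_kind = type_of_target(y_float)

# A classifier refuses continuous targets with a ValueError
try:
    DecisionTreeClassifier(criterion='gini').fit(X, y_float)
    classifier_rejected = False
except ValueError:
    classifier_rejected = True

# Option 1: the values are really class codes, so cast them to integers.
# Note that astype(int) truncates, so distinct floats may collapse
# into the same label.
y_int = y_float.astype(int)
clf = DecisionTreeClassifier(criterion='gini', random_state=0).fit(X, y_int)

# Option 2: the values are genuinely continuous, so use a regressor.
reg = DecisionTreeRegressor(random_state=0).fit(X, y_float)
predictions = reg.predict(X)
```

Which option applies depends on what the last column of the CSV means. If it is a measured quantity rather than a category, casting to int only silences the error and discards information.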

1 Comment

Thanks. Passing y_train.astype(int) and X_train.astype(int) did the trick.
