
My code is:

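```python
import pandas as pd
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.naive_bayes import MultinomialNB

data = pd.read_table('train.tsv')

X = data.Phrase      # raw text phrases (strings)
Y = data.Sentiment   # numeric sentiment labels

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

clf = MultinomialNB()
clf.fit(X_train, Y_train)  # raises ValueError: X_train still contains raw strings
```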

I get the error `ValueError: could not convert string to float:`

What changes should I make so that my code works?

  • What does data.info() show? Is all of the data numeric? Commented May 25, 2017 at 9:29
  • No, it contains strings too: X (data.Phrase) holds string data, and Y (data.Sentiment) holds numeric data. Commented May 25, 2017 at 9:33
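A quick way to answer the first comment's question is to inspect the column dtypes. The sketch below uses a tiny hypothetical DataFrame in place of `train.tsv`; only the `Phrase` and `Sentiment` column names come from the question.

```python
import pandas as pd

# Hypothetical rows standing in for pd.read_table('train.tsv').
data = pd.DataFrame({
    "Phrase": ["a great film", "an awful film"],
    "Sentiment": [4, 0],
})

# Prints one line per column with its dtype; Phrase is not a numeric dtype.
data.info()

# List the columns that scikit-learn could not use as numeric features as-is.
non_numeric = data.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```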

2 Answers


You can't pass raw text data into scikit-learn's MultinomialNB. As its documentation states, it is designed for discrete numeric features such as word counts.
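A minimal sketch of the difference, using made-up toy phrases and labels (not the question's data): fitting on raw strings fails, while fitting on a hand-built word-count matrix works. The exact error message depends on the scikit-learn version and the input type, so the code only relies on a `ValueError` being raised.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

phrases = ["good movie", "bad movie", "great film", "awful film"]
labels = [1, 0, 1, 0]  # hypothetical labels: 1 = positive, 0 = negative

clf = MultinomialNB()

# Raw strings: scikit-learn cannot build a numeric feature matrix from them,
# so fit() raises ValueError.
raw_text_failed = False
try:
    clf.fit(phrases, labels)
except ValueError as exc:
    raw_text_failed = True
    print("ValueError:", exc)

# Hand-built word-count matrix, one row per phrase; the columns count
# "good", "bad", "great", "awful", "movie", "film" in that order.
X_counts = np.array([
    [1, 0, 0, 0, 1, 0],  # good movie
    [0, 1, 0, 0, 1, 0],  # bad movie
    [0, 0, 1, 0, 0, 1],  # great film
    [0, 0, 0, 1, 0, 1],  # awful film
])
clf.fit(X_counts, labels)

# Word counts for the unseen phrase "good film".
print(clf.predict(np.array([[1, 0, 0, 0, 0, 1]])))  # expected: [1]
```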

scikit-learn estimators don't work directly on raw text; they expect numeric feature arrays, so you need to preprocess the text first. Extract features from the text with a bag-of-words approach: tokenize each phrase and count its words, for example with CountVectorizer (or TfidfVectorizer) from sklearn.feature_extraction.text. Have a look at this link for a better understanding.
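Here is a sketch of the question's code with a bag-of-words step added. It assumes a small hypothetical DataFrame instead of `train.tsv` (the rows are invented; only the `Phrase`/`Sentiment` column names come from the question) and swaps the removed `sklearn.cross_validation` for `sklearn.model_selection`.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for pd.read_table('train.tsv').
data = pd.DataFrame({
    "Phrase": [
        "a great and moving film", "an awful, boring mess", "great acting",
        "boring plot", "moving and great", "awful dialogue",
        "a great story", "a boring and awful film", "great fun", "awful pacing",
    ],
    "Sentiment": [4, 0, 4, 0, 4, 0, 4, 0, 4, 0],
})

X = data.Phrase
Y = data.Sentiment
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# CountVectorizer tokenizes each phrase into a sparse vector of word counts
# (bag of words); MultinomialNB is then fit on those counts.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(X_train, Y_train)

print("test accuracy:", clf.score(X_test, Y_test))
print(clf.predict(["great film", "awful film"]))
```

Wrapping both steps in a pipeline means the vocabulary is learned only from the training phrases, and the same vectorizer is applied automatically to new text at prediction time.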

You might also want to look at NLTK for use cases like yours.


ValueError when using Multinomial Naive Bayes classifier

You should probably preprocess your data as shown in the answer above.
