Classify text using NaiveBayesClassifier

Question

I have a text file with one sentence on each line, e.g. "Have you registered your email ID with your Bank Account?"

I want to classify each sentence as interrogative or not. FYI, these are sentences from bank websites. I've seen an answer with this NLTK code block:

import nltk
nltk.download('nps_chat')
posts = nltk.corpus.nps_chat.xml_posts()[:10000]


def dialogue_act_features(post):
    features = {}
    for word in nltk.word_tokenize(post):
        features['contains({})'.format(word.lower())] = True
    return features

featuresets = [(dialogue_act_features(post.text), post.get('class')) for post in posts]
size = int(len(featuresets) * 0.1)
train_set, test_set = featuresets[size:], featuresets[:size]
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))

So I did some preprocessing on my text file (stemming words, removing stop words, etc.) to turn each sentence into a bag of words. The code above gives me a trained classifier. How do I apply it to my text file of sentences (either raw or preprocessed)?

Update: here is an example of my text file.

You need to convert the documents using (scikit-learn.org/stable/modules/generated/…) and then use the classifier. Can you upload your data?
– seralouk, Commented May 29, 2018 at 8:37
@seralouk Thank you for your response, I will look at the link now! I have updated the question with an example of my data.
– PolkaDot, Commented May 29, 2018 at 9:09
Not sure why I'm being downvoted. Is there any more information I should be providing?
– PolkaDot, Commented May 29, 2018 at 9:10
@seralouk No, they are all strings of sentences. I have given the raw version. If you want, I can attach the processed version where numbers are taken out, words are stemmed, and stop words are removed.
– PolkaDot, Commented May 29, 2018 at 9:12
@seralouk Can't I train the classifier using nps_chat and get the sample data from that?
– PolkaDot, Commented May 29, 2018 at 9:13

seralouk · Accepted Answer · 2018-05-29 11:54:12Z

1

Assuming that you have preprocessed the document data as we discussed, you can do the following:

import nltk
nltk.download('nps_chat')
posts = nltk.corpus.nps_chat.xml_posts()[:10000]


def dialogue_act_features(post):
    features = {}
    for word in nltk.word_tokenize(post):
        features['contains({})'.format(word.lower())] = True
    return features

featuresets = [(dialogue_act_features(post.text), post.get('class')) for post in posts]
size = int(len(featuresets) * 0.1)
train_set, test_set = featuresets[size:], featuresets[:size]

# Train on the training split only, so the accuracy below is measured on held-out posts
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))

0.668

For your data, train the classifier once on all the labelled posts, then iterate over your lines and classify each one:

classifier = nltk.NaiveBayesClassifier.train(featuresets)
print(classifier.classify(dialogue_act_features(line)))
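
A minimal sketch of that loop, assuming `classifier` and `dialogue_act_features` are the objects defined above. The helper name `classify_lines`, the `QUESTION_LABELS` set, and the file name `sentences.txt` are illustrative and not part of NLTK. `whQuestion` and `ynQuestion` are the two question classes in the nps_chat tag set; treating only those two as "interrogative" is an assumption about what you want.

```python
# nps_chat labels that mark a question (assumed definition of "interrogative").
QUESTION_LABELS = {"whQuestion", "ynQuestion"}


def classify_lines(lines, classifier, feature_fn):
    """Return (sentence, label, is_question) for each non-empty line.

    `classifier` is any object with a classify(features) method, such as
    the trained nltk.NaiveBayesClassifier; `feature_fn` is the feature
    extractor used at training time (here, dialogue_act_features).
    """
    results = []
    for line in lines:
        sentence = line.strip()
        if not sentence:
            continue  # skip blank lines
        label = classifier.classify(feature_fn(sentence))
        results.append((sentence, label, label in QUESTION_LABELS))
    return results


# Hypothetical usage; "sentences.txt" is a placeholder file name:
# with open("sentences.txt", encoding="utf-8") as f:
#     rows = classify_lines(f, classifier, dialogue_act_features)
# for sentence, label, is_question in rows:
#     print(is_question, label, sentence)
```

Since the features are built from raw lowercased tokens, the raw sentences will probably match the training features better than stemmed or stop-word-stripped text. Words such as "do" or "what" often appear on stop-word lists but are useful cues for questions.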

edited May 29, 2018 at 11:54

answered May 29, 2018 at 9:18

seralouk


PolkaDot · Accepted Answer · 2018-05-29 10:24:22Z

0

Doing this for all lines in the text file works:

classifier = nltk.NaiveBayesClassifier.train(featuresets)
print(classifier.classify(dialogue_act_features(line)))

answered May 29, 2018 at 10:24

PolkaDot
