I'm using Naive Bayes to classify a set of texts into multiple categories. Right now I output only the top category and its posterior probability, but I would like to rank all the categories by posterior probability and use the 2nd- and 3rd-place categories as "backup" categories.
Here's an example:
import pandas

df = pandas.DataFrame({
    'text': pandas.Categorical(["I have wings", "Metal wings", "Feathers", "Airport"]),
    'true_cat': pandas.Categorical(["bird", "plane", "bird", "plane"]),
})
text            true_cat
------------------------
I have wings    bird
Metal wings     plane
Feathers        bird
Airport         plane
What I'm doing:
new_cat = classifier.classify(features(text))
prob_cat = classifier.prob_classify(features(text))
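For context, and assuming `classifier` is an NLTK classifier such as `nltk.NaiveBayesClassifier` (the question does not say which library it is), `prob_classify` returns a probability distribution object rather than a single number. That object exposes `samples()` and `prob(label)`, which is enough to rank the labels. A minimal sketch of the ranking step, using a hypothetical stand-in class `FakeDist` so it runs without a trained model:

```python
class FakeDist:
    """Hypothetical stand-in that mimics the samples()/prob() interface
    of the distribution NLTK's prob_classify is assumed to return."""

    def __init__(self, probs):
        self._probs = dict(probs)

    def samples(self):
        return self._probs.keys()

    def prob(self, sample):
        return self._probs.get(sample, 0.0)


def rank_labels(dist):
    """Return (label, probability) pairs, most probable first."""
    pairs = ((label, dist.prob(label)) for label in dist.samples())
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Illustrative probabilities taken from the "I have wings" row of the example.
dist = FakeDist({"plane": 0.33, "bird": 0.67})
ranked = rank_labels(dist)
print(ranked)  # [('bird', 0.67), ('plane', 0.33)]
```

With a real classifier, `dist` would instead come from `classifier.prob_classify(features(text))`, and `ranked[0]` should agree with what `classifier.classify` returns.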
Eventual Output:
new_cat    prob_cat    text            true_cat
bird       0.67        I have wings    bird
bird       0.6         Feathers        bird
bird       0.51        Metal wings     plane
plane      0.8         Airport         plane
I have found a couple of examples using classify_many and prob_classify_many, but since I'm new to Python I'm having trouble translating them to my problem. I haven't seen them used with pandas anywhere.
I want it to look like this:
df_new = pandas.DataFrame({
    'text': pandas.Categorical(["I have wings", "Metal wings", "Feathers", "Airport"]),
    'true_cat': pandas.Categorical(["bird", "plane", "bird", "plane"]),
    'new_cat1': pandas.Categorical(["bird", "bird", "bird", "plane"]),
    'new_cat2': pandas.Categorical(["plane", "plane", "plane", "bird"]),
    'prob_cat1': pandas.Categorical(["0.67", "0.51", "0.6", "0.8"]),
    'prob_cat2': pandas.Categorical(["0.33", "0.49", "0.4", "0.2"]),
})
new_cat1    new_cat2    prob_cat1    prob_cat2    text            true_cat
--------------------------------------------------------------------------
bird        plane       0.67         0.33         I have wings    bird
bird        plane       0.51         0.49         Metal wings     plane
bird        plane       0.6          0.4          Feathers        bird
plane       bird        0.8          0.2          Airport         plane
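One way the table above might be assembled is sketched here. It assumes the classifier follows NLTK's interface, where `prob_classify` returns an object with `samples()` and `prob()`. `StubDist`, `StubClassifier`, the `features` function, and the helper `top_n` are all hypothetical stand-ins so the example runs on its own; the probabilities are simply the ones from the table, not real model output.

```python
import pandas


class StubDist:
    """Hypothetical stand-in for the distribution returned by prob_classify."""

    def __init__(self, probs):
        self._probs = dict(probs)

    def samples(self):
        return self._probs.keys()

    def prob(self, sample):
        return self._probs.get(sample, 0.0)


class StubClassifier:
    """Hypothetical classifier returning the fixed probabilities from the example."""

    _table = {
        "I have wings": {"bird": 0.67, "plane": 0.33},
        "Metal wings": {"bird": 0.51, "plane": 0.49},
        "Feathers": {"bird": 0.6, "plane": 0.4},
        "Airport": {"plane": 0.8, "bird": 0.2},
    }

    def prob_classify(self, featureset):
        return StubDist(self._table[featureset["text"]])


def features(text):
    # Hypothetical feature extractor; a real one would return word features.
    return {"text": text}


def top_n(classifier, text, n=2):
    """Return a dict with the n most probable labels and their probabilities."""
    dist = classifier.prob_classify(features(text))
    ranked = sorted(dist.samples(), key=dist.prob, reverse=True)[:n]
    row = {}
    for i, label in enumerate(ranked, start=1):
        row["new_cat%d" % i] = label
        row["prob_cat%d" % i] = dist.prob(label)
    return row


classifier = StubClassifier()
df = pandas.DataFrame({
    "text": ["I have wings", "Metal wings", "Feathers", "Airport"],
    "true_cat": ["bird", "plane", "bird", "plane"],
})

# One dict per row, turned into columns aligned with the original index.
ranked_cols = pandas.DataFrame(
    [top_n(classifier, text) for text in df["text"]], index=df.index
)
df_new = pandas.concat([ranked_cols, df], axis=1)
print(df_new)
```

Raising `n` to 3 would add `new_cat3` and `prob_cat3` columns whenever the classifier knows at least three labels. Note that this keeps the probabilities as floats rather than the strings used in the hand-built `df_new` above.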
Any help would be appreciated.