5

I have a list of books, each of which belongs to a category.

  • Flying a Plane - Aviation
  • Painting a picture - Art
  • 1001 Recipes - Cooking

I have a large enough sample set of data. I need to categorize my newer books using some algorithm. I know it'll never be 100% accurate, but a good guess is good enough for me.

What should I use to implement something like this? Should I go with Classifier4J and its Vector Classifier?

Are there other tools I should look at, such as Weka? It would be great if someone could point me to some articles or examples to get me started.

Thanks

  • You can take a look at RapidMiner. Commented Jun 7, 2012 at 10:57
  • Have a look at this: java-text-classification-problem. You are doing almost exactly the same thing. Commented Jun 7, 2012 at 16:36

2 Answers

1

Coursera offers a Machine Learning course at https://www.coursera.org/course/ml. If you treat your problem as classification, you can train N one-vs-all classifiers, where N is the number of your classes (= categories). To train a classifier, use one of the algorithms described in the Natural Language Processing class, https://www.coursera.org/course/nlp. Normally the classifier scores how similar a new item is to each existing class, as described in http://nlp.stanford.edu/IR-book/html/htmledition/text-classification-and-naive-bayes-1.html. All of this can be done in Apache Mahout with https://cwiki.apache.org/confluence/display/MAHOUT/Bayesian.
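
To make the idea concrete, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing over book titles, in plain Java. Note that it swaps in a single multi-class Naive Bayes model rather than N separate one-vs-all classifiers, since Naive Bayes handles several categories directly. The class name `TitleClassifier`, its methods, and the tokenization rule are assumptions made for illustration; this is not the Mahout, Weka, or Classifier4J API, and a real library will handle larger data sets far better.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical, illustrative classifier: multinomial Naive Bayes over title words.
class TitleClassifier {
    private final Map<String, Integer> docsPerCategory = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerCategory = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase the text and split on anything that is not a letter or digit.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    // Record one labeled example (title -> category).
    public void train(String title, String category) {
        totalDocs++;
        docsPerCategory.merge(category, 1, Integer::sum);
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String word : tokenize(title)) {
            if (word.isEmpty()) continue; // split() can yield a leading empty token
            counts.merge(word, 1, Integer::sum);
            wordsPerCategory.merge(category, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    // Return the category with the highest log-probability, or null if untrained.
    public String classify(String title) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docsPerCategory.keySet()) {
            // Log prior: share of training titles that belong to this category.
            double score = Math.log((double) docsPerCategory.get(category) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = wordsPerCategory.getOrDefault(category, 0);
            for (String word : tokenize(title)) {
                if (word.isEmpty()) continue;
                int count = counts.getOrDefault(word, 0);
                // Add-one (Laplace) smoothing so unseen words do not zero out the score.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TitleClassifier classifier = new TitleClassifier();
        classifier.train("Flying a Plane", "Aviation");
        classifier.train("Painting a picture", "Art");
        classifier.train("1001 Recipes", "Cooking");
        System.out.println(classifier.classify("Flying a Kite")); // prints Aviation
    }
}
```

With only three training titles the result depends entirely on word overlap, so in practice you would train on your full sample set and probably on more text than the title alone (for example a description or blurb).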

1

LingPipe seems to be a good solution and works well. The demo included with LingPipe is a good place to begin:

http://alias-i.com/lingpipe/demos/tutorial/classify/read-me.html
