
I need to scrape some webpages and extract content from them. I'm planning to select some specific keywords and map the data that has some relationship between them, but I have no idea how to do that. Could anyone suggest some algorithms for it?

For example, I need to download some webpages about apples, map the relevant data about apples to them, and store it in a database, so that if someone needs specific information about apples, I can provide it quickly and accurately.

It would also be helpful to point out useful libraries. I'm planning to do this in Python.
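To make the "extract content from webpages" part concrete, here is a minimal standard-library sketch of pulling the visible text out of an HTML page (the class and function names are illustrative, not from any particular library):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text of a page, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self._skip = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        # Keep only non-empty text outside script/style tags
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def page_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

html = "<html><body><p>Apples are rich in fiber.</p></body></html>"
print(page_text(html))  # Apples are rich in fiber.
```

In practice a dedicated parser such as BeautifulSoup or lxml is more robust against malformed markup, but the idea is the same: strip the markup first, then run keyword extraction on the plain text.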

  • There is one well-known algorithm for this; I suggest searching for it on Google. – Commented May 14, 2011 at 12:33

2 Answers


Have a look at the NLTK, Pattern, or Orange modules.

As a start, "Programming Collective Intelligence: Building Smart Web 2.0 Applications" by Toby Segaran is a good book to read.
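As a rough illustration of the kind of keyword extraction these libraries support, here is a standard-library sketch (NLTK would replace the naive regex tokenizer and the hand-written stopword set with proper tokenizers and stopword corpora; the function name and stopword list here are made up for the example):

```python
import re
from collections import Counter

# A tiny hand-written stopword list; NLTK ships a much larger one.
STOPWORDS = {"the", "a", "an", "in", "of", "and", "is", "are", "to"}

def keywords(text, n=5):
    """Return the n most frequent non-stopword tokens in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

doc = ("Apples are a popular fruit. Apple trees are cultivated "
       "worldwide, and apples are eaten raw.")
print(keywords(doc, 3))  # 'apples' will rank first with count 2
```

Raw frequency is only a starting point; stemming (so "apple" and "apples" count together) and weighting schemes like TF-IDF usually give better keywords.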




You could try algorithms based on term frequency–inverse document frequency (TF-IDF). In Java I would recommend Solr; in fact, you could run Solr and access it from Python via one of its Python client libraries.

