0

I am trying to get the HTML code of the following website: http://fortune.com/fortune500/list/

The problem is that when you visit the site in a browser, it shows only the first 20 companies, and it loads the next 50 only when you scroll to the bottom of the page.

How do I get the first 700 companies from this website as HTML? I tried the code from https://www.mkyong.com/java/how-to-get-url-content-in-java/ to fetch the HTML content, but as expected it returns only the top 20 companies.

Any help is much appreciated. Thanks.

2
  • You won't be able to do that programmatically this way, because the page loads the rest of the list through Ajax calls. The approach in that link gets the HTML as is: plain text with an HTML structure. Commented Dec 8, 2017 at 1:32
  • Thanks. I can parse the HTML structure downstream, but the problem is that I need to get more of the company list from the Fortune 500 list website, not just the first 20 companies. Commented Dec 8, 2017 at 1:41

2 Answers

1

API URL: http://fortune.com/api/v2/list/2013055/expand/item/ranking/asc/{{start_from}}/{{num_limit}}

Example: http://fortune.com/api/v2/list/2013055/expand/item/ranking/asc/1/100

fortune.com returns at most 100 elements per request.

The response is JSON.
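
Because each request is capped at 100 elements, getting 700 companies means several requests. Here is a minimal Java sketch of that idea, using the URL pattern from this answer. The class name `FortuneListPager` and its helpers are made up for illustration, and it assumes that `start_from` is a 1-based rank, so the second page would start at 101; the example URL suggests this but does not confirm it. It also assumes the endpoint is still available and needs Java 11 or later for `java.net.http`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class FortuneListPager {
    // URL prefix quoted in this answer; the list id 2013055 is taken from it.
    static final String BASE =
            "http://fortune.com/api/v2/list/2013055/expand/item/ranking/asc/";
    // The answer states that a single request returns at most 100 elements.
    static final int MAX_PER_REQUEST = 100;

    // Builds one URL per page of up to 100 items.
    // Assumption: start_from is a 1-based rank (1, 101, 201, ...).
    static List<String> pageUrls(int total) {
        List<String> urls = new ArrayList<>();
        for (int start = 1; start <= total; start += MAX_PER_REQUEST) {
            int limit = Math.min(MAX_PER_REQUEST, total - start + 1);
            urls.add(BASE + start + "/" + limit);
        }
        return urls;
    }

    // Fetches one page and returns the raw JSON text.
    static String fetch(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Follow redirects in case the site redirects http to https.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        for (String url : pageUrls(700)) {
            System.out.println(fetch(client, url));
        }
    }
}
```

Each printed chunk is one JSON document, so the company entries still have to be extracted with a JSON parser of your choice.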


2 Comments

Thanks. Calling the API returns the data, but I am trying a different approach: parsing/crawling the website and finding the data in it.
The site fortune.com doesn't load all the data up front, so you can't retrieve it that way. (Sorry for my English.)
0

You should use Selenium for this. Here is a tutorial on how to use it with StormCrawler. You could also use it directly if you wanted to.

