Scraping dynamic content through Selenium?

Question

I'm trying to scrape dynamic content from a blog through Selenium, but it always returns unrendered JavaScript.

To test this behavior, I waited until the iframe loaded completely and printed its content, which prints fine. But when I move back to the parent frame, it again displays unrendered JavaScript.

I'm looking for a way to print the completely rendered HTML content.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions

driver = webdriver.Chrome("path to chrome driver")
driver.get('http://justgivemechocolateandnobodygetshurt.blogspot.com/')

# Wait up to 40 seconds for the iframe, then switch into it.
WebDriverWait(driver, 40).until(expected_conditions.frame_to_be_available_and_switch_to_it((By.ID, "navbar-iframe")))

# Rendered iframe HTML is printed.
content = driver.page_source
print(content)

# When I switch back to the parent frame, it again prints unrendered JavaScript.
driver.switch_to.parent_frame()
content = driver.page_source
print(content)

@UmarIqbal, have you tried selecting the element using one of the find_element methods?
– RattleyCooper, Commented Apr 21, 2016 at 20:29

alecxe · Accepted Answer · 2016-04-21 21:42:31Z

4

The problem is that .page_source works only in the current context. There is the notion of a "current top-level browsing context". This means that if you call it on the default content, you will not get the inner HTML of the child iframe elements. For that, you have to switch into the context of a frame and call .page_source there.

In other words, to get the complete HTML of the page, including the page source of the iframes, you have to switch into the iframe contexts one by one and get the sources separately.
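
As a rough illustration of the "one by one" approach, here is a minimal sketch. The helper name collect_page_sources and the dictionary keys are made up for this example, and it only visits iframes that are direct children of the top-level document (nested iframes would need a recursive variant). It assumes driver is an already-created Selenium WebDriver that has loaded the page and that the iframes have finished loading. The locator string "tag name" is the value of By.TAG_NAME.

```python
def collect_page_sources(driver):
    """Return the top-level page source plus the source of each direct child iframe.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver
    instance that has already loaded the page.
    """
    sources = {}

    # Start from the top-level document.
    driver.switch_to.default_content()
    sources["top"] = driver.page_source

    # "tag name" is the string value of selenium's By.TAG_NAME.
    frames = driver.find_elements("tag name", "iframe")

    for index, frame in enumerate(frames):
        # page_source only reflects the current context, so switch in first.
        driver.switch_to.frame(frame)
        sources["iframe[%d]" % index] = driver.page_source
        # Go back to the top-level document before visiting the next iframe.
        driver.switch_to.default_content()

    return sources
```

With the driver from the question, this could be called as sources = collect_page_sources(driver) after the wait has finished; each value in the returned dictionary is the HTML of one context, and the top-level entry still will not contain the iframes' inner HTML.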

8 Comments

Umar Iqbal Over a year ago

Doesn't matter, still returns the old DOM.

alecxe Over a year ago

@UmarIqbal okay, what do you mean by the old DOM? And what is your desired output?

Umar Iqbal Over a year ago

By old DOM I meant unrendered JavaScript. All I want is completely rendered HTML content.

alecxe Over a year ago

@UmarIqbal thanks, could you be more specific and perhaps point to an element you don't want to see in the page source? Note that even if I go to the website, wait for it to load, and inspect the page source, I would still see the script tags with JavaScript there.

Umar Iqbal Over a year ago

Can you try running my code? The first print statement prints the dynamically loaded iframe. After that, in the second print statement, I print the complete page source. It's supposed to print the complete DOM along with that iframe, but it doesn't.
