OK, I'm stuck. I'm writing a small web-scraping Python script using Selenium and PhantomJS. The page I'm working on has the data I want inside an iframe document that my web driver does not run.

<main Page Heads etc>

   <blah>

   <iframe1 src="src1" ... etc etc>
    #document
      <tag>
      <tag>
      <iframe2 src="src2"></iframe2>
   </iframe1>

   <blah>

<end of webpage DOM>

I want to get the src of iframe2. I tried loading the src1 URL in my webdriver, but all I get back is the raw page HTML, not the loaded page elements. iframe2 must be created by some script inside iframe1, but I can't get my webdriver to run that script.

Any ideas?

This is what I'm doing to run the JavaScript on web pages and get the compiled page DOM:

from bs4 import BeautifulSoup
from selenium import webdriver

self.driver = webdriver.PhantomJS()
self.driver.get(url)
page = self.driver.page_source
soup = BeautifulSoup(page, 'html.parser')

1 Answer

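page_source only returns the document the driver is currently focused on, so it does not include the contents of an iframe. To reach elements inside an iframe, first switch into it with switch_to.frame(iframe_element); after that you can locate elements inside it: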

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

self.driver = webdriver.PhantomJS()
self.driver.get(url)

# Wait up to 50 seconds for the outer iframe to appear in the DOM.
iframe_element = WebDriverWait(self.driver, 50).until(
    EC.presence_of_element_located((By.XPATH, '//iframe[@id="iframegame"]'))
)

# Switch the driver's context into the iframe; this takes a WebElement.
self.driver.switch_to.frame(iframe_element)

# Lookups now run against the iframe's own document.
tag = self.driver.find_element(By.XPATH, '//tag')

To get back to elements outside the iframe, switch back to the top-level document with the following command:

self.driver.switch_to.default_content()
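
For the nested case in the question (the src of an iframe that lives inside another iframe), the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the names wait_for_element and nested_iframe_src are made up for illustration, the inner XPath "//iframe" is a guess at how iframe2 could be located, and a simple polling loop stands in for WebDriverWait so the functions can be tried with any driver-like object. The string "xpath" is the value of selenium's By.XPATH. Note that switch_to.frame expects a WebElement (or a frame name or index), not a BeautifulSoup Tag.

```python
import time

XPATH = "xpath"  # same string value as selenium's By.XPATH


def wait_for_element(driver, xpath, timeout=30.0, poll=0.5):
    """Poll the current frame until an element matches xpath (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list when nothing matches yet.
        found = driver.find_elements(XPATH, xpath)
        if found:
            return found[0]
        if time.monotonic() >= deadline:
            raise TimeoutError("no element matched %r within %s s" % (xpath, timeout))
        time.sleep(poll)


def nested_iframe_src(driver, outer_xpath, inner_xpath="//iframe", timeout=30.0):
    """Return the src of an iframe nested inside another iframe (hypothetical helper)."""
    outer = wait_for_element(driver, outer_xpath, timeout)
    # Must be a selenium WebElement, not a BeautifulSoup Tag.
    driver.switch_to.frame(outer)
    try:
        # This lookup runs inside the outer iframe's document.
        inner = wait_for_element(driver, inner_xpath, timeout)
        return inner.get_attribute("src")
    finally:
        # Always return to the top-level document.
        driver.switch_to.default_content()
```

With the driver from above, it might be called as nested_iframe_src(self.driver, '//iframe[@id="iframegame"]'). Whether iframe2 ever appears still depends on the browser actually running the script inside iframe1.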

22 Comments

I tried that, but I got: File "NBA/getLinks_nbastream_bilasport.py", line 112, in getIframeLink self.driver.switch_to.frame(iframe1) TypeError: Object of type 'Tag' is not JSON serializable
Could you share your result?
Can you share the full html code or the link to the website?
This is what iframe1 looks like: <iframe allowfullscreen="" frameborder="0" height="100%" id="iframegame" scrolling="no" src="http://bilasport.net/iframes/d/toronto-raptors-vs-boston-celtics-25247.html" width="100%"></iframe> I can't post the whole thing because this comment box won't let me.
When I "inspect" the page in Chrome, iframe1 is full of other elements, including iframe2. But I can still only get the source HTML with my script.