I am building an automation tool for scraping a site. As part of that, I am writing a Python script that talks to Firefox over the Remote Debugging Protocol (RDP) through this library: https://github.com/jpramosi/geckordp

My problem is that the elements I am interested in reside inside an iframe. Elements outside the iframe can be selected easily with query_selector, but that does not work for elements inside it. As a workaround, I have resorted to reaching these elements manually by repeatedly traversing child nodes:

# Select the last element that is still outside the iframe.
val = walker.query_selector(val["actor"], ".last-non-iframe-class")["node"]
# Walk down one level per request, picking a child by index each time.
children = walker.children(val["actor"])
val = children[1]
children = walker.children(val["actor"])
val = children[0]
children = walker.children(val["actor"])
val = children[1]
children = walker.children(val["actor"])
val = children[0]
# ... (further levels follow the same pattern)

But this is too slow for my needs. Is there any way to make query_selector work with elements inside an iframe, so that I can cut down on the number of requests I have to make to the debugger server?
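
For reference, the traversal above can be condensed into a small helper. This is only a sketch: it assumes, as in my snippet, that walker.children(actor) returns a list of node dicts that each have an "actor" key. The names descend and FakeWalker are hypothetical and not part of geckordp; FakeWalker exists only so the sketch can run without a browser. It still sends one request per level, so it tidies the code without solving the speed problem.

```python
from typing import Any, Dict, Sequence


def descend(walker: Any, node: Dict[str, Any], path: Sequence[int]) -> Dict[str, Any]:
    """Follow a sequence of child indices starting from `node`.

    Sends one `walker.children` request per step, so it is no faster
    than the unrolled version; it only removes the repetition.
    """
    for index in path:
        children = walker.children(node["actor"])
        node = children[index]
    return node


class FakeWalker:
    """Hypothetical stand-in for the walker actor, for offline runs only."""

    def __init__(self, tree: Dict[str, list]) -> None:
        # Maps an actor id to the list of its child node dicts.
        self.tree = tree

    def children(self, actor: str) -> list:
        return self.tree[actor]
```

With the real walker, the unrolled steps shown above would become val = descend(walker, val, [1, 0, 1, 0]), followed by whatever further indices the elided levels need.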

Both Firefox and the Python code run on a Linux machine. Firefox runs with a normal window (not headless).

If this is not practical in Firefox, is there a more feasible approach using Chrome's remote debugging?

  • With Selenium you would need to call driver.switch_to.frame(iframe) first; after that, queries run inside that iframe. Commented Apr 21 at 19:27
  • In geckordp I found switch_to_frame, but I don't know if it is what you need. Commented Apr 21 at 19:32
  • @furas I cannot use Selenium in my case, hence the use of the lower-level RDP. I will get back to this thread after I test the function above. Commented Apr 21 at 22:55
