
I am building a scraper in Python and want to execute the scripts embedded in a page's HTML, the way a web browser does when it loads the site. Is it possible to run those scripts on the HTML from Python?

The goal is to get the same HTML that the browser's 'DOM Inspector' shows, not the source you can simply download with

requests.get("https://example.com")

or see in 'View Source' mode.

Let's say I get the code:

<!DOCTYPE html>
<html>
  <body>
    <h1>The script element</h1>
    <p id="demo"></p>

    <script>
      document.getElementById("demo").innerHTML = "Hello JavaScript!";
    </script>
  </body>
</html>

If I run the script, it changes the content of the <p> element, and the result is:

<!DOCTYPE html>
<html>
  <body>
    <h1>The script element</h1>
    <p id="demo">Hello JavaScript!</p>
  </body>
</html>
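
To make the gap concrete, here is a minimal sketch using only the standard library's html.parser. It is an illustration, not a solution: the class name StaticView and the variable PAGE are made up for this example. It shows that a plain parser sees the <p id="demo"> element as empty and treats the script body as inert text, which is exactly what 'View Source' shows.

```python
from html.parser import HTMLParser

# The page from the question, as it arrives before any script has run.
PAGE = """<!DOCTYPE html>
<html>
  <body>
    <h1>The script element</h1>
    <p id="demo"></p>

    <script>
      document.getElementById("demo").innerHTML = "Hello JavaScript!";
    </script>
  </body>
</html>"""


class StaticView(HTMLParser):
    """Collect the text of <p id="demo"> and of <script>, executing nothing."""

    def __init__(self):
        super().__init__()
        self._current = None    # which tracked element we are currently inside
        self.demo_text = ""     # text found inside <p id="demo">
        self.script_text = ""   # raw source found inside <script>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._current = "script"
        elif tag == "p" and dict(attrs).get("id") == "demo":
            self._current = "demo"

    def handle_endtag(self, tag):
        if tag in ("script", "p"):
            self._current = None

    def handle_data(self, data):
        if self._current == "demo":
            self.demo_text += data
        elif self._current == "script":
            self.script_text += data


view = StaticView()
view.feed(PAGE)
view.close()

print(repr(view.demo_text))               # '' : the paragraph is still empty
print("innerHTML" in view.script_text)    # True : the script is just text
```

The paragraph only gets its content once something actually evaluates the script against a DOM, which a static parser never does.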

So how can I run it and get the HTML code evaluated with all scripts from a page?

EDIT 1: I know about the "Selenium" library, but I am trying to solve my problem without browser simulators, using just Python and JavaScript...

Thanks in advance!

5 Answers

0

You could use Brython, which lets you combine HTML and Python; otherwise you would have to learn a framework such as Django. Please let me know if this was helpful.


0

You can use the "selenium" library to load the page; it also lets you click elements and otherwise interact with the site. To scrape the result, you might want to use the "beautifulsoup4" library. Install both from the command line with

pip install selenium

and

pip install beautifulsoup4
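
A minimal sketch of how the two libraries could fit together, assuming both packages and a local Chrome installation are available. The function names html_to_data_url, render_with_selenium and text_of_element are made up for this illustration, and the data: URL is just one convenient way to load an HTML string without a server.

```python
from urllib.parse import quote


def html_to_data_url(page_html: str) -> str:
    """Wrap an HTML string in a data: URL so a browser can load it directly."""
    return "data:text/html;charset=utf-8," + quote(page_html)


def render_with_selenium(page_html: str) -> str:
    """Load the HTML in headless Chrome and return the DOM after scripts ran.

    Assumption: selenium and Chrome are installed. The imports are done
    lazily so this module can be imported without them.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(html_to_data_url(page_html))
        # page_source serializes the current DOM, not the original source.
        return driver.page_source
    finally:
        driver.quit()


def text_of_element(rendered_html: str, element_id: str) -> str:
    """Return the text of the element with the given id, or '' if missing.

    Assumption: beautifulsoup4 is installed (imported lazily).
    """
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(rendered_html, "html.parser")
    node = soup.find(id=element_id)
    return node.get_text() if node is not None else ""
```

With the page from the question stored in a string, text_of_element(render_with_selenium(page), "demo") should return the text the script wrote into the paragraph, provided Chrome starts correctly.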

1 Comment

Selenium is too slow, as it launches a browser... I am looking for a solution which uses only Python and JavaScript
0

Maybe you can get the response and write it to an HTML file.

1 Comment

Yeah, but this is exactly my problem... I get an HTML file (or just a string) and now I want to run its scripts as browsers do, but I don't know how to do that from Python alone.
0

You are looking for a cross-platform testing framework named Selenium.

It lets you open pages through a webdriver in browsers such as Firefox or Chrome and interact with them from a script.


2 Comments

Thanks, but Selenium is pretty slow for me. As you said, it launches pages in browsers, but I am trying to run the scripts using only Python and JavaScript, with no other applications. Performance matters to me, and browsers slow everything down significantly...
I already have the HTML file (as a string) and now I want to execute all its scripts as browsers do, but I don't know how...
0

This is partially answered by "remove tags/style-tags from html with lxml".

It is unclear what exactly you need. From the example, it looks like you just want to remove the <script>...</script> elements...

From "The purpose is to get the same HTML code as in 'DOM Inspector' in web browser", it seems you want to parse the HTML using Python, i.e. parse, display and show details of all the elements.

I personally prefer low-level tools, and lxml seems to do the trick for HTML...

Sample code:

from lxml import etree
from lxml import html
# Note: since lxml 5.2 the cleaner ships separately as the lxml_html_clean package
from lxml.html.clean import Cleaner

h = '''<!DOCTYPE html>
<html>
  <body>
    <h1>The script element</h1>
    <p id="demo"></p>

    <script>
      document.getElementById("demo").innerHTML = "Hello JavaScript!";
    </script>
  </body>
</html>'''

# Using etree (an XML parser; it works here only because this sample is well-formed XML)
root = etree.fromstring(h)
print(etree.tostring(root).decode())

# Code came from previous reference link above: https://stackoverflow.com/questions/8554035/remove-all-javascript-tags-and-style-tags-from-html-with-python-and-the-lxml-mod
cleaner = Cleaner()
cleaner.javascript = True # This is True because we want to activate the javascript filter
cleaner.style = True      # This is True because we want to activate the styles & stylesheet filter

# using html parser
print("\nWITH JAVASCRIPT & STYLES")
print(html.tostring(html.fromstring(h)).decode())
print("\nWITHOUT JAVASCRIPT & STYLES")
print(html.tostring(cleaner.clean_html(html.fromstring(h))).decode())

output:

<html>
  <body>
    <h1>The script element</h1>
    <p id="demo"/>

    <script>
      document.getElementById("demo").innerHTML = "Hello JavaScript!";
    </script>
  </body>
</html>

WITH JAVASCRIPT & STYLES
<html>
  <body>
    <h1>The script element</h1>
    <p id="demo"></p>

    <script>
      document.getElementById("demo").innerHTML = "Hello JavaScript!";
    </script>
  </body>
</html>

WITHOUT JAVASCRIPT & STYLES
<div>
  <body>
    <h1>The script element</h1>
    <p id="demo"></p>


  </body>
</div>
