
I have a Debian Linux server that I use for a variety of things. I want it to be able to do some web-scraping jobs I need done regularly.

This code can be found here.

import sys
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *

class Render(QWebPage):
  """Load a URL in WebKit and block until the page has finished loading."""
  def __init__(self, url):
    self.app = QApplication(sys.argv, False)  # GUIenabled=False; line updated based on mata's answer
    QWebPage.__init__(self)
    self.loadFinished.connect(self._loadFinished)
    self.mainFrame().load(QUrl(url))
    self.app.exec_()  # run the Qt event loop until _loadFinished() quits it

  def _loadFinished(self, result):
    self.frame = self.mainFrame()  # keep the loaded frame so the caller can read its HTML
    self.app.quit()
  

A simple test of it would look like this:

url = 'http://example.com'
print Render(url).frame.toHtml()

On the call to the constructor it dies with this message (it's printed to stdout, not an uncaught exception).

: cannot connect to X server 

How can I use Python (2.7), Qt4, and WebKit on a headless server? Nothing ever needs to be displayed, so I can tweak any settings that need to be tweaked.

I've looked into alternatives, but this is the best fit for me and my projects. If I did have to install an X server, how could I do it with minimal overhead?

  • Can you avoid including QtGui? Commented Nov 4, 2012 at 1:27

5 Answers

Answer (score: 22)

One of the constructors of QApplication takes a boolean argument, GUIenabled.
If you pass False for it, you can instantiate QApplication without an X server, but you can't create QWidgets.

So in this case the only option is to use a virtual X server like Xvfb to render the GUI.

Xvfb can be installed and run with these commands (assuming apt-get is available), where render.py contains the code from the question:

sudo apt-get install xvfb
xvfb-run python render.py
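If the scraping job is started from another Python program (for example, one run by cron), the same xvfb-run call can be wrapped with subprocess. This is a minimal sketch, assuming Python 3.7+ for the wrapper itself and that xvfb-run is on PATH; the helper names xvfb_command and run_under_xvfb are hypothetical. The python argument should point at the interpreter that has PyQt4 installed (the Python 2.7 one in the question). -a lets xvfb-run pick a free display number, and -s passes options through to Xvfb; the 1024x768x24 screen size is an arbitrary choice.

```python
import shutil
import subprocess


def xvfb_command(script, python="python", screen="1024x768x24"):
    """Build the argument list for running `script` under xvfb-run."""
    return [
        "xvfb-run",
        "-a",                         # let xvfb-run pick a free display number
        "-s", "-screen 0 " + screen,  # options passed through to the Xvfb server
        python,
        script,
    ]


def run_under_xvfb(script, python="python"):
    """Run `script` inside a temporary virtual X server and return its stdout."""
    if shutil.which("xvfb-run") is None:
        raise RuntimeError("xvfb-run not found (sudo apt-get install xvfb)")
    result = subprocess.run(
        xvfb_command(script, python), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, print(run_under_xvfb("render.py")) runs the script and prints the HTML it produces. xvfb-run starts Xvfb before the command and shuts it down afterwards, so no X server has to stay running between jobs.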

Comments

It gave me "QWidget: Cannot create a QWidget when no GUI is being used". Do you have an idea how to fix it? I'll check out Xvfb, just in case.
Sorry, I didn't really check it and I somehow seemed to remember that you just can't show widgets in headless mode but instantiate them. So if you need to use Qt, you'll have to go with Xvfb.
xvfb works great! I was worried I'd have to install all of X11, and have a server running. Thanks! I updated your answer with what worked for me.
@mata Where did you read that the constructor for QApplication takes an argument GUIenabled? I can't find anything about that.
@GreySage - I've updated the link. Note that this is only valid for PyQt4; on PyQt5 that argument is no longer supported, probably because it doesn't make much sense in the first place. Better to use QCoreApplication instead.
Answer (score: 6)

On GitLab CI/CD, adding ['-platform', 'minimal'] and using xvfb didn't work for me. Instead, I set the QT_QPA_PLATFORM variable to "offscreen".

See https://stackoverflow.com/a/55442821/6000005
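Outside a CI job definition, the same variable can be set from Python. This is a minimal sketch, assuming Qt 5 or later (for example PyQt5), where QT_QPA_PLATFORM selects the platform plugin; the helper names headless_qt_env and run_headless are hypothetical. An existing QT_QPA_PLATFORM value is left untouched so the job configuration can still override it.

```python
import os
import subprocess
import sys


def headless_qt_env(base=None, platform="offscreen"):
    """Return a copy of `base` (default: os.environ) with QT_QPA_PLATFORM set.

    An existing QT_QPA_PLATFORM value is kept, so a CI job can still override it.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("QT_QPA_PLATFORM", platform)
    return env


def run_headless(script, python=sys.executable):
    """Run a Qt script in a child process that uses the chosen platform plugin."""
    return subprocess.run([python, script], env=headless_qt_env(), check=True)


# For in-process use, set the variable before QApplication is constructed,
# because Qt picks the platform plugin when the application object starts:
#     os.environ.setdefault("QT_QPA_PLATFORM", "offscreen")
#     app = QApplication(sys.argv)
```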

1 Comment

This worked for me and seems to be the current solution. I used xvfb for years, but it no longer works for me; setting the platform target as described here does.
Answer (score: 5)

If PyQt5 is an option, Qt 5 has the "minimal" platform plugin.

To use it, modify the argv passed to QApplication to include ['-platform', 'minimal'].
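A minimal sketch of that argv change, assuming PyQt5; with_platform and make_app are hypothetical helpers. with_platform leaves argv alone if a -platform option is already present, and PyQt5 is imported lazily so the argv handling works even where Qt is not installed.

```python
import sys


def with_platform(argv, platform="minimal"):
    """Return a copy of argv that selects a Qt platform plugin, unless one is already given."""
    argv = list(argv)
    if "-platform" not in argv:
        # argv[0] is the program name; Qt looks for its own options after it.
        argv += ["-platform", platform]
    return argv


def make_app(argv=None):
    """Create a QApplication that does not need an X server (requires PyQt5)."""
    from PyQt5.QtWidgets import QApplication  # lazy import: with_platform() works without Qt
    return QApplication(with_platform(sys.argv if argv is None else argv))
```

Keep a reference to the object returned by make_app() for as long as Qt is in use, as with any QApplication.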


Answer (score: 1)

If all you are trying to do is get the webpage, you could use

import urllib
urllib.urlopen('http://example.com').read()
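urllib.urlopen is the Python 2 API. On Python 3 the equivalent lives in urllib.request. This is a minimal sketch; fetch_html is a hypothetical helper that decodes the body using the charset the server declares and falls back to UTF-8 (an assumption) when none is given. Like the snippet above, it only downloads the raw HTML and does not run any JavaScript.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and decode it with the declared charset (UTF-8 if none is given)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, fetch_html('http://example.com') returns the page source as a str.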

Comments

Good general answer, but I like to have the JavaScript. Thanks.
Yes. HTML, CSS, JavaScript, images, etc. It's exactly like going to the site in Chrome or Safari (they both use WebKit).
It seems I may have misunderstood what you were trying to do. Are you wanting to actually display the webpage? Your example led me to believe that you only wanted the HTML.
Python WebKit lets you run queries on the page (CSS2-like selectors), execute JavaScript, etc. You could do what I want with the HTML and BeautifulSoup, but I like the completeness.
The main limitation of BeautifulSoup is that it ignores JavaScript, which is why the OP was led to WebKit, just like me, I'm sure.
Answer (score: 1)

PhantomJS is a WebKit-based solution that also runs headless. Try it out.

If you are keen on using WebKit yourself, you could also try the PySide version of Qt.

