
I'm trying to create a simple module for phenny, a simple IRC bot framework in Python. The module is supposed to go to http://www.isup.me/websitetheuserrequested to check whether a website is up or down. I assumed I could use a regex for the module, since other built-in modules use them too, so I tried creating this simple script, although I don't think I did it right.

import re, urllib
import web

isupuri = 'http://www.isup.me/%s'
check = re.compile(r'(?ims)<span class="body">.*?</span>')

def isup(phenny, input):
    global isupuri
    global cleanup

    bytes = web.get(isupuri)
    quote = check.findall(bytes)
    result = re.sub(r'<[^>]*?>', '', str(quote[0]))
    phenny.say(result)

isup.commands = ['isup']
isup.priority = 'low'
isup.example = '.isup google.com'

It imports the required web packages (I think), and defines the string and the text to look for within the page. I really don't know what I did in those four lines, I kinda just ripped the code off another phenny module.

Here is an example of a quotes module that grabs a random quote from some webpage, I kinda tried to use that as a base: http://pastebin.com/vs5ypHZy

Does anyone know what I am doing wrong? If something needs clarifying, let me know; I don't think I explained this well enough.

Here is the error I get:

Traceback (most recent call last):
  File "C:\phenny\bot.py", line 189, in call
    try: func(phenny, input)
  File "C:\phenny\modules\isup.py", line 18, in isup
    result = re.sub(r'<[^>]*?>', '', str(quote[0]))
IndexError: list index out of range
  • what exactly isn't working for you? the program does not run? the result is wrong? Commented Jan 3, 2012 at 15:12
  • Also, why do you need isup.me? Why don't you do an HTTP HEAD request to check if the site is up? Commented Jan 3, 2012 at 15:12
  • I added the error that I get when the command is executed. And I never knew I could use HTTP HEAD, even though I'm not sure what it is. Commented Jan 3, 2012 at 15:15
  • You don't need the global statements, as long as you're not assigning to those names within the function. I'd also recommend that you capitalize your constants (e.g., ISUPURI instead of isupuri), so people (and you) know not to mess with them. Commented Jan 3, 2012 at 15:30

2 Answers

1

Try this (from http://docs.python.org/release/2.6.7/library/httplib.html#examples):

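import httplib

# HTTPConnection takes a host name, not a full URL
conn = httplib.HTTPConnection("www.python.org")
# HEAD fetches only the response headers, not the page body
conn.request("HEAD", "/index.html")
res = conn.getresponse()
# any 2xx status counts as "up"
if 200 <= res.status < 300:
    print "up"
else:
    print "down"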

You will also need to add code to follow redirects before checking the response status.
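As a rough sketch of what "following redirects" could look like, the loop below repeats the HEAD request against each Location header until it reaches a non-redirect status. It is written for Python 3, where httplib is named http.client. The names head, final_status, REDIRECT_CODES and the max_redirects limit are made up for illustration and are not part of the answer above. Because HTTPConnection expects a host name rather than a full URL, the URL is split into host and path first.

```python
import http.client
from urllib.parse import urlsplit, urljoin

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def head(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Request "/" when the URL has no path, instead of guessing /index.html.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        res = conn.getresponse()
        return res.status, res.getheader("Location")
    finally:
        conn.close()


def final_status(url, fetch=head, max_redirects=5):
    """Follow redirects and return the final status, or None if there are too many."""
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            continue
        return status
    return None
```

The fetch parameter exists only so the loop can be exercised without network access; final_status("http://www.python.org/") would use the real head function.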

edit

Alternative that does not need to handle redirects but uses exceptions for logic:

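import urllib2

request = urllib2.Request('http://google.com')
# force a HEAD request instead of the default GET
request.get_method = lambda: 'HEAD'

try:
    # urlopen follows redirects; HTTP error statuses raise HTTPError
    response = urllib2.urlopen(request)
    print "up"
    print response.code
except urllib2.URLError as e:
    # URLError also covers HTTPError, its subclass
    print "down"
    print e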

You should do your own tests and choose the best one.


6 Comments

This kinda works, I edited the "www.python.org" to whatever the user said, although now it says everything is down, I think because of the /index.html as some sites may not have this. How would I go about just checking the final page it redirects to?
@Alex: Use the exact same URL which your browser uses (just copy it from the location bar).
Whenever I include http I get this error: InvalidURL: nonnumeric port:'//stackoverflow.com/questions/8714093/how-do-i-search-for-text-in-a-page-using-regular-expressions-in-python' (source unknown)
I tried the new edit, and it works... however when I don't include http:// it throws an error, I was going to add http:// to the user's query however if they already had http:// in their query it would cause another error... and also if the website is down it doesn't say anything at all.
I assumed you need to check for HTTP availability, hence the need for the http:// prefix. You should do more tests to see whether you should also check that response.code is >= 200 and < 300.
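The prefix problem raised in these comments could be handled by normalizing the user's input before building the request. The sketch below is in Python 3, and normalize_url is a hypothetical helper that appears nowhere in the thread. It adds http:// only when no scheme is present and rejects anything that is not an http(s) URL.

```python
from urllib.parse import urlsplit


def normalize_url(user_input):
    """Return an http(s) URL, adding 'http://' when the scheme is missing."""
    text = user_input.strip()
    if "://" not in text:
        # Bare host such as 'google.com': assume plain HTTP.
        text = "http://" + text
    parts = urlsplit(text)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("not an http(s) URL: %r" % user_input)
    return text
```

With this, both ".isup google.com" and ".isup http://google.com" would lead to the same request. As for the bot staying silent when a site is down, print writes to the bot's console rather than to IRC, so in a phenny module the "down" branch would presumably need to call phenny.say as well.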
0

The error means your regexp wasn't found anywhere on the page (the list quote has no element 0).
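One likely reason for the empty match is that the question's code calls web.get(isupuri) without ever substituting the requested site for %s, so the literal template URL is fetched. A minimal sketch of a safer version follows, reusing the question's two patterns. build_uri and extract_status are names invented here, and whether isup.me really wraps its result in <span class="body"> is not confirmed by anything in this thread.

```python
import re

ISUP_URI = 'http://www.isup.me/%s'
# Same patterns as in the question, compiled once at module level.
CHECK = re.compile(r'(?ims)<span class="body">.*?</span>')
TAG = re.compile(r'<[^>]*?>')


def build_uri(site):
    """Fill the requested site into the URL template."""
    return ISUP_URI % site


def extract_status(html):
    """Return the text of the first matching span, or None if nothing matches."""
    match = CHECK.search(html)
    if match is None:
        # No match: report that to the caller instead of indexing an empty list.
        return None
    return TAG.sub('', match.group(0)).strip()
```

Inside the module this might be wired up as phenny.say(extract_status(web.get(build_uri(input.group(2)))) or 'Could not read the result'), assuming input.group(2) holds the command argument as it appears to in other phenny modules.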

2 Comments

I thought r'(?ims)<span class="body">.*?</span>' would be valid regex, seeing as the result is found inside that HTML tag...
It's valid (or you would have gotten an error compiling it). It just doesn't match anywhere on the page. That can mean the page is an empty string (nothing was downloaded or you got an error page) or that the regexp doesn't do what you think it should.
