I am making a web crawler that exhausts every webpage. I am given the first link, http://www.someURL.com/42342. This page contains X lines of expressions. My parse and evaluate functions evaluate these expressions to numbers. I then append each number to a default link (http://www.someURL.com/) to get the next link to visit. I am trying to keep count of how many webpages there are, but I am currently running into this error:

Traceback (most recent call last):
  File "test2.py", line 72, in <module>
    print url_queue(convert_to_link(URL)) 
  File "test2.py", line 23, in url_queue
    new_urls = convert_to_link(url)
  File "test2.py", line 13, in convert_to_link
    num_list.append(evaluate(parse(expressions)))
  File "test2.py", line 62, in evaluate
    return stack[0]
IndexError: list index out of range

I'm not quite sure why. Each function seems to give the correct output. Could someone help point out where my logic is wrong in my code?

My code:

import urllib2

URL = 'http://www.someURL.com/42342'

def convert_to_link(url):      
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    output_expressions = response.read().splitlines()   #return each expression in a list
    num_list = []
    url_list = []
    for expressions in output_expressions:
        num_list.append(evaluate(parse(expressions)))
    for number in num_list:
        url_list.append(newpage_gen(number))
    return url_list

def url_queue(url_list):
    count = 0
    for url in url_list:
        new_urls = convert_to_link(url)
        url_list.extend(new_urls)
        count += 1
    return count

def parse (s):          # parse expression
    s = s.replace('(', ' ').replace(')', ' ').replace(',', ' ')
    return s.split()[::-1]

def evaluate (ops):     # evaluate expression
    stack = []
    while ops:
        op = ops[0]
        ops = ops[1:]
        try:
            stack.append(int(op))
            continue
        except: pass
        if op == 'add':
            arg1, arg2 = stack.pop(), stack.pop()
            stack.append(arg1 + arg2)
            continue
        if op == 'multiply':
            arg1, arg2 = stack.pop(), stack.pop()
            stack.append(arg1 * arg2)
            continue
        if op == 'abs':
            arg1 = stack.pop()
            stack.append(abs(arg1))
            continue
        if op == 'subtract':
            arg1, arg2 = stack.pop(), stack.pop()
            stack.append(arg1 - arg2)
            continue
    return stack[0]

def newpage_gen(page_num):      # create new link
    url_template = 'http://www.someURL.com/'
    new_url = url_template + str(page_num)
    return new_url

print "TESTING"
print url_queue(convert_to_link(URL)) 
  • Can you post the expression that causes the problem? For example, the output of print ops when you enter the evaluate function? Commented Dec 1, 2013 at 4:10
  • @Liondancer Are you expecting those if statements to trigger every time? Is there ever a scenario where none of them should evaluate to true? Commented Dec 1, 2013 at 4:13

2 Answers

If stack is empty, stack[0] will give that error.

The stack will be empty if there are no ops, that is, if the line being evaluated is blank.

I would wonder if your input file has a blank line or two at the end of it.
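To illustrate, here is a minimal sketch of that failure mode and one possible guard. It is written to run on Python 3 (the question's code is Python 2). `parse` is copied from the question, `evaluate` is a condensed version of the question's evaluator, and `evaluate_line` is a hypothetical helper that is not part of the original code.

```python
def parse(s):
    # Same tokenizer as in the question: drop punctuation, reverse the tokens.
    s = s.replace('(', ' ').replace(')', ' ').replace(',', ' ')
    return s.split()[::-1]

def evaluate(ops):
    # Condensed form of the question's evaluator; unknown tokens are
    # silently ignored, exactly as in the original.
    stack = []
    for op in ops:
        try:
            stack.append(int(op))
            continue
        except ValueError:
            pass
        if op == 'add':
            stack.append(stack.pop() + stack.pop())
        elif op == 'multiply':
            stack.append(stack.pop() * stack.pop())
        elif op == 'subtract':
            arg1, arg2 = stack.pop(), stack.pop()
            stack.append(arg1 - arg2)
        elif op == 'abs':
            stack.append(abs(stack.pop()))
    return stack[0]  # IndexError here when nothing was ever pushed

def evaluate_line(line):
    # Hypothetical guard: a blank line produces no tokens, so skip it.
    ops = parse(line)
    if not ops:
        return None
    return evaluate(ops)

# A response body with a blank line in the middle.
lines = "add(2, 3)\n\nmultiply(4, abs(-2))\n".splitlines()
results = [evaluate_line(line) for line in lines]
```

Here `parse('')` returns an empty list, so calling `evaluate` on it directly reaches `stack[0]` with nothing on the stack. With the guard, the caller can drop the `None` entries before building new links.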

2 Comments

hmm okay I will look into that
Well, there are two other outputs that a link can display which are different from the expressions. This is probably why the stack is empty. I forgot to take this into account, thank you!
Like @Hugh Bothwell said, stack is probably empty. Try replacing your code with this to determine the error:

if op == 'add':
    arg1, arg2 = stack.pop(), stack.pop()
    stack.append(arg1 + arg2)
    continue
elif op == 'multiply':
    arg1, arg2 = stack.pop(), stack.pop()
    stack.append(arg1 * arg2)
    continue
elif op == 'abs':
    arg1 = stack.pop()
    stack.append(abs(arg1))
    continue
elif op == 'subtract':
    arg1, arg2 = stack.pop(), stack.pop()
    stack.append(arg1 - arg2)
    continue
else:
    # Any token that is neither a number nor a known operation ends up here.
    print "UNEXPECTED INPUT!! " + op
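
The same check can be made stricter by raising an exception instead of printing. The following is only a sketch in Python 3 syntax (the question uses Python 2). The names `OPERATIONS` and `evaluate_strict` are hypothetical, and it assumes the tokens arrive reversed, as produced by the question's `parse`.

```python
import operator

# Hypothetical table: operation name -> (number of arguments, function).
OPERATIONS = {
    'add': (2, operator.add),
    'multiply': (2, operator.mul),
    'subtract': (2, operator.sub),
    'abs': (1, abs),
}

def evaluate_strict(ops):
    # Evaluate reversed tokens, raising ValueError on anything unexpected.
    stack = []
    for op in ops:
        try:
            stack.append(int(op))
            continue
        except ValueError:
            pass
        if op not in OPERATIONS:
            raise ValueError('unexpected token: %r' % op)
        arity, func = OPERATIONS[op]
        if len(stack) < arity:
            raise ValueError('not enough arguments for %r' % op)
        # Pop in the same order as the question's code:
        # the first value popped is the first argument.
        args = [stack.pop() for _ in range(arity)]
        stack.append(func(*args))
    if len(stack) != 1:
        raise ValueError('expected exactly one result, got %d' % len(stack))
    return stack[0]
```

With this version a blank line or a page that contains something other than an expression fails with a message naming the problem, rather than with an IndexError at `stack[0]`. Requiring exactly one value on the stack is a design choice of this sketch; the original code returns `stack[0]` regardless of what else is left.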
