1

I am building a web scraper for real estate data in Python, and I've run into an error I can't fix.

for i in range(len(s)):
    if '$' in s[i]:
        price.append(s[i])

    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)

    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)

    elif s[i].lower() == 'advertisement':
        del s[i]

    else:
        continue

Value of 's' being:

display = Display(visible=0, size=(800, 600))
display.start()
browser = webdriver.Firefox()
browser.get(realtor.format(format))
p = browser.find_element(By.XPATH, "//ul[@class='jsx-343105667 property-list list-unstyle']")
content = p.text
s = re.split('\n', content)

This is supposed to iterate through the list s and add each entry to one of several separate lists [price, bed_bath, sqft_lot, address] to be used in a DataFrame. I know the indexing itself works: printing each line with for i in range(len(s)): print(s[i]) runs fine, but as soon as I add the logic above, it breaks.

Getting error:

if '$' in s[i]:
IndexError: list index out of range

Any input into why this is happening would be much appreciated.

6
  • Can you add an example of s which didn't work for you? Commented Feb 19, 2022 at 17:33
  • 3
    You seem to be removing elements with: del s[i]. Surely this affects the length of s and might mean that you run i off the end. Commented Feb 19, 2022 at 17:35
  • Did you mean to collect the offending indexes and remove them once this loop has finished? Commented Feb 19, 2022 at 17:39
  • Added the code declaring the 's' variable. Let me take a look but I believe @quamrana got it. What I might do instead is use a separate for loop to take care of 'advertisement' entries. Commented Feb 19, 2022 at 17:39
  • Ideally you would add a clear example of s as a python list, and not a code generating one, as we can't run that code. Commented Feb 19, 2022 at 17:42

3 Answers

2

As @quamrana mentioned, the most likely problem is the del s[i]: s gets shorter while the loop runs, but range(len(s)) was computed once at the start, so some of the later indexes no longer exist in s. I have two possible fixes. Fix 1:

for i in range(len(s)):
    if i >= len(s):  # check if index is still in bounds
        break

    if '$' in s[i]:
        price.append(s[i])

    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)

    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)

    elif s[i].lower() == 'advertisement':
        del s[i]  # note: the next element shifts into index i and is skipped
    else:
        continue

Fix 2:

indexes_to_remove = []

for i in range(len(s)):
    if '$' in s[i]:
        price.append(s[i])

    elif 'bath' in s[i]:
        left = s[i].partition(",")[0]
        right = s[i].partition(",")[2]
        bed_bath.append(left)
        sqft_lot.append(right)

    elif 'fort collins' in s[i].lower():
        address0 = s[i-1]+' '+s[i]
        address.append(address0)

    elif s[i].lower() == 'advertisement':
        indexes_to_remove.append(i)
    else:
        continue


# Delete from the end backward so the remaining indexes stay valid.
for index in indexes_to_remove[::-1]:
    del s[index]
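
One caveat about Fix 1: the bounds check stops the IndexError, but after a deletion the next element slides into position i and is never examined. If deleting in place is really needed, a while loop that only advances when nothing was deleted avoids that. This is a minimal sketch, not the asker's code; the function name drop_ads_in_place and the sample data are made up for illustration.

```python
def drop_ads_in_place(lines):
    """Remove 'advertisement' entries from lines in place without skipping any."""
    i = 0
    while i < len(lines):              # len() is re-evaluated on every pass
        if lines[i].lower() == 'advertisement':
            del lines[i]               # next item slides into slot i, so do not advance
        else:
            i += 1
    return lines


# Hypothetical sample data, not taken from the real page.
sample = ['$450,000', 'Advertisement', 'advertisement', '3 bed 2 bath, 1,500 sqft']
print(drop_ads_in_place(sample))
```

Two consecutive advertisement entries are both removed here, which is the case Fix 1 would miss.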

1 Comment

Second answer here would be the way I would do it if it were necessary. Appreciate the input.
0

Create a new list from s and populate it with the filtered and processed data:

output_list = []

def process_data(value):
    # your code for processing data
    ...

def some_condition(value):
    # placeholder: your test for whether a value should be kept
    ...

for value in s:
    if some_condition(value):
        output_list.append(process_data(value))
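
Applied to the question's data, the same idea might look like the sketch below: advertisement lines are skipped rather than deleted, so s never changes length while it is being read. The function name parse_listings, the guard for i == 0, and the sample lines are assumptions for illustration; the real page text may be laid out differently.

```python
def parse_listings(lines):
    """Sort scraped text lines into columns without modifying lines."""
    price, bed_bath, sqft_lot, address = [], [], [], []
    for i, line in enumerate(lines):
        if line.lower() == 'advertisement':
            continue                          # skip instead of deleting
        if '$' in line:
            price.append(line)
        elif 'bath' in line:
            left, _, right = line.partition(',')
            bed_bath.append(left)
            sqft_lot.append(right)
        elif 'fort collins' in line.lower():
            # Assumption: the street line directly precedes the city line.
            previous = lines[i - 1] if i > 0 else ''
            address.append((previous + ' ' + line).strip())
    return {'price': price, 'bed_bath': bed_bath,
            'sqft_lot': sqft_lot, 'address': address}


# Hypothetical sample data, not taken from the real page.
sample = ['$450,000', '3 bed 2 bath, 1,500 sqft lot', 'Advertisement',
          '123 Main St', 'Fort Collins, CO 80521']
columns = parse_listings(sample)
print(columns)
```

The returned dict of lists can be passed to a DataFrame constructor as long as all four lists end up the same length.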


0

You're deleting inside the for loop. The example here throws an error as well and is maybe easier to understand:

s = [i for i in range(5)]

for i in range(len(s)):
    print(f"{i=} with {s=}")
    del s[i]

Output:

i=0 with s=[0, 1, 2, 3, 4]
i=1 with s=[1, 2, 3, 4]
i=2 with s=[1, 3, 4]
i=3 with s=[1, 3]
IndexError: list assignment index out of range
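
For contrast, here is a sketch of two ways to remove items from the same kind of list without the error. The rule "delete the even numbers" is only an assumption chosen to make the example concrete. Walking the indexes from the end means a deletion never shifts an element that still has to be visited; building a new list avoids mutation entirely.

```python
# Option 1: delete in place, iterating from the last index to the first.
s = list(range(5))
for i in reversed(range(len(s))):
    if s[i] % 2 == 0:      # illustrative condition only
        del s[i]
print(s)

# Option 2: leave the original alone and build a filtered copy.
original = list(range(5))
kept = [x for x in original if x % 2 != 0]
print(kept)
```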
