I am trying to scrape data from https://www.seloger.com/ and I get the error below. I don't understand what's wrong, because this code worked when I ran it before.

import re
import requests
import csv
import json


with open("selog.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "Type", "Prix", "Code_postal", "Ville", "Departement", "Nombre_pieces", "Nbr_chambres", "Type_cuisine", "Surface"])


for i in range(1, 500):
   url = str('https://www.seloger.com/list.htm?tri=initial&idtypebien=1,2&pxMax=3000000&div=2238&idtt=2,5&naturebien=1,2,4&LISTING-LISTpg=' + str(i))
   r = requests.get(url, headers = {'User-Agent' : 'Mozilla/5.0'})
   p = re.compile('var ava_data =(.*);\r\n\s+ava_data\.logged = logged;', re.DOTALL)
   x = p.findall(r.text)[0].strip().replace('\r\n    ','').replace('\xa0',' ').replace('\\','\\\\')
   x = re.sub(r'\s{2,}|\\r\\n', '', x)
   data = json.loads(x)
   f = csv.writer(open("Seloger.csv", "wb+"))


   for product in data['products']:
      ID = product['idannonce']
      prix = product['prix']
      surface = product['surface']
      code_postal = product['codepostal']
      nombre_pieces = product['nb_pieces']
      nbr_chambres = product['nb_chambres']
      Type = product['typedebien']
      type_cuisine = product['idtypecuisine']
      ville = product['ville']
      departement = product['departement']
      etage = product['etage']
      writer.writerow([ID, Type, prix, code_postal, ville, departement, nombre_pieces, nbr_chambres, type_cuisine, surface])

This is the error:

Traceback (most recent call last):
  File "Seloger.py", line 20, in <module>
    x = p.findall(r.text)[0].strip().replace('\r\n    ','').replace('\xa0',' ').replace('\\','\\\\')
IndexError: list index out of range
  • "list index out of range" means something is wrong with index [0], so first check what print(p.findall(r.text)) gives you. Commented May 16, 2019 at 11:15
  • If p.findall(r.text) returns an empty list, check r.text: save it to a file and open it in a web browser. It may contain useful information, a warning for bots/scripts, or a captcha. Commented May 16, 2019 at 11:28
  • I ran the code and sometimes I get a page with the text "Oops, une erreur technique est survenue. Merci de ressayer ultérieurement.", which means "Oops, a technical error has occurred. Please try again later." In that case findall() returns an empty list, so it has no index [0] and the code fails with "list index out of range". Commented May 16, 2019 at 11:57
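The checks suggested in these comments could be sketched roughly as below. This is only an illustration: `diagnose`, `dump`, `ERROR_MARKER` and the file name `debug_page.html` are hypothetical names, and the marker text is simply the start of the message quoted above.

```python
import re

# Pattern copied from the question; a raw string keeps \s and \. as regex escapes.
AVA_DATA = re.compile(r'var ava_data =(.*);\r\n\s+ava_data\.logged = logged;', re.DOTALL)

# Assumption: the error page always contains the message quoted in the comments.
ERROR_MARKER = "Oops, une erreur technique est survenue"


def diagnose(html):
    """Classify a response body before indexing into the findall() result."""
    if AVA_DATA.findall(html):
        return "ok"
    if ERROR_MARKER in html:
        return "error page"
    return "no match"


def dump(html, path="debug_page.html"):
    """Save the body so it can be opened in a web browser."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
```

In the question's loop this would be called as diagnose(r.text), with dump(r.text) whenever the result is not "ok".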

2 Answers

This line is wrong:

x = p.findall(r.text)[0].strip().replace('\r\n    ','').replace('\xa0',' ').replace('\\','\\\\')

What do you need to find in the text?

To work on the scraped text directly, change the line above to:

x = r.text.strip().replace('\r\n    ','').replace('\xa0',' ').replace('\\','\\\\')

and then search it for whatever you need.

1 Comment

The problem is that sometimes the page shows the message "Oops, une erreur technique est survenue. Merci de ressayer ultérieurement.", which means "Oops, a technical error has occurred. Please try again later.", and then findall() can't find the expected text.
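Since the message asks to try again later, one possible way to handle it is to re-request the page a few times before giving up. A minimal sketch, assuming the error page always contains the quoted message; `fetch_with_retry`, `ERROR_MARKER` and the `fetch` callable are hypothetical names, not part of the original code:

```python
import time

# Assumption: the error page always contains the message quoted in the comment.
ERROR_MARKER = "Oops, une erreur technique est survenue"


def fetch_with_retry(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url) until the body is not the error page; return None on failure."""
    for attempt in range(attempts):
        html = fetch(url)
        if ERROR_MARKER not in html:
            return html
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking again
    return None
```

With the question's code, fetch could be lambda u: requests.get(u, headers={'User-Agent': 'Mozilla/5.0'}).text, and a None result would mean skipping that page.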
The error occurs because sometimes there is no match, and you are trying to access a non-existing item in an empty list. The same result can be reproduced with print(re.findall("s", "d")[0]).
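A small illustration of the difference, using the same toy pattern (the variable names are only for the example):

```python
import re

matches = re.findall("s", "d")   # there is no "s" in "d"
print(matches)                   # prints [] so matches[0] would raise IndexError

# Guarded access: take the first match only if the list is not empty.
first = matches[0] if matches else None

# re.search returns None instead of an empty list when nothing matches.
found = re.search("s", "d")
```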

To fix the issue, replace the line x = p.findall(r.text)[0].strip().replace('\r\n    ','').replace('\xa0',' ').replace('\\','\\\\') with

x = ''
xm = p.search(r.text)
if xm:
    x = xm.group(1).strip().replace('\r\n    ','').replace('\xa0',' ').replace('\\','\\\\')

NOTES

  • When you use p.findall(r.text)[0], you want the first match in the input, so re.search fits best here: it returns only the first match.
  • To obtain the substring captured in the first capturing group, use matchObject.group(1).
  • The if xm: check is important: if there is no match, x remains an empty string; otherwise it is assigned the modified value of Group 1.
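Putting the notes together, the extraction step could be wrapped in a helper such as the one below. It is a sketch rather than a drop-in fix: `extract_ava_data` is a hypothetical name, and it returns None on a miss instead of an empty string, because json.loads('') would itself raise an error.

```python
import json
import re

# Pattern and clean-up steps copied from the question; only the lookup changes.
AVA_DATA = re.compile(r'var ava_data =(.*);\r\n\s+ava_data\.logged = logged;', re.DOTALL)


def extract_ava_data(html):
    """Return the parsed ava_data object, or None if the page has no match."""
    m = AVA_DATA.search(html)
    if m is None:
        return None
    x = m.group(1).strip().replace('\r\n    ', '').replace('\xa0', ' ').replace('\\', '\\\\')
    x = re.sub(r'\s{2,}|\\r\\n', '', x)
    return json.loads(x)
```

In the loop, data = extract_ava_data(r.text) followed by if data is None: continue would skip pages that return the error message.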
