IndexError: list index out of range (Python) [duplicate]

Question

Here is my code:

  # Parse a new post.

  def parse_new_post(self,response,review,created_at,data):
    data.update({
      'cool_count':self.set_int(review.css('a[rel=cool]').css('span[class=count]::text').extract()),
      'created_at':self.set_date(review.css('meta[itemprop=datePublished]::attr(content)').extract()[0]),
      'elite':len(review.css('.is-elite')) == 1,
      'funny_count':self.set_int(review.css('a[rel=funny]').css('span[class=count]::text').extract()),
      'owner_comment_text':self.set_text(review.css('span[class=js-content-toggleable\ hidden]::text').extract()).replace("\n"," "),
      #'rating':review.css('div[itemprop=reviewRating]').css('div').css('i::attr(title)').re('(\\d\.\\d)'),
      'rating':review.css('div[itemprop=reviewRating]').css('meta').css('::attr(content)').re('(\\d\.\\d)')[0].encode('utf-8'),
      #'review_id':review.css('div::attr(data-review-id)').extract()[0].encode('utf-8'),
      #'review_id':review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract(),
      'review_text':self.set_text(review.css('p[itemprop=description]::text').extract()).replace("\n"," "),
      'total_friends':self.set_int(review.css('li[class=friend-count]').css('b::text').extract()),
      #'total_friends':int(review.xpath('.//li[contains(@class,"friend-count")]/span/b/text()').extract()[0].strip()),
      #'total_reviews':int(review.xpath('.//li[contains(@class,"review-count")]/span/b/text()').extract()[0].strip()),
      #'total_friends':int(review.xpath('.//li[contains(@class,"friend-count")]/b/text()').extract()[0].strip()),
      #'total_reviews':int(review.xpath('.//li[contains(@class,"review-count")]/b/text()').extract()[0].strip()),
      'total_reviews':self.set_int(review.css('li[class=review-count]').css('b::text').extract()),
      'user_id':review.css('div[class*=photo-box]').css('a::attr(href)').extract(),
      'useful_count':self.set_int(review.css('a[rel=useful]').css('span[class=count]::text').extract()),
      #'user_location':review.css('li[class=user-location]').css('b::text').extract()[0].encode('utf-8'),
      'user_location':review.xpath('.//li[@class="user-location responsive-hidden-small"]/b/text()').extract(),
      'username':review.css('meta[itemprop=author]::attr(content)').extract()[0].encode('utf-8'),
      'review_id':review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()[0].encode('utf-8'),
    })

When I crawl the website, I get the error below:

2016-11-22 01:27:52 [scrapy] ERROR: Spider error processing <GET https://www.yelp.com/biz/lexus-of-glendale-glendale?utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=HPtU-ro8MXX3MOY_DQkP6A?sort_by=date_desc> (referer: None)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
    for x in result:
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/c360/apps/c360nextgen/src/crawlers/yelp_new/yelp_new/spiders/lexus_posts.py", line 85, in parse
    yield self.check_for_new_post(response,review,created_at,data)
  File "/c360/apps/c360nextgen/src/crawlers/yelp_new/yelp_new/spiders/lexus_posts.py", line 95, in check_for_new_post
    return self.parse_new_post(response,review,created_at,data)
  File "/c360/apps/c360nextgen/src/crawlers/yelp_new/yelp_new/spiders/lexus_posts.py", line 123, in parse_new_post
    'review_id':review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()[0].encode('utf-8'),
IndexError: list index out of range

Hi, welcome! I think you should provide the code of your application where the crash happens; then the community can try to help you.
– wolendranh, Commented Nov 22, 2016 at 9:24
review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract() is an empty list.
– Chr, Commented Nov 22, 2016 at 9:24
When I fetch it, it returns a value. Can you please help me with the code?
– Varun rai, Commented Nov 22, 2016 at 9:35

Sven · Accepted Answer · 2016-11-22 09:27:09Z

0

Your XPath query returns an empty list, so indexing it with [0] raises an IndexError: there is no first element to return. Fix the selector you use to crawl the website.

answered Nov 22, 2016 at 9:27


3 Comments

Varun rai Over a year ago

Can you please tell me where I should fix it?

Sven Over a year ago

I assume something in review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract() is wrong, but you need to check your XPath yourself.

Varun rai Over a year ago

When I use (.//div[contains(@class,"review review--with-sidebar")]/@data-review-i), I get data in a list.
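
One thing worth checking, offered as a guess rather than a diagnosis: an expression starting with .//div only searches the descendants of review. If review is itself the div that carries data-review-id, or if some reviews on the page simply lack the attribute, the query can return data for some nodes and an empty list for others. The sketch below uses the standard library's xml.etree.ElementTree on made-up markup (not the real page) to show both effects. In Scrapy, the analogous query for an attribute on the context node itself would be './@data-review-id'.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only; it is not taken from the real page.
SAMPLE = """
<ul>
  <li><div class="review review--with-sidebar" data-review-id="abc123"><p>Great</p></div></li>
  <li><div class="review review--with-sidebar"><p>No id here</p></div></li>
</ul>
"""

root = ET.fromstring(SAMPLE)
reviews = root.findall(".//div[@class='review review--with-sidebar']")

# './/div' looks at descendants only, never at the context node itself,
# so searching from inside each review div finds no further div.
inner = [len(r.findall(".//div")) for r in reviews]

# The attribute lives on the context node; .get() returns None when it is absent.
ids = [r.get("data-review-id") for r in reviews]

print(inner)
print(ids)
```

If the second case applies, the selector works for most reviews and fails only on the ones without the attribute, which would match "it returns a value when I test it" and still crash during a full crawl.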

Sebastian Nielsen · Accepted Answer · 2016-11-22 09:33:18Z

0

An IndexError: list index out of range simply means that you are trying to access an index in the list that doesn't exist.

Here is an example:

list = [1, 2]
print(list[4])  # raises IndexError: the only valid indexes are 0 and 1

Notice that there isn't a fifth item in the list, so this results in: IndexError: list index out of range

In your case, the query returns an empty list, and you ask for its first element with [0]. Index 0 is the first item, but the list contains no items at all.
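
To make the empty-list case concrete, here is a small sketch. The name matches is only a stand-in for whatever .extract() returned in the traceback; nothing here calls Scrapy.

```python
# Stand-in for the result of .extract() when the selector matches nothing.
matches = []

try:
    first = matches[0]
    error = None
except IndexError as exc:
    # Indexing an empty list fails even for index 0.
    first = None
    error = str(exc)

print(error)

# Checking the length first avoids the exception entirely.
if len(matches) > 0:
    first = matches[0]
else:
    first = None

print(first)
```

The same check works for any index: an index i is only valid when i < len(matches).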

edited Nov 22, 2016 at 9:33

answered Nov 22, 2016 at 9:28


7 Comments

Varun rai Over a year ago

Are you fetching 4 as a value or 4 as an index?

Sebastian Nielsen Over a year ago

4 as an index. list[4] means the fifth item in the list, not the number 4. If it helped, please upvote :)

Varun rai Over a year ago

Thanks @Sebastian. Can you help me? What should I change in my code?

Sebastian Nielsen Over a year ago

You haven't shown all of your code, and the part you do show is kind of messy. It's hard to help under these conditions.

Sebastian Nielsen Over a year ago

Maybe you can provide the website, so that I can check whether the XPath is correct.

Chr · Accepted Answer · 2016-11-22 09:38:57Z

0

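Add a check on the review list before indexing it:

# Run this before (or instead of) the 'review_id' entry inside data.update({...}).
review_list = review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()
if review_list:
    data['review_id'] = review_list[0].encode('utf-8')
else:
    data['review_id'] = ""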
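
Several other fields in the question ('created_at', 'rating', 'username') also index with [0], so the same check may be needed there. A minimal sketch of one way to avoid repeating the if/else: first_or_default is a hypothetical helper, and the extracted dictionary stands in for the real .extract() results. If the installed Scrapy version provides extract_first() on selector lists, that should serve the same purpose.

```python
def first_or_default(values, default=""):
    """Return the first item of a list, or `default` when the list is empty."""
    return values[0] if values else default


# Hypothetical extraction results standing in for the .extract() calls.
extracted = {
    "review_id": [],          # selector matched nothing
    "username": ["Jane D."],  # selector matched one node
}

data = {}
data.update({
    "review_id": first_or_default(extracted["review_id"]),
    "username": first_or_default(extracted["username"]),
})

print(data)
```

An empty default keeps the crawl running, but it also hides pages where the selector no longer matches, so logging those cases may be worthwhile.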

answered Nov 22, 2016 at 9:38
