Here is my code:

  # Parse a new post.
  def parse_new_post(self,response,review,created_at,data):
    data.update({
      'cool_count':self.set_int(review.css('a[rel=cool]').css('span[class=count]::text').extract()),
      'created_at':self.set_date(review.css('meta[itemprop=datePublished]::attr(content)').extract()[0]),
      'elite':len(review.css('.is-elite')) == 1,
      'funny_count':self.set_int(review.css('a[rel=funny]').css('span[class=count]::text').extract()),
      'owner_comment_text':self.set_text(review.css('span[class=js-content-toggleable\ hidden]::text').extract()).replace("\n"," "),
      #'rating':review.css('div[itemprop=reviewRating]').css('div').css('i::attr(title)').re('(\\d\.\\d)'),
      'rating':review.css('div[itemprop=reviewRating]').css('meta').css('::attr(content)').re('(\\d\.\\d)')[0].encode('utf-8'),
      #'review_id':review.css('div::attr(data-review-id)').extract()[0].encode('utf-8'),
      #'review_id':review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract(),
      'review_text':self.set_text(review.css('p[itemprop=description]::text').extract()).replace("\n"," "),
      'total_friends':self.set_int(review.css('li[class=friend-count]').css('b::text').extract()),
      #'total_friends':int(review.xpath('.//li[contains(@class,"friend-count")]/span/b/text()').extract()[0].strip()),
      #'total_reviews':int(review.xpath('.//li[contains(@class,"review-count")]/span/b/text()').extract()[0].strip()),
      #'total_friends':int(review.xpath('.//li[contains(@class,"friend-count")]/b/text()').extract()[0].strip()),
      #'total_reviews':int(review.xpath('.//li[contains(@class,"review-count")]/b/text()').extract()[0].strip()),
      'total_reviews':self.set_int(review.css('li[class=review-count]').css('b::text').extract()),
      'user_id':review.css('div[class*=photo-box]').css('a::attr(href)').extract(),
      'useful_count':self.set_int(review.css('a[rel=useful]').css('span[class=count]::text').extract()),
      #'user_location':review.css('li[class=user-location]').css('b::text').extract()[0].encode('utf-8'),
      'user_location':review.xpath('.//li[@class="user-location responsive-hidden-small"]/b/text()').extract(),
      'username':review.css('meta[itemprop=author]::attr(content)').extract()[0].encode('utf-8'),
      'review_id':review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()[0].encode('utf-8'),
    })

When I crawl the website, I get the following error:

2016-11-22 01:27:52 [scrapy] ERROR: Spider error processing <GET https://www.yelp.com/biz/lexus-of-glendale-glendale?utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=HPtU-ro8MXX3MOY_DQkP6A?sort_by=date_desc> (referer: None)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
    for x in result:
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/c360/apps/c360nextgen/src/crawlers/yelp_new/yelp_new/spiders/lexus_posts.py", line 85, in parse
    yield self.check_for_new_post(response,review,created_at,data)
  File "/c360/apps/c360nextgen/src/crawlers/yelp_new/yelp_new/spiders/lexus_posts.py", line 95, in check_for_new_post
    return self.parse_new_post(response,review,created_at,data)
  File "/c360/apps/c360nextgen/src/crawlers/yelp_new/yelp_new/spiders/lexus_posts.py", line 123, in parse_new_post
    'review_id':review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()[0].encode('utf-8'),
IndexError: list index out of range
  • Hi, welcome! Please provide the code of your application where the crash happens so the community can try to help you. Commented Nov 22, 2016 at 9:24
  • review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract() is an empty list. Commented Nov 22, 2016 at 9:24
  • Please format this as code so we can read it. Commented Nov 22, 2016 at 9:25
  • When I fetch it, it returns a value. Can you please help me with the code? Commented Nov 22, 2016 at 9:35

3 Answers

Your query returns an empty list, so there is no first element for [0] to return and Python raises an IndexError. Fix the XPath query that extracts the review ID, or check that the list is non-empty before indexing it.
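To find out which reviews produce the empty list, one option is to log a warning instead of indexing blindly. This is only a sketch: `first_or_warn` is a hypothetical helper (not part of Scrapy), and the plain lists stand in for whatever `.extract()` returns in the spider.

```python
import logging

logger = logging.getLogger(__name__)


def first_or_warn(values, field_name):
    """Return the first extracted value, or None (with a warning) if nothing matched."""
    if not values:
        # An empty list means the selector matched nothing for this review.
        logger.warning("No match for %r; check the selector", field_name)
        return None
    return values[0]


# Plain lists stand in for the result of `.extract()` (illustrative values).
matched = first_or_warn(["abc123"], "review_id")
missing = first_or_warn([], "review_id")
```

With something like this in place, the spider keeps running and the log shows which pages need a different selector.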


3 Comments

Can you please tell me what I should fix?
I assume something in review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract() is wrong, but you need to check your XPath yourself.
When I use (.//div[contains(@class,"review review--with-sidebar")]/@data-review-i), I get data in a list.

"IndexError: list index out of range" simply means that you are trying to access an index in the list that doesn't exist.

Here is an example:

list = [1, 2]    # only indices 0 and 1 exist
print(list[4])   # raises IndexError: list index out of range

Notice that there is no fifth item in the list, so this raises IndexError: list index out of range.

In your case, the query returns an empty list, and you try to take its first element with [0]. Index 0 is the first item, but the list has no items at all.
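Both situations can be reproduced without Scrapy. The snippet below is a small illustration; `describe_access` is a made-up helper name used only for this demonstration.

```python
def describe_access(items, index):
    """Try items[index] and report either the value or the IndexError message."""
    try:
        return "ok: %r" % (items[index],)
    except IndexError as exc:
        return "IndexError: %s" % exc


# Index 4 on a two-item list fails, and so does index 0 on an empty list.
too_far = describe_access([1, 2], 4)
empty_first = describe_access([], 0)
valid = describe_access([1, 2], 1)
```

The empty-list case is the one in the traceback: `.extract()` returned `[]`, so even index 0 is out of range.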

7 Comments

Are you fetching 4 as a value or 4 as an index?
4 as an index. list[4] means the fifth item in the list, not the number 4. If it helped, please upvote :)
Thanks @Seastian. Can you help me with what I should change in my code?
You haven't shown all of your code, and the part you do show is rather messy. It's hard to help you under these conditions.
Maybe you can give me the website URL, so I can check whether the XPath is correct.

Add a check on the review list before indexing into it:

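review_list = review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()
# Fall back to an empty string when the XPath matches nothing.
if review_list:
    review_id = review_list[0].encode('utf-8')
else:
    review_id = ""
# Then use 'review_id': review_id inside the data.update({...}) call.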
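The same guard applies to every field in the question that indexes `[0]` (`created_at`, `rating`, `username`, `review_id`). A minimal sketch, assuming a hypothetical helper named `first_or_default` and using plain lists with made-up values in place of real `.extract()` results:

```python
def first_or_default(values, default=""):
    """Return the first extracted value, or `default` when the list is empty."""
    return values[0] if values else default


# Stand-ins for `.extract()` results; here the review_id selector matched nothing.
extracted = {
    "review_id": [],
    "username": ["Jane D."],  # illustrative value, not real data
}

data = {}
data.update({field: first_or_default(values) for field, values in extracted.items()})
```

Scrapy's selector lists also provide `extract_first()`, which returns `None` (or a supplied `default`) instead of raising when nothing matches, so it may be possible to drop the helper entirely.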
