
JSON RESPONSE FROM WEBSITE

I am new to Python, Scrapy and JSON. I am trying to scrape the JSON response from the URL in the code below (the one ending in 78751), but it shows an error. The code I used is:

import json
import re

import scrapy


class MyItem(scrapy.Item):
    # Assumed definition; in a Scrapy project this normally lives in items.py
    Reviews = scrapy.Field()


class BlackSpider(scrapy.Spider):
    name = 'black'
    start_urls = ['https://appworld.blackberry.com/cas/content/2360/reviews/2.17.2?page=1&pagesize=100&sortby=newest&callback=_content_2360_reviews_2_17_2&_=1499161778751']

    def parse(self, response):
        # The body is JSONP: _content_2360_reviews_2_17_2(\r\n{...});
        data = re.findall(r'(\{.+\})\);', response.text)
        a = json.loads(data[0])

        item = MyItem()
        item["Reviews"] = a["reviews"][4]["review"]
        return item

The error it shows is: ValueError("No JSON object could be decoded")

  • It looks like your page is returning HTML content, not JSON. Checking it via curl got me an HTML file that says the site is under maintenance. Commented Jul 4, 2017 at 11:58
  • It's working here; let me add a screenshot of the site. Commented Jul 4, 2017 at 11:59
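As the first comment suggests, the body is not always the expected JSONP payload (for example an HTML maintenance page). In that case json.loads raises ValueError on whatever the regex captured, or data[0] raises IndexError if the regex finds nothing at all. A minimal sketch of a defensive helper, assuming you would rather skip such responses than crash; the name parse_jsonp_body is hypothetical and it reuses the pattern from the question's code:

```python
import json
import re

# Same pattern as the question's code: from "{" to the last "}" followed by ");"
JSONP_PATTERN = re.compile(r'(\{.+\})\);')


def parse_jsonp_body(text):
    """Return the decoded JSON object, or None if the body is not usable JSONP."""
    match = JSONP_PATTERN.search(text)
    if match is None:
        # e.g. an HTML maintenance page: nothing looks like "{...});"
        return None
    try:
        return json.loads(match.group(1))
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError on Python 3
        return None
```

Inside parse you would call it with response.text and return early (perhaps after logging a warning) when it returns None.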

1 Answer


The response you are getting is a JavaScript function call with some JSON inside it:

_content_2360_reviews_2_17_2(\r\n{"some":"json"}]});\r\n

To extract the data from it, you can use a simple regex:

import re
import json

# response.text is the current equivalent of the older response.body_as_unicode()
data = re.findall(r'(\{.+\})\);', response.text)
json.loads(data[0])

The pattern translates to: select everything from { to } that is immediately followed by );
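To see what the pattern captures, here is a small self-contained check on a made-up JSONP string shaped like the response above (the sample payload is invented for illustration, not real API output):

```python
import json
import re

# Invented sample in the same shape as the response: callback(\r\n{...});\r\n
sample = '_content_2360_reviews_2_17_2(\r\n{"reviews": [{"review": "Very bad "}], "totalReviews": 1});\r\n'

# \{.+\} is greedy, so it extends to the last "}" that is followed by ");"
data = re.findall(r'(\{.+\})\);', sample)
payload = json.loads(data[0])
print(repr(payload["reviews"][0]["review"]))  # 'Very bad '
```

Because "." does not match a newline by default, the JSON object has to sit on a single line for this pattern to match it.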

Edit: the result I get with this:

{'platform': None,
 'reviews': [{'createdDate': '2017-07-04',
   'model': 'London',
   'nickname': 'aravind14-92362',
   'rating': 6,
   'review': 'Very bad ',
   'title': 'My WhatsApp no update '}],
 'totalReviews': 569909,
 'version': '2.17.2'}
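Here "reviews" is a list, so a fixed index such as [4] only works when at least five reviews come back; with fewer, Python raises IndexError. A minimal sketch that collects every review text instead of picking one position; review_texts is a hypothetical helper name and the dict is the result shown above:

```python
def review_texts(payload):
    """Return the 'review' field of every entry in payload['reviews']."""
    # .get() with defaults avoids KeyError if a key is missing
    return [entry.get("review", "") for entry in payload.get("reviews", [])]


# The result shown above, copied as a Python dict
result = {
    "platform": None,
    "reviews": [{"createdDate": "2017-07-04", "model": "London",
                 "nickname": "aravind14-92362", "rating": 6,
                 "review": "Very bad ", "title": "My WhatsApp no update "}],
    "totalReviews": 569909,
    "version": "2.17.2",
}
print(review_texts(result))  # ['Very bad ']
```

In parse you could then loop over review_texts(a) and yield one item per review instead of returning a single item.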

Comments

To add: the format is called "JSONP". It was often used (before CORS became mainstream) to get around the same-origin policy.
I am new to this. Where do I put the URL of the JSON response together with the regex?
@emon I'm sorry, could you rephrase your question?
Why does it show an error when I put item["Reviews"] = a["reviews"][4]["review"]?
@emon, I'm still confused. Check out my edit: I can successfully get a review via data['reviews'][0]['review'].
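Since the payload is JSONP, another option is to cut out the text between the first "(" and the last ")" instead of relying on a regex. This is a sketch under the assumption that the callback wraps exactly one JSON value and that its name contains no "("; strip_jsonp is a hypothetical helper name and the sample body is invented:

```python
import json


def strip_jsonp(text):
    """Return the JSON value wrapped in callback(...) as a Python object."""
    start = text.find("(")
    end = text.rfind(")")
    if start == -1 or end <= start:
        raise ValueError("body does not look like JSONP")
    # Everything between the parentheses; json.loads tolerates surrounding whitespace
    return json.loads(text[start + 1:end])


body = 'cb(\r\n{"reviews": [{"review": "Very bad "}]});\r\n'
print(strip_jsonp(body)["reviews"][0]["review"])
```

Unlike the single-line regex, this also works when the JSON spans several lines, and a ")" inside a JSON string does not confuse it because only the last ")" is used.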
