Scrapy parsing JSON output

Question

I am using Scrapy to crawl a website. Some pages load their data via AJAX, so I request the AJAX endpoints directly to get the actual data. So far so good. The responses to those AJAX requests are JSON. Now I would like to parse the JSON, but Scrapy only provides HtmlXPathSelector. Has anybody successfully transformed JSON output into HTML and parsed it with HtmlXPathSelector?

Thank you very much in advance

You do not want to convert JSON to HTML. Can you give us a sample of the JSON response?
– Steven Almeroth, Commented Apr 9, 2013 at 22:06

Ionut Hulub · Accepted Answer · 2013-04-09 19:20:21Z


import json

response = json.loads(jsonResponse)

The code above decodes the JSON you receive into ordinary Python objects (dicts, lists, strings, numbers). Afterwards, you can process it any way you want.

(Replace jsonResponse with the JSON text that you get from the AJAX request.)
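As a minimal sketch of what "process it any way you want" can look like, the snippet below decodes a JSON body and pulls out a few fields. The helper name `extract_titles`, the sample payload, and its `items` and `title` keys are made up for illustration; they are not taken from the question.

```python
import json


def extract_titles(body):
    """Decode a JSON response body and collect the "title" of each item."""
    payload = json.loads(body)
    # .get() avoids a KeyError if the (assumed) "items" key is missing.
    return [item["title"] for item in payload.get("items", [])]


# Hypothetical AJAX response body, used only to exercise the helper.
sample_body = '{"count": 2, "items": [{"title": "first"}, {"title": "second"}]}'
titles = extract_titles(sample_body)
print(titles)
```

Inside a Scrapy callback, the string passed in would typically be the response body (for example `response.text`). Recent Scrapy releases also offer `response.json()` on text responses, which performs the same decoding step.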


Sravan · Accepted Answer · 2014-10-31 07:26:57Z

Slightly complicated, but it works.

If you're interested in working with XPaths on JSON output:

Disclaimer: this may not be the optimal solution. +1 if someone improves this approach.

- Install the dicttoxml package (pip recommended).

- Download the output using Scrapy's regular Request class.

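In the spider:

import json

import dicttoxml
from scrapy import Request
from scrapy.selector import Selector

# Inside a spider method; `link` is the URL of the AJAX endpoint.
request = Request(link, callback=self.parse_resp)
yield request

def parse_resp(self, response):
    # Decode the body with Python's json module. Do not name the
    # variable `json`, or it shadows the module and json.loads fails.
    json_dict = json.loads(response.text)
    # Transform the contents into XML using dicttoxml (it returns bytes).
    xml_bytes = dicttoxml.dicttoxml(json_dict)
    # Selector(type="xml") replaces the deprecated XmlXPathSelector and
    # parses the text itself, so no separate lxml step is needed.
    sel = Selector(text=xml_bytes.decode("utf-8"), type="xml")
    # dicttoxml names each element after its JSON key, so a "count"
    # key is reachable as //count.
    data = sel.xpath("//count/text()").extract()
    yield {"count": data}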

I did this because I maintain the XPaths of all my spiders in one place (config files).
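The idea of converting JSON to XML and driving extraction from centrally stored paths can be sketched with the standard library alone. This is not the dicttoxml or Scrapy selector code from the answer: it swaps in `xml.etree.ElementTree`, which supports only a limited XPath subset (no `text()`), and the names `to_xml`, `FIELD_PATHS`, and `extract`, as well as the sample payload, are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(value, tag="root"):
    """Recursively turn decoded JSON into an ElementTree element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        # Each JSON key becomes a child element with the same name.
        for key, child in value.items():
            elem.append(to_xml(child, key))
    elif isinstance(value, list):
        # List entries have no key, so wrap each one in an <item> element.
        for child in value:
            elem.append(to_xml(child, "item"))
    else:
        elem.text = str(value)
    return elem


# Hypothetical central config: field name -> path expression.
FIELD_PATHS = {
    "count": ".//count",
    "titles": ".//items/item/title",
}


def extract(body, paths=FIELD_PATHS):
    """Decode JSON, convert it to XML, and evaluate every configured path."""
    root = to_xml(json.loads(body))
    return {
        name: [node.text for node in root.findall(path)]
        for name, path in paths.items()
    }


sample_body = '{"count": 2, "items": [{"title": "first"}, {"title": "second"}]}'
result = extract(sample_body)
print(result)
```

The sketch assumes that every JSON key is a valid XML element name. All values come back as strings, so numeric fields need converting after extraction.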
