I'm new to Scrapy and would like to understand how to scrape an object for output into nested JSON. Right now, I'm producing JSON that looks like this:

[
{'a' : 1, 
'b' : '2',
'c' : 3},
]

And I'd like it more like this:

[
{ 'a' : '1',
'_junk' : {
     'b' : 2,
     'c' : 3}},
]

---where I put some stuff in _junk subfields to post-process later.

The current code in the parser definition in my scrapername.py is:

item['a'] = x
item['b'] = y
item['c'] = z

And it seemed like

item['a'] = x
item['_junk']['b'] = y
item['_junk']['c'] = z

---might fix that, but I'm getting an error about the _junk key:

  File "/usr/local/lib/python2.7/dist-packages/scrapy/item.py", line 49, in __getitem__
    return self._values[key]
exceptions.KeyError: '_junk'

Does this mean I need to change my items.py somehow? Currently I have:

class Website(Item):
    a = Field()
    _junk = Field()
    b = Field()
    c = Field()

1 Answer

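You need to create the _junk dictionary before storing values in it. The expression item['_junk']['b'] = y first reads item['_junk'], and that read raises the KeyError because nothing has been assigned to that field yet.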

item['a'] = x
item['_junk'] = {}
item['_junk']['b'] = y
item['_junk']['c'] = z
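
As a rough illustration of the same idea, here is a minimal sketch that builds the nested dictionary first and assigns it in one step. It assumes a plain dict as a stand-in for the Scrapy item (so it runs without Scrapy installed), and build_item is a hypothetical helper name, not part of Scrapy.

```python
import json


def build_item(x, y, z):
    """Return an item whose 'b' and 'c' values are nested under '_junk'."""
    # Assumption: a plain dict stands in for the Scrapy item here. Like the
    # item in the traceback, it raises KeyError when an unset key is read.
    item = {}
    item['a'] = x
    # Create the nested dict before (or while) filling it, so that
    # item['_junk'] exists when it is read.
    item['_junk'] = {'b': y, 'c': z}
    return item


# Serialize a list of items the way a JSON feed would contain them.
item = build_item(1, 2, 3)
print(json.dumps([item], indent=2))
```

With this layout only a and _junk need to be set on the item itself, since b and c live inside the _junk value.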