I have a nested JSON (API) website that I want to parse, saving the items to a file (using the Scrapy framework).

I want to access each subelement of the given elements, which are in the following format:

0   {…}
1   {…}
2   {…}
3   {…}
4   {…}
5   {…}
6   {…}
7   {…}
8   {…}
9   {…}
10  {…}

If I expand element 0, I get the following values, where {...} expands further:

id  6738
date    "2018-06-14T09:38:51"
date_gmt    "2018-06-14T09:38:51"
guid    
     rendered   "https:example.com"
modified    "2019-03-19T20:43:50"
modified_gmt    "2019-03-19T20:43:50"

What it looks like in reality (screenshot)

How do I access each element in turn (first 0, then 1, then 2, ... up to a total of 350) and grab the value of, for example,

guid   
    rendered "https//:example.com"

and save it to an item?

What I have:

       results = json.loads(response.body_as_unicode())
       item = DataItem()
       for var in results:
           item['guid'] = results["guid"]
       yield item

This fails with

TypeError: list indices must be integers, not str

I know that I can access it with

item['guid'] = results[0]["guid"]

But this only gives me [0] index of the whole list, and I want to iterate through all of the indexes. How do I pass the index number into the list?

  • "But this only gives me [0] index of the whole list": how about replacing 0 with something, uhm, like a variable? Or the length? Commented Mar 21, 2019 at 14:33
  • Post a sample of results to get instant help. Commented Mar 21, 2019 at 14:39

1 Answer

Replace results["guid"] in your for loop with var["guid"]:

for var in results:
    item['guid'] = var["guid"]
    # do whatever you want with item['guid'] here

When you can access guid as results[0]["guid"], it means you have a list of dictionaries, each of which contains a key named guid. In your for loop you index results (the list) instead of var (which holds the current dictionary on each iteration). That raises the TypeError, because list indices must be integers, not strings such as "guid".
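The symptom reported in the comments (only the last element comes out) may be explained by the question's snippet creating item once and calling yield after the loop has finished. A minimal sketch of the per-element version follows. It assumes the JSON has the shape shown in the question; the sample body, the parse_guids name, and the plain dict standing in for DataItem are all hypothetical, used only so the example runs without Scrapy.

```python
import json

# Hypothetical sample shaped like the structure shown in the question:
# a JSON array whose elements are objects with a nested "guid" object.
SAMPLE_BODY = """
[
  {"id": 6738, "guid": {"rendered": "https://example.com/a"}},
  {"id": 6739, "guid": {"rendered": "https://example.com/b"}},
  {"id": 6740, "guid": {"rendered": "https://example.com/c"}}
]
"""


def parse_guids(body):
    """Yield one item per element of the top-level JSON list."""
    results = json.loads(body)  # a list of dicts
    for var in results:
        # Build a fresh item on every iteration; a plain dict stands in
        # for the question's DataItem here (an assumption).
        item = {}
        item["guid"] = var["guid"]["rendered"]
        # Yield inside the loop so every element produces its own item.
        yield item


items = list(parse_guids(SAMPLE_BODY))
for item in items:
    print(item["guid"])
```

In a real spider the same loop body would presumably sit inside the parse callback, with DataItem() in place of the dict.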

UPDATE: If you want to keep every var["guid"], you can collect them in a dictionary like this:

guid_holder = {"guid": []}
for var in results:
    guid_holder["guid"].append(var["guid"])
for guid in guid_holder["guid"]:
    print(guid)

Now guid_holder["guid"] holds the guid of every element.
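If the position of each element is also needed, which is what the question's "pass index number" seems to ask for, the built-in enumerate() provides it without manual counting. The sketch below uses a hypothetical two-element results list and assumes guid is a nested object with a rendered key, as shown in the question.

```python
# Hypothetical stand-in for the parsed JSON list from the question.
results = [
    {"id": 1, "guid": {"rendered": "https://example.com/a"}},
    {"id": 2, "guid": {"rendered": "https://example.com/b"}},
]

# enumerate() supplies the index alongside each element.
for index, var in enumerate(results):
    # results[index] and var refer to the same dictionary.
    assert results[index] is var
    print(index, var["guid"]["rendered"])

# Index-based form, closest to writing results[0], results[1], ...
by_index = [results[i]["guid"]["rendered"] for i in range(len(results))]

# Direct form, iterating over the elements themselves.
direct = [var["guid"]["rendered"] for var in results]
```

Both list comprehensions produce the same values; the direct form is usually preferred in Python when the index itself is not used.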

7 Comments

I've done this. It behaves weirdly: it gives only one result, from the 10th index. results[0]["guid"] behaves correctly and prints the guid for element [0]. results keeps the whole JSON page in a variable; I can print it too by using print(results). I don't know how to iterate through every [0, 1, 2, 3, ...] and get the guid for each.
@Alex16237 What exactly does results contain? Please add it as an example to your question.
I've posted a picture (edited post). Can't get formatting right with this one as there are too many elements.
@Alex16237 I updated my answer; see the UPDATE section for saving all elements.
Unfortunately, it doesn't work. Maybe I phrased it badly. How do I pass a variable, or the length of the array, as the index inside the loop? I think this is how I would solve the problem, i.e. item['guid'] = results[*]["guid"], where * is a variable passed by the loop. results reads the whole page; if I print it, I get the full parsed JSON page.