0

The records in the JSON file look like this (please note what "nutrients" looks like):

{
"id": 21441,
"description": "KENTUCKY FRIED CHICKEN, Fried Chicken, EXTRA CRISPY,
Wing, meat and skin with breading",
"tags": ["KFC"],
"manufacturer": "Kentucky Fried Chicken",
"group": "Fast Foods",
"portions": [
{
"amount": 1,
"unit": "wing, with skin",
"grams": 68.0
},
...
],
"nutrients": [
{
"value": 20.8,
"units": "g",
"description": "Protein",
"group": "Composition"
},
{'description': 'Total lipid (fat)',
'group': 'Composition',
'units': 'g',
'value': 29.2}
...
]
}

The following is the code from the book exercise*. It includes some wrangling and assembles the nutrients for each food into a single large table:

import pandas as pd
import json

db = pd.read_json("foods-2011-10-03.json")

nutrients = []

for rec in db:
     fnuts = pd.DataFrame(rec["nutrients"])
     fnuts["id"] = rec["id"]
     nutrients.append(fnuts)

However, I get the following error and I can't figure out why:


TypeError                                 Traceback (most recent call last)
<ipython-input-23-ac63a09efd73> in <module>()
      1 for rec in db:
----> 2     fnuts = pd.DataFrame(rec["nutrients"])
      3     fnuts["id"] = rec["id"]
      4     nutrients.append(fnuts)
      5

TypeError: string indices must be integers

*This is an example from the book Python for Data Analysis

2
  • Your JSON is not valid (and even when one corrects the quotes and removes the dots, it cannot be loaded by pd.read_json). Please submit data we can actually see your problem on. Commented Aug 30, 2017 at 9:52
  • @Amadan, here is the link to the data: github.com/wesm/pydata-book/blob/master/ch07/… Commented Aug 30, 2017 at 9:56

3 Answers

1

for rec in db iterates over the DataFrame's column names, so rec is a string and rec["nutrients"] raises the TypeError. To iterate over rows, use iterrows:

# iterrows yields (index label, row) pairs; the label is not needed here
for _, rec in db.iterrows():
    fnuts = pd.DataFrame(rec["nutrients"])
    fnuts["id"] = rec["id"]
    nutrients.append(fnuts)

This is a bit slow, though, because iterrows constructs a new Series for every row. itertuples is faster, but since you only need two columns, iterating over those two Series directly is probably fastest:

# food_id avoids shadowing the built-in id()
for food_id, value in zip(db['id'], db['nutrients']):
    fnuts = pd.DataFrame(value)
    fnuts["id"] = food_id
    nutrients.append(fnuts)

2 Comments

Thanks, that works fine! Have there been changes in how this iteration works since the book was written, or should this be added to the book's errata?
Sorry, I don't know too much about Pandas history, and I haven't read the book.
0

The code works perfectly fine, but the JSON should look something like this for the code to work:

[{
"id": 21441,
"description": "KENTUCKY FRIED CHICKEN, Fried Chicken, EXTRA CRISPY, Wing, meat and skin with breading",
"tags": ["KFC"],
"manufacturer": "Kentucky Fried Chicken",
"group": "Fast Foods",
"portions": [
{"amount": 1,
"unit": "wing, with skin",
"grams": 68.0}],
"nutrients": [{
"value": 20.8,
"units": "g",
"description": "Protein",
"group": "Composition"
},
{"description": "Total lipid (fat)",
"group": "Composition",
"units": "g",
"value": 29.2}]}]

This is an example with only one record.
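
For context, a loop like the one in the question works when the data is a plain Python list of dicts, which is what the standard json module returns for a top-level JSON array. The sketch below assumes the data is loaded that way (json.loads on a string here; json.load does the same for an open file) and uses a shortened, made-up record rather than the real file.

```python
import json

# Shortened, made-up record in valid JSON (double quotes only)
text = """
[{"id": 21441,
  "nutrients": [
    {"value": 20.8, "units": "g", "description": "Protein"},
    {"value": 29.2, "units": "g", "description": "Total lipid (fat)"}
  ]}]
"""

records = json.loads(text)   # a list of dicts, one per food

rows = []
for rec in records:          # each rec is a dict, so rec["nutrients"] works
    for nut in rec["nutrients"]:
        # Copy the nutrient fields and attach the owning food's id
        rows.append({"id": rec["id"], **nut})
```

Because each rec is a dict rather than a column name, string keys such as "nutrients" are valid here.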


0

Amadan answered the question, but I managed to solve it like this prior to seeing his answer:

for i in range(len(db)):
    rec = db.loc[i]
    fnuts = pd.DataFrame(rec["nutrients"])
    fnuts["id"] = rec["id"]
    nutrients.append(fnuts)
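
One caveat with this approach: db.loc[i] looks rows up by index label, so it lines up with range(len(db)) only while the frame keeps the default 0 to n-1 index. A position-based variant using iloc, sketched here on a small made-up frame with a non-default index, might look like this:

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1
db = pd.DataFrame(
    {"id": [1, 2],
     "nutrients": [[{"description": "Protein", "value": 20.8}],
                   [{"description": "Protein", "value": 5.0}]]},
    index=[10, 20],
)

nutrients = []
for i in range(len(db)):
    rec = db.iloc[i]                        # i-th row by position, not label
    fnuts = pd.DataFrame(rec["nutrients"])  # one row per nutrient
    fnuts["id"] = rec["id"]
    nutrients.append(fnuts)
```

With this index, db.loc[0] would raise a KeyError, while db.iloc[0] still returns the first row.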
