1

I have tried many approaches (and read many Stack Overflow questions) to normalize a deeply nested JSON file. I tried .apply(pd.Series), but it does not work well with many levels of nested dictionaries.

I am currently trying json_normalize, and it has given some results. I think I understand how the function works; my problem is that I don't know how to navigate through the nested dictionary.

So far, I have been able to dig into 2 levels.

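import json
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

# Load the file; the individual records sit under hits -> hits
with open('authors.json') as f:
    raw = json.load(f)
raw2 = json_normalize(raw['hits']['hits'])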

This gives me what I need for the first levels, but I don't know how to go deeper.

I've tried:

raw2 = json_normalize(raw['hits']['hits'][0])
raw2 = json_normalize(raw['hits']['hits']['_source.authors'])
TypeError: string indices must be integers

I tried many other variations, but guessing without understanding is not the right approach. My questions are:

  • How do I know how to reach the next level ({} vs [] in the JSON)?
  • Is there a visual way to represent the structure?
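One way to approach both questions is to print an outline of the parsed data. json.load turns every {} into a dict (indexed by key) and every [] into a list (indexed by integer position), so the outline tells you which kind of index each level needs. The sketch below uses a hypothetical helper named describe (it is not part of pandas or the json module) and a trimmed stand-in for the file with only a few of its fields.

```python
import json


def describe(node, name="root", depth=0, max_items=1):
    """Return outline lines for a parsed JSON value (hypothetical helper)."""
    pad = "  " * depth
    if isinstance(node, dict):
        # {} in the file: go one level deeper with a string key
        lines = [f"{pad}{name}: dict ({len(node)} keys) -> index with ['key']"]
        for key, value in node.items():
            lines += describe(value, repr(key), depth + 1, max_items)
    elif isinstance(node, list):
        # [] in the file: go one level deeper with an integer position
        lines = [f"{pad}{name}: list ({len(node)} items) -> index with [0], [1], ..."]
        for i, value in enumerate(node[:max_items]):
            lines += describe(value, f"[{i}]", depth + 1, max_items)
    else:
        # scalar leaf: nothing further to index
        lines = [f"{pad}{name}: {type(node).__name__}"]
    return lines


# Trimmed stand-in for authors.json (only a few fields kept)
sample = json.loads("""
{"hits": {"hits": [
    {"_id": "7CB3F2AD",
     "_source": {"journal": "The American Historical Review",
                 "authors": [{"author_id": "166468F4",
                              "author_name": "a bowdoin van riper"}]}}
]}}
""")

outline = describe(sample)
print("\n".join(outline))
```

Reading the outline from top to bottom gives the index chain directly: a dict line means the next index is a key, a list line means it is a position. Only the first item of each list is shown (max_items=1) to keep the output short.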

It is surprising that this topic is not covered more online, since I work with JSON data more and more every day.

_id _index  _score  _source.authors _source.deleted _source.description _source.doi _source.is_valid    _source.issue   _source.journal ... _source.rating_versatility_weighted _source.review_count    _source.tag _source.title   _source.userAvg _source.user_id _source.venue_name  _source.views_count _source.volume  _type   
0   7CB3F2AD    scibase_listings    1   None    0   None        1   None    Physical Review Letters ... 0   0   [mass spectra, elementary particles, bound sta...   Evidence for a new meson: A quasinuclear NN-ba...   0   None    Physical Review Letters 0   None    listing
1   7AF8EBC3    scibase_listings    1   [{'affiliations': ['Punjabi University'], 'aut...   0   None        1   None    Journal of Industrial Microbiology & Biotechno...   ... 0   0   [flow rate, operant conditioning, packed bed r...   Development of a stable continuous flow immobi...   0   None    Journal of Industrial Microbiology & Biotechno...   0   None    listing
2   7521A721    scibase_listings    1   [{'author_id': '7FF872BC', 'author_name': 'bar...   0   None        1   None    The American Historical Review  ... 0   0   [social movements]  Feminism and the women's movement : dynamics o...   0   None    The American Historical Review  0   None    listing

This is a chunk of the file. The records themselves are at level 3; levels 1 and 2 are hits and hits.

{
"_shards": {
    "failed": 0,
    "successful": 5,
    "total": 5
},
"hits": {
    "hits": [{
            "_id": "7CB3F2AD",
            "_index": "scibase_listings",
            "_type": "listing",
            "_score": 1,
            "_source": {
                "userAvg": 0,
                "meta_keywords": null,
                "views_count": 0,
                "rating_reproducability": 0,
                "rating_versatility": 0,
                "rating_innovation": 0,
                "tag": null,
                "rating_reproducibility_weighted": 0,
                "meta_description": null,
                "review_count": 0,
                "rating_avg_weighted": 0,
                "venue_name": "The American Historical Review",
                "rating_num_weighted": 0,
                "is_valid": 1,
                "normalized_venue_name": "american historical review",
                "rating_clarity": 0,
                "description": null,
                "deleted": 0,
                "journal": "The American Historical Review",
                "volume": null,
                "link": null,
                "authors": [{
                        "author_id": "166468F4",
                        "author_name": "a bowdoin van riper"
                    },
                    {
                        "author_id": "81070854",
                        "author_name": "jeffrey h schwartz"
                    }
                ],
                "user_id": null,
                "pub_date": "1994-01-01 00:00:00",
                "pages": null,
                "doi": "",
                "issue": null,
                "rating_versatility_weighted": 0,
                "pubtype": null,
                "title": "Men Among the Mammoths: Victorian Science and the Discovery of Human Prehistory",
                "rating_clarity_weighted": 0,
                "rating_innovation_weighted": 0
            }
        },
        {
            "_index": "scibase_listings",
            "_type": "listing",
            "_id": "7538108B",
            "_score": 1,
            "_source": {
                "userAvg": 0,
                "meta_keywords": null,
                "views_count": 0,
                "rating_reproducability": 0,
                "rating_versatility": 0,
                "rating_innovation": 0,
                "tag": null,
                "rating_reproducibility_weighted": 0,
                "meta_description": null,
                "review_count": 0,
                "rating_avg_weighted": 0,
                "venue_name": "The American Historical Review",
                "rating_num_weighted": 0,
                "is_valid": 1,
                "normalized_venue_name": "american historical review",
                "rating_clarity": 0,
                "description": null,
                "deleted": 0,
                "journal": "The American Historical Review",
                "volume": null,
                "link": null,
                "authors": [{
                    "affiliations": [
                        "Pennsylvania State University"
                    ],
                    "author_id": "7E15BDFA",
                    "author_name": "roger l geiger"
                }],
                "user_id": null,
                "pub_date": "2013-06-01 00:00:00",
                "pages": null,
                "doi": "10.1093/ahr/118.3.896a",
                "issue": null,
                "rating_versatility_weighted": 0,
                "pubtype": null,
                "title": "Elizabeth Popp Berman. Creating the Market University: How Academic Science Became an Economic Engine.",
                "rating_clarity_weighted": 0,
                "rating_innovation_weighted": 0
            }
        }
    ]
}

}

3
  • Would you mind providing a valid JSON string/file that can be parsed? Try copying it from your question and passing it to json.loads(json_string). Another helpful resource is jsonlint.com, which validates JSON online. Commented Nov 16, 2017 at 21:13
  • That one doesn't work? It is a chunk of the original one. Commented Nov 16, 2017 at 21:14
  • I have changed the invalid JSON to a valid one. Commented Feb 6, 2020 at 11:51

2 Answers

0

I think I figured out how to 'dig' through the JSON: it depends on whether the next level is a list or a dict. A dict is indexed by key, a list by integer position.

In my case I was able to dig down to the last level with the expression below. I still need to find out how to cover the full list (maybe with a loop) so that I get all values and not just [0] or [1].

raw['hits']['hits'][1]['_source']['authors'][0]['affiliations']
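A minimal sketch of the loop idea, assuming the structure shown in the question: iterate wherever the level is a list and use a key wherever it is a dict. The data here is a trimmed stand-in for the file, and the row layout (_id, author_name, affiliation) is only an illustrative choice.

```python
# Trimmed stand-in for the parsed file (only a few fields kept)
raw = {"hits": {"hits": [
    {"_id": "7CB3F2AD",
     "_source": {"authors": [
         {"author_id": "166468F4", "author_name": "a bowdoin van riper"},
         {"author_id": "81070854", "author_name": "jeffrey h schwartz"}]}},
    {"_id": "7538108B",
     "_source": {"authors": [
         {"affiliations": ["Pennsylvania State University"],
          "author_id": "7E15BDFA", "author_name": "roger l geiger"}]}},
]}}

rows = []
for hit in raw["hits"]["hits"]:                  # list -> iterate
    source = hit["_source"]                      # dict -> key
    for author in source.get("authors") or []:   # list, possibly missing or null
        # Authors without affiliations still produce one row, with None
        for affiliation in author.get("affiliations") or [None]:
            rows.append({
                "_id": hit["_id"],
                "author_name": author["author_name"],
                "affiliation": affiliation,
            })

for row in rows:
    print(row)
```

The resulting list of flat dicts can be passed to pd.DataFrame(rows) to get a table with one row per author and affiliation.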

0

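Try passing the nested path to json_normalize as record_path, and any parent fields you want to keep as meta:

json_normalize(raw['hits']['hits'], record_path=['_source', 'authors'], meta=['_id'])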
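For reference, here is a self-contained sketch of json_normalize with the record_path and meta keyword arguments, run on a trimmed stand-in for the question's data. It assumes pandas 1.0 or later (for pd.json_normalize and DataFrame.explode); the choice of _id and _source.journal as meta fields is only an example.

```python
import pandas as pd

# Trimmed stand-in for the parsed file (only a few fields kept)
raw = {"hits": {"hits": [
    {"_id": "7CB3F2AD",
     "_source": {"journal": "The American Historical Review",
                 "authors": [
                     {"author_id": "166468F4", "author_name": "a bowdoin van riper"},
                     {"author_id": "81070854", "author_name": "jeffrey h schwartz"}]}},
    {"_id": "7538108B",
     "_source": {"journal": "The American Historical Review",
                 "authors": [
                     {"affiliations": ["Pennsylvania State University"],
                      "author_id": "7E15BDFA", "author_name": "roger l geiger"}]}},
]}}

# record_path walks hit -> _source -> authors and yields one row per author;
# meta copies fields from the parent levels onto each of those rows.
authors = pd.json_normalize(
    raw["hits"]["hits"],
    record_path=["_source", "authors"],
    meta=["_id", ["_source", "journal"]],
)

# affiliations is still a list (or NaN where the key is absent);
# explode gives one row per affiliation and keeps the NaN rows.
flat = authors.explode("affiliations")
print(flat)
```

Authors that lack an affiliations key simply get NaN in that column, which is why record_path stops at authors instead of going one level further. On the full file, hits whose authors field is null may need separate handling, depending on the pandas version.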
