2

I have a json with nested objects (nested list of objects): json:

{
    "uniqueIdentifier": {
        "identity": {
            "textIdentifier": "MysticFoliage",
            "encodedIdentifier": "<p>MysticFoliage</p>"
        }
    },
    "languagePreference": {
        "chosenLanguage": {
            "displayText": "Enigmatic Tongue",
            "languageCode": "<p>Enigmatic Tongue</p>"
        }
    },
    "specialCode": 42,
    "creationTimestamp": "1630411200000",
    "creatorAlias": "whisperingShadow",
    "unusualStatus": "UNEXPLORED",
    "identifierCode": "XyzAbc123",
    "nativeTongue": [],
    "modificationTimestamp": "1670160000000",
    "modifierAlias": "arcaneTraveler",
    "designatedNameWithComma": {
        "designation": {
            "displayText": "Foliage, Mystic",
            "encodedText": "<p>Foliage, Mystic</p>"
        }
    },
    "otherRecognizedUniqueIdentifiers": [],
    "otherKnownLanguages": [],
    "otherKnownCodes": [],
    "territories": [
        {
            "identityCode": "DefGhi456",
            "designationEn": "Enchanted Forest",
            "designationLanguage": "Enchanted Forest"
        }
    ],
    "scientificDesignation": {
        "generatedScientificDesignation": {
            "designation": {
                "displayText": "Mysticus plantae enigma",
                "encodedText": "<em>Mysticus</em> <em>plantae</em> enigma"
            },
            "status": "MYSTERIOUS"
        },
        "genusInfo": {
            "code": "ZwxYvu789",
            "designation": "Mysticus"
        },
        "speciesInfo": {
            "code": "RstUvw123",
            "designation": "plantae"
        },
        "subSpeciesInfo": {
            "code": "KlmNop456",
            "designation": "enigma"
        },
        "subSpeciesPrefixes": [
            "sub."
        ],
        "varietyPrefixes": [
            "var."
        ]
    },
    "statusState": "UNKNOWN",
    "currentCondition": "ETERNAL",
    "classificationGroup": {
        "groupCode": "PqrStu789",
        "designation": "Enigma Kingdom",
        "designationLanguage": "Enigmatic Realm"
    }
}

I used pd.json_normalize(json) on it but some of the fields are still nested, such as "territories":

territories
[{'identityCode': 'DefGhi456', 'designationEn': 'Enchanted Forest', 'designationLanguage': 'Enchanted Forest'}]

what I want is:

territories.identityCode territories.designationEn territories.designationLanguage
'DefGhi456' 'Enchanted Forest' 'Enchanted Forest'

I tried pd.json_normalize(json, "territories") but this only gives the output for "territories" and not the rest of the json. (it does normalize 'territories' it correctly).

From this Nested list after json_normalize answer it says to do e.g:

metadata = ['name', 'period', 'title', 'description', 'id']
out = pd.json_normalize(data_read['data'], 'values', metadata)

But I have a lot of column titles and several other jsons will more fields, it would require me to create many lists manually. I tried just getting the column names using df.columns by doing:

fileReader = json.loads(data)
df = pd.DataFrame(fileReader)
metadata = list(df.columns)

j2 = pd.json_normalize(fileReader, "ranges", metadata)

But I get error:

KeyError: "Key 'note' not found. To replace missing values of 'note' with np.nan, pass in errors='ignore'"

Trying with errors='ignore' I get:

ValueError: Conflicting metadata name id, need distinguishing prefix 

Which seems to mean I'd still need to manually set the column names for some of the "id" columns? Which is what I want to avoid.

1 Answer 1

1

Try:

import json

with open("data.json", "r") as f_in:
    data = json.load(f_in)

df = pd.json_normalize(data).explode("territories")
df = pd.concat(
    [df, df.pop("territories").apply(pd.Series).add_prefix("territories.")], axis=1
)
print(df)

Prints:

   specialCode creationTimestamp      creatorAlias unusualStatus identifierCode nativeTongue modificationTimestamp   modifierAlias otherRecognizedUniqueIdentifiers otherKnownLanguages otherKnownCodes statusState currentCondition uniqueIdentifier.identity.textIdentifier uniqueIdentifier.identity.encodedIdentifier languagePreference.chosenLanguage.displayText languagePreference.chosenLanguage.languageCode designatedNameWithComma.designation.displayText designatedNameWithComma.designation.encodedText scientificDesignation.generatedScientificDesignation.designation.displayText scientificDesignation.generatedScientificDesignation.designation.encodedText scientificDesignation.generatedScientificDesignation.status scientificDesignation.genusInfo.code scientificDesignation.genusInfo.designation scientificDesignation.speciesInfo.code scientificDesignation.speciesInfo.designation scientificDesignation.subSpeciesInfo.code scientificDesignation.subSpeciesInfo.designation scientificDesignation.subSpeciesPrefixes scientificDesignation.varietyPrefixes classificationGroup.groupCode classificationGroup.designation classificationGroup.designationLanguage territories.identityCode territories.designationEn territories.designationLanguage
0           42     1630411200000  whisperingShadow    UNEXPLORED      XyzAbc123           []         1670160000000  arcaneTraveler                               []                  []              []     UNKNOWN          ETERNAL                            MysticFoliage                        <p>MysticFoliage</p>                              Enigmatic Tongue                        <p>Enigmatic Tongue</p>                                 Foliage, Mystic                          <p>Foliage, Mystic</p>                                                      Mysticus plantae enigma                                    <em>Mysticus</em> <em>plantae</em> enigma                                                  MYSTERIOUS                            ZwxYvu789                                    Mysticus                              RstUvw123                                       plantae                                 KlmNop456                                           enigma                                   [sub.]                                [var.]                     PqrStu789                  Enigma Kingdom                         Enigmatic Realm                DefGhi456          Enchanted Forest                Enchanted Forest
Sign up to request clarification or add additional context in comments.

3 Comments

I get an extra field 'territories.0' at the end, you know why this is? It's just an empty column.
@Mitch Do some teritorries contain empty dict or are missing? You can however drop this column from the final dataframe.
Yes it's possible it has empty dicts or an empty list. Thanks

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.