
I am very new to Python and am trying to convert nested JSON to CSV. Below is the Python script I am using, but I'm not getting the desired output.

import json
import pandas as pd

# Open the file with a context manager and parse it with json.load()
with open('employee_data1.json', 'r') as file:
    # load JSON data and parse into Dictionary object
    data = json.load(file)
    
# Load JSON as DataFrame 
df = pd.json_normalize(data)


# Print Result
print(df)

# output DataFrame to CSV file
df.to_csv('employee_data.csv')

I am running the above code against two JSON files and getting a different output for each one.

employee_data1.json:

{
    "features": [
        {
            "candidate": {
                "first_name": "Margaret",
                "last_name": "Mcdonald",
                "skills": [
                    "skLearn",
                    "Java",
                    "R",
                    "SQL",
                    "Spark",
                    "C++"
                ],
                "state": "AL",
                "specialty": "Database",
                "experience": [
                    {
                        "company": "XYZ Corp",
                        "position": "Software Engineer",
                        "start_date": "2016-01-01",
                        "end_date": "2021-03-01"
                    },
                    {
                        "company": "ABC Inc",
                        "position": "Senior Software Engineer",
                        "start_date": "2021-04-01",
                        "end_date": null
                    }
                ],
                "relocation": "no"
            }
        },
        {
            "candidate": {
                "first_name": "Michael",
                "last_name": "Carter",
                "skills": [
                    "TensorFlow",
                    "R",
                    "Spark",
                    "MongoDB",
                    "C++",
                    "SQL"
                ],
                "state": "AR",
                "specialty": "Statistics",
                "experience": [
                    {
                        "company": "DFC Corp",
                        "position": "Software Engineer",
                        "start_date": "2016-01-01",
                        "end_date": "2021-03-01"
                    },
                    {
                        "company": "SDC Inc",
                        "position": "Senior Software Engineer",
                        "start_date": "2021-04-01",
                        "end_date": null
                    }
                ],
                "relocation": "yes"
            }
        }
    ]
}

employee_data2.json:

{
    "features": 
      {
        "candidate": {
          "first_name": "Margaret",
          "last_name": "Mcdonald",
          "skills": [
            "skLearn",
            "Java",
            "R",
            "SQL",
            "Spark",
            "C++"
          ],
          "state": "AL",
          "specialty": "Database",
          "experience": [
            {
              "company": "XYZ Corp",
              "position": "Software Engineer",
              "start_date": "2016-01-01",
              "end_date": "2021-03-01"
            },
            {
              "company": "ABC Inc",
              "position": "Senior Software Engineer",
              "start_date": "2021-04-01",
              "end_date": null
            }
          ],
          "relocation": "no"
        }
      }
  }

Below, I have chosen only a few fields instead of all of them. This is the output I am expecting. I would be glad if someone could help me out with this.

candidate.first_name, candidate.last_name, candidate.skills, candidate.state, candidate.experience.company, candidate.experience.position

Margaret, Mcdonald, "['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++']", AL, XYZ Corp, Software Engineer
  • Why would you do this? JSON is a much smarter way to store and transmit this data. There's no standard that allows a bracketed list in a CSV file. Commented Nov 5, 2023 at 5:53
  • Perhaps see also stackoverflow.com/a/65338582/874188 Commented Nov 5, 2023 at 8:43
  • @TimRoberts Is it possible to store a nested JSON array in a SQL table? For example, convert the JSON to CSV and then load the CSV data into a SQL table? Commented Nov 6, 2023 at 4:37
  • Again, that's not a sensible path. Storing an array inside a field is not a good choice. If you need to store this long term, use something like MongoDB, which stores JSON documents natively. Commented Nov 6, 2023 at 4:46

1 Answer


You can use json_normalize() like this:

df = pd.json_normalize(your_json_data, record_path=["features", ["candidate", "experience"]],
                       meta=[["features", "candidate", "first_name"], ["features", "candidate", "last_name"],
                             ["features", "candidate", "relocation"], ["features", "candidate", "skills"],
                             ["features", "candidate", "specialty"], ["features", "candidate", "state"]])

But it will throw this error:

ValueError: operands could not be broadcast together with shape (12,) (2,)

This is probably a bug; take a look at the related issue on GitHub: BUG: json_normalize fails with empty arrays/lists. To avoid the error, convert the list-valued skills field to a string, call json_normalize, and then convert the strings back to lists:

features = your_json_data["features"]
if isinstance(features, list):
    # employee_data1.json: "features" is a list of candidates
    for i in features:
        i["candidate"]["skills"] = str(i["candidate"]["skills"])
else:
    # employee_data2.json: "features" is a single dict
    features["candidate"]["skills"] = str(features["candidate"]["skills"])

After json_normalize:

df["features.candidate.skills"] = df["features.candidate.skills"].apply(ast.literal_eval)

Out:

|    | company   | position                 | start_date   | end_date   | features.candidate.first_name   | features.candidate.last_name   | features.candidate.relocation   | features.candidate.skills                             | features.candidate.specialty   | features.candidate.state   |
|---:|:----------|:-------------------------|:-------------|:-----------|:--------------------------------|:-------------------------------|:--------------------------------|:------------------------------------------------------|:-------------------------------|:---------------------------|
|  0 | XYZ Corp  | Software Engineer        | 2016-01-01   | 2021-03-01 | Margaret                        | Mcdonald                       | no                              | ['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++']       | Database                       | AL                         |
|  1 | ABC Inc   | Senior Software Engineer | 2021-04-01   | nan        | Margaret                        | Mcdonald                       | no                              | ['skLearn', 'Java', 'R', 'SQL', 'Spark', 'C++']       | Database                       | AL                         |
|  2 | DFC Corp  | Software Engineer        | 2016-01-01   | 2021-03-01 | Michael                         | Carter                         | yes                             | ['TensorFlow', 'R', 'Spark', 'MongoDB', 'C++', 'SQL'] | Statistics                     | AR                         |
|  3 | SDC Inc   | Senior Software Engineer | 2021-04-01   | nan        | Michael                         | Carter                         | yes                             | ['TensorFlow', 'R', 'Spark', 'MongoDB', 'C++', 'SQL'] | Statistics                     | AR                         |

Full code:

import ast
import pandas as pd

# Stringify the skills lists to work around the ValueError shown above
features = your_json_data["features"]
if isinstance(features, list):
    for i in features:
        i["candidate"]["skills"] = str(i["candidate"]["skills"])
else:
    features["candidate"]["skills"] = str(features["candidate"]["skills"])

df = pd.json_normalize(your_json_data, record_path=["features", ["candidate", "experience"]],
                       meta=[["features", "candidate", "first_name"], ["features", "candidate", "last_name"],
                             ["features", "candidate", "relocation"], ["features", "candidate", "skills"],
                             ["features", "candidate", "specialty"], ["features", "candidate", "state"]])

# Turn the stringified lists back into real lists
df["features.candidate.skills"] = df["features.candidate.skills"].apply(ast.literal_eval)
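
If the string round trip feels fragile, the same rows can probably be built with plain Python before any DataFrame is involved. The following is only a sketch, not the approach used above: the helper names iter_candidates and experience_rows are made up for illustration, the inline sample is a trimmed copy of employee_data2.json, and the standard csv module is swapped in for json_normalize and to_csv.

```python
import csv
import io


def iter_candidates(data):
    """Yield each candidate dict, whether "features" is a list or a single dict."""
    features = data["features"]
    if isinstance(features, dict):  # shape of employee_data2.json
        features = [features]
    for feature in features:
        yield feature["candidate"]


def experience_rows(data):
    """Build one flat row per experience entry, repeating the candidate fields."""
    rows = []
    for cand in iter_candidates(data):
        for exp in cand.get("experience", []):
            rows.append({
                "company": exp.get("company"),
                "position": exp.get("position"),
                "start_date": exp.get("start_date"),
                "end_date": exp.get("end_date"),
                "first_name": cand.get("first_name"),
                "last_name": cand.get("last_name"),
                "skills": cand.get("skills", []),
                "state": cand.get("state"),
            })
    return rows


# Trimmed sample in the same shape as employee_data2.json (a single dict).
sample = {
    "features": {
        "candidate": {
            "first_name": "Margaret",
            "last_name": "Mcdonald",
            "skills": ["skLearn", "Java"],
            "state": "AL",
            "experience": [
                {"company": "XYZ Corp", "position": "Software Engineer",
                 "start_date": "2016-01-01", "end_date": "2021-03-01"},
                {"company": "ABC Inc", "position": "Senior Software Engineer",
                 "start_date": "2021-04-01", "end_date": None},
            ],
        }
    }
}

rows = experience_rows(sample)

# Written to an in-memory buffer here; for a real file, use
# open("employee_data.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Because the rows are ordinary dicts, passing them to pd.DataFrame(rows) should give a frame comparable to the one above, with skills kept as a real list and no ast.literal_eval step.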

5 Comments

Hey, first of all, many thanks for your code. It works for employee_data1.json but not for employee_data2.json; I am getting an error.
Okay. I edited my answer. Can you check it?
Thank you very much, it now works for both JSON files. If you don't mind, could you please make a few other changes too? The experience column should show its values the same way the skills column does, instead of showing each experience of the same person in a separate record. Also, could you please rename features.candidate.first_name to first_name (without the features.candidate prefix), and the same for the other fields? Thanks again!
This is not a free consulting service. YOU need to take the initiative to clean up the suggestions that were made here.
@TimRoberts I have already tried on my end; please see the script I posted in the question. I came here for some help, and I am not asking without having tried at all. By the way, I am very new to Python.
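
For the follow-up request in the comments (one row per candidate, experience shown as lists like skills, and column names without the features.candidate prefix), a minimal sketch might look like the following. It is not part of the answer above: candidate_rows is a hypothetical helper name, the sample is a trimmed copy of employee_data1.json, and only the fields from the desired output are kept.

```python
def candidate_rows(data):
    """Return one flat dict per candidate, using short column names."""
    features = data["features"]
    if isinstance(features, dict):  # a single candidate, as in employee_data2.json
        features = [features]

    rows = []
    for feature in features:
        cand = feature["candidate"]
        experience = cand.get("experience", [])
        rows.append({
            "first_name": cand.get("first_name"),
            "last_name": cand.get("last_name"),
            "skills": cand.get("skills", []),
            "state": cand.get("state"),
            # Collapse the experience entries into per-field lists, like skills.
            "company": [e.get("company") for e in experience],
            "position": [e.get("position") for e in experience],
        })
    return rows


# Trimmed sample in the shape of employee_data1.json (a list of features).
sample = {
    "features": [
        {"candidate": {
            "first_name": "Margaret", "last_name": "Mcdonald",
            "skills": ["skLearn", "Java"], "state": "AL",
            "experience": [
                {"company": "XYZ Corp", "position": "Software Engineer"},
                {"company": "ABC Inc", "position": "Senior Software Engineer"},
            ],
        }},
        {"candidate": {
            "first_name": "Michael", "last_name": "Carter",
            "skills": ["TensorFlow", "R"], "state": "AR",
            "experience": [
                {"company": "DFC Corp", "position": "Software Engineer"},
                {"company": "SDC Inc", "position": "Senior Software Engineer"},
            ],
        }},
    ]
}

rows = candidate_rows(sample)
for row in rows:
    print(row)
```

With pandas installed, pd.DataFrame(candidate_rows(data)).to_csv("employee_data.csv", index=False) should then write one line per candidate, with the list-valued cells stored as their string representation.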
