5

I have a JSON file as follows:

{
    "ALPHA": [
        {
            "date": "2021-06-22",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 1
            }
        },
        {
            "date": "2021-09-07",
            "constituents": {
                "BBB": 0,
                "EEE": 0,
                "BTB": 0,
                "YUY": 0
            }
        }
    ],
    "BETA": [
        {
            "date": "2021-06-22",
            "constituents": {
                "BBB": 1,
                "EEE": 1,
                "BTB": 1,
                "YUY": 1
            }
        },
        {
            "date": "2021-09-07",
            "constituents": {
                "BBB": 1,
                "EEE": 1,
                "BTB": 1,
                "YUY": 1
            }
        }
    ],

    "THETA": [
        {
            "date": "2021-06-22",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 0
            }
        },
        {
            "date": "2021-08-20",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 0
            }
        },
        {
            "date": "2021-09-07",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 0
            }
        }
    ]
}

I want to read the above into a pandas data frame where the first index is the date, the second index is the first keys (i.e. "ALPHA", "BETA", "THETA"), the columns are the inner keys (i.e. "BBB" ,"EEE", "BTB" ,"YUY"), and the cell values are the values of these inner keys.

How can I read that into pandas from the JSON file?

2 Answers 2

4

You can use pd.Series to import the JSON to a Pandas series with ALPHA, BETA as index and elements as list. Then expand the list of JSON to individual JSON by .explode(). Expand the inner JSON to dataframe by .apply() + pd.Series.

Append date as index by .set_index() with append=True; swap date from second index to first index by .swaplevel().

Finally, take the column constituents and further expand the inner JSON to dataframe by .apply() + pd.Series, as follows:

(assume you have already loaded the JSON file into j)

df = (pd.Series(j)
        .explode()
        .apply(pd.Series)
        .set_index('date', append=True)
        .swaplevel()['constituents']
        .apply(pd.Series)
     )

Data input:

j = {'ALPHA': [{'date': '2021-06-22',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 1}},
  {'date': '2021-09-07',
   'constituents': {'BBB': 0, 'EEE': 0, 'BTB': 0, 'YUY': 0}}],
 'BETA': [{'date': '2021-06-22',
   'constituents': {'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1}},
  {'date': '2021-09-07',
   'constituents': {'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1}}],
 'THETA': [{'date': '2021-06-22',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}},
  {'date': '2021-08-20',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}},
  {'date': '2021-09-07',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}}]}

Output

print(df)


                  BBB  EEE  BTB  YUY
date                                
2021-06-22 ALPHA    0    1    1    1
2021-09-07 ALPHA    0    0    0    0
2021-06-22 BETA     1    1    1    1
2021-09-07 BETA     1    1    1    1
2021-06-22 THETA    0    1    1    0
2021-08-20 THETA    0    1    1    0
2021-09-07 THETA    0    1    1    0
Sign up to request clarification or add additional context in comments.

4 Comments

this is just perfect
perfect answer, very useful tips , thanks!
@Muhammadhassan Thanks!
@AliCrash Thanks!
1

I feel you get better performance, and potentially easier manipulation, if you deal with python native data structures outside Pandas, before pulling the final form into Pandas:

Let's flatten the nested dictionary into a single dictionary, using Python's tools:

container = []
for key, value in j.items(): # j is the main dictionary
    for entry in value:
        content = {'date': entry['date'], 
                   'key': key, 
                   # expand the nested constituent data
                   # this gets us a single dictionary 
                   **entry['constituents']}
        container.append(content)

print(container)
[{'date': '2021-06-22',
  'key': 'ALPHA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 1},
 {'date': '2021-09-07',
  'key': 'ALPHA',
  'BBB': 0,
  'EEE': 0,
  'BTB': 0,
  'YUY': 0},
 {'date': '2021-06-22', 'key': 'BETA', 'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1},
 {'date': '2021-09-07', 'key': 'BETA', 'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1},
 {'date': '2021-06-22',
  'key': 'THETA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 0},
 {'date': '2021-08-20',
  'key': 'THETA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 0},
 {'date': '2021-09-07',
  'key': 'THETA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 0}]

Now, build the dataframe, and set the required columns as index:

pd.DataFrame(container).set_index(['date', 'key'])

                  BBB  EEE  BTB  YUY
date       key
2021-06-22 ALPHA    0    1    1    1
2021-09-07 ALPHA    0    0    0    0
2021-06-22 BETA     1    1    1    1
2021-09-07 BETA     1    1    1    1
2021-06-22 THETA    0    1    1    0
2021-08-20 THETA    0    1    1    0
2021-09-07 THETA    0    1    1    0

3 Comments

Maybe is posiibble add timings? I think this is very fast, I like this solution.
yes, it is faster; the data is a small size though, so not sure if speed tests is justified. And I don't know how to generate nested dictionaries that match the OP's form
Ya, maybe in data sample should be ouput also interesting (but necessary state is necessary also test in real for OP)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.