Reading a json as Multiindex DataFrame

Question

I have a json which comprises like this.

[
    {
        "bi-document-identifier": "USA",
        "text": "I found somwthing Interesting",
        "relation_type": "ImportanceRelation",
        "annotation_type": "Opinionholder",
        "relation_start": {
            "annotatedtext": "We",
            "annotatedtext_represents": "OpinionHolder",
            "annotatedtext_startposition": 146,
            "annotatedtext_endposition": 148
        },
        "relation_end": {
            "annotatedtext": "information",
            "annotatedtext_represents": "StanceTarget",
            "annotatedtext_startposition": 239,
            "annotatedtext_endposition": 250
        }
    },
    {
        "bi-document-identifier": "USB",
        "text": "This is quite new to me",
        "relation_type": "ImportanceRelation",
        "annotation_type": "Opinionholder",
        "relation_start": {
            "annotatedtext": "Dr. A_PERSON",
            "annotatedtext_represents": "OpinionHolder",
            "annotatedtext_startposition": 0,
            "annotatedtext_endposition": 12
        },
        "relation_end": {
            "annotatedtext": "MOA of CV risk reduction",
            "annotatedtext_represents": "StanceTarget",
            "annotatedtext_startposition": 29,
            "annotatedtext_endposition": 53
        }
    }
]

how to read as a multi header dataframe. The orient and typ options in pandas.read_json() give me some errors like below

TypeError: list indices must be integers or slices, not str

Partha Mandal · Accepted Answer · 2020-05-03 15:33:43Z

2

Why don't you use pandas' json_normalize?

from pandas import json_normalize
df = json_normalize(data)
df.head(2)

Otherwise, this seems to cover your specific usecase:

Creating a multi-header table from JSON

answered May 3, 2020 at 15:33

Partha Mandal

1,45111 silver badges16 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Bharath_Raja Over a year ago

Nice quick Solution. Thanks for sharing!

Valdi_Bo · Accepted Answer · 2020-05-03 16:31:16Z

1

Start from read_json with records orientation:

df = pd.read_json('Input.json', orient='records')

Then convert both relation columns to "MultiIndexed" version:

d = { tt: df[tt].apply(pd.Series) for tt in ['relation_start', 'relation_end'] }
wrk1 = pd.concat(d, axis=1, keys=d.keys())

The added MultiIndex level is the top level (relation_start and relation_end) and index keys in source values become the second level.

The third step is to convert 4 initial columns, also to MultiIndex version, but this time the added level is the second level, containing only empty strings:

wrk2 = df.iloc[:, 0:4]
wrk2.columns = pd.MultiIndex.from_product([wrk2.columns, ['']])

And the last thing to do is to contatenate both wrk DataFrames (on the index):

result = wrk2.join(wrk1)

Or if you want a shorter code, don't create wrk1 (d is enough), and place corresponding expression in the final instruction:

result = wrk.join(pd.concat(d, axis=1, keys=d.keys()))

Note: json_normalize proposed in the other solution is undeniably shorter, but the result has plain (single level) column index. As I understood, you want just a MultiIndex on columns.

edited May 3, 2020 at 16:31

answered May 3, 2020 at 16:13

Valdi_Bo

31.1k4 gold badges29 silver badges45 bronze badges

1 Comment

Bharath_Raja Over a year ago

kudos. I appreciate your effort. Great. Thanks for the answer.

Collectives™ on Stack Overflow

Reading a json as Multiindex DataFrame

2 Answers 2

1 Comment

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related