I'm trying to read a JSONL file with pandas in Python, but I don't know how to handle a column whose values are themselves JSON objects.

What I'm doing is:

pd.read_json('jsonfile', lines=True)

And I'm getting something like:

ID  COL1    COL2    COL3
0   12047   93947   {'A': '001', 'B': '"002"'}
1   83621   24013   {'H': '101', 'J': 'TTA', 'K': 'TTB'}

That is, the entries in COL3 are JSON objects that can have different keys.

How can I turn the keys in COL3 into columns? Since some rows will not have values for the newly generated columns, I'd ideally like the result to look like this:

ID  COL1    COL2    A      B       H    J      K
0   12047   93947  '001'  '"002"'  NA   NA     NA
1   83621   24013   NA     NA     '101' 'TTA' 'TTB'

1 Answer

You can use:

df = df.join(df.pop('COL3').apply(pd.Series))
print(df)

Or:

# I think this should be faster, since it avoids building a pd.Series for every row
df = df.join(pd.DataFrame(df.pop('COL3').values.tolist(), index=df.index))
print(df)

  ID   COL1   COL2    A      B    H    J    K
0  0   12047  93947  001  "002"  NaN  NaN  NaN
1  1   83621  24013  NaN  NaN    101  TTA  TTB
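For anyone who wants to try this without a file, here is a minimal self-contained sketch of the second approach. It assumes the two sample rows from the question are built in memory rather than read with read_json; the names expanded and result are only illustrative.

```python
import pandas as pd

# Sample rows from the question, built in memory instead of read from a file
# (an assumption made so the example runs on its own).
df = pd.DataFrame({
    "COL1": [12047, 83621],
    "COL2": [93947, 24013],
    "COL3": [
        {"A": "001", "B": '"002"'},
        {"H": "101", "J": "TTA", "K": "TTB"},
    ],
})

# pop() removes COL3 from df and returns it as a Series of dicts.
# Building a DataFrame from the list of dicts creates one column per key
# seen in any row; keys missing from a row become NaN.
expanded = pd.DataFrame(df.pop("COL3").tolist(), index=df.index)

# Reusing df.index above keeps the rows aligned when joining back.
result = df.join(expanded)
print(result)
```

Passing index=df.index matters when the frame does not have a default 0..n-1 index, for example after filtering rows; without it the join could misalign or produce all-NaN columns.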

If the values in COL3 are strings rather than actual dicts, convert them to dicts first:

import ast

df['COL3'] = df['COL3'].apply(ast.literal_eval)
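As a rough sketch of that conversion step, the snippet below uses a hypothetical frame (df_str, not from the question) whose COL3 cells are string representations of dicts. The isinstance guard is an addition of mine so that cells which are already dicts are left alone.

```python
import ast
import pandas as pd

# Hypothetical frame where COL3 holds strings that look like dicts.
df_str = pd.DataFrame({
    "COL1": [12047, 83621],
    "COL3": ["{'A': '001'}", "{'H': '101', 'J': 'TTA'}"],
})

# ast.literal_eval safely parses Python literals (dicts, lists, strings,
# numbers) without executing arbitrary code. Only string cells are parsed.
df_str["COL3"] = df_str["COL3"].apply(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else v
)

# Every cell should now be a real dict, ready to be expanded into columns.
parsed_types = df_str["COL3"].map(type).unique().tolist()
print(parsed_types)
```

Note that ast.literal_eval expects Python literal syntax; strings containing JSON-only tokens such as true, false or null would need json.loads instead.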