1

I have a dataframe with column: event_name with json-objects (different types of object). I want split this column to new columns (as in json objects).

Create df:

d = [{'event_datetime': '2019-01-08 00:09:30',
  'event_json': '{"lvl":"450","tok":"1212","snum":"257","udid":"122112"}',
  'event_name': 'AdsClick'},
 {'event_datetime': '2019-01-08 00:43:21',
  'event_json': '{"lvl":"902","udid":"3123","tok":"4214","snum":"1387"}',
  'event_name': 'AdsClick'},
 {'event_datetime': '2019-02-08 00:05:01',
  'event_json': '{"lvl":"1415","udid":"214124","tok":"213123","snum":"2416","col12":"2416","col13":"2416"}'}]

df12 = json_normalize(d)

Sample:

event_datetime  event_json  event_name
0   2019-02-08 00:09:30 {"lvl":"450","tok":"1212","snum":"257","udid":...   AdsClick
1   2019-02-08 00:43:21 {"lvl":"902","udid":"3123","tok":"4214","snum"...   AdsClick
2   2019-02-08 00:05:01 {"lvl":"1415","udid":"214124","tok":"213123","...   NaN

Now I use this code:

df12 = df12.merge(df12['event_json'].apply(lambda x: pd.Series(json.loads(x))), left_index=True, right_index=True)

Result:

event_datetime  event_json  event_name  lvl snum    tok udid    col12   col13
0   2019-02-08 00:09:30 {"lvl":"450","tok":"1212","snum":"257","udid":...   AdsClick    450 257 1212    122112  NaN NaN
1   2019-02-08 00:43:21 {"lvl":"902","udid":"3123","tok":"4214","snum"...   AdsClick    902 1387    4214    3123    NaN NaN
2   2019-02-08 00:05:01 {"lvl":"1415","udid":"214124","tok":"213123","...   NaN 1415    2416    213123  214124  2416    2416

But it is very slow. Have you any idea for faster code?

1 Answer 1

1

Use list comprehension with DataFrame constructor and add to original by DataFrame.join:

df = df12.join(pd.DataFrame([json.loads(x) for x in df12['event_json']]))
print (df)
        event_datetime                                         event_json  \
0  2019-01-08 00:09:30  {"lvl":"450","tok":"1212","snum":"257","udid":...   
1  2019-01-08 00:43:21  {"lvl":"902","udid":"3123","tok":"4214","snum"...   
2  2019-02-08 00:05:01  {"lvl":"1415","udid":"214124","tok":"213123","...   

  event_name col12 col13   lvl  snum     tok    udid  
0   AdsClick   NaN   NaN   450   257    1212  122112  
1   AdsClick   NaN   NaN   902  1387    4214    3123  
2        NaN  2416  2416  1415  2416  213123  214124  

If also is necessary remove source column use DataFrame.pop:

df = df12.join(pd.DataFrame([json.loads(x) for x in df12.pop('event_json')]))
print (df)
        event_datetime event_name col12 col13   lvl  snum     tok    udid
0  2019-01-08 00:09:30   AdsClick   NaN   NaN   450   257    1212  122112
1  2019-01-08 00:43:21   AdsClick   NaN   NaN   902  1387    4214    3123
2  2019-02-08 00:05:01        NaN  2416  2416  1415  2416  213123  214124
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.