I have a tab-separated flat file, one column of which holds JSON data stored as a string, e.g.
Col1 Col2 Col3
1491109818 2017-08-02 00:00:09.250 {"type":"Tipper"}
1491110071 2017-08-02 00:00:19.283 {"type":"HGV"}
1491110798 2017-08-02 00:00:39.283 {"type":"Tipper"}
1491110798 2017-08-02 00:00:39.283 \N
...
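For anyone who wants to reproduce this, here is a minimal sketch that builds the sample rows in memory and loads them. It assumes the file is read with `pd.read_csv` using a tab separator. `quoting=csv.QUOTE_NONE` is my own choice so the double quotes inside the JSON are left alone, and `sample` is just an illustrative name.

```python
import csv
import io

import pandas as pd

# In-memory stand-in for the real file; fields are separated by tabs.
sample = (
    "Col1\tCol2\tCol3\n"
    '1491109818\t2017-08-02 00:00:09.250\t{"type":"Tipper"}\n'
    '1491110071\t2017-08-02 00:00:19.283\t{"type":"HGV"}\n'
    '1491110798\t2017-08-02 00:00:39.283\t{"type":"Tipper"}\n'
    "1491110798\t2017-08-02 00:00:39.283\t\\N\n"
)

# QUOTE_NONE keeps the JSON text exactly as it appears in the file.
df = pd.read_csv(io.StringIO(sample), sep="\t", quoting=csv.QUOTE_NONE)
```

With these settings Col3 comes back as plain strings, including the literal `\N` marker on the last row.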
What I want to do is load the table as a pandas DataFrame and replace each value in Col3 with a string holding just the value of the type key. Where there is no JSON, or the JSON has no type key, I want None.
e.g.
Col1 Col2 Col3
1491109818 2017-08-02 00:00:09.250 Tipper
1491110071 2017-08-02 00:00:19.283 HGV
1491110798 2017-08-02 00:00:39.283 Tipper
1491110798 2017-08-02 00:00:39.283 None
...
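To pin down the rule for a single value, including the case of valid JSON with no type key (which the sample does not show), this is roughly the behaviour I am after. `extract_type` is a hypothetical helper name of my own, and treating non-object JSON (a bare list or number) as None is an assumption.

```python
import json


def extract_type(value):
    """Return the 'type' entry of a JSON object string, or None."""
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        # TypeError: value is not a string (e.g. a missing value);
        # ValueError: value is not valid JSON (e.g. the \N marker).
        return None
    if isinstance(parsed, dict):
        # dict.get returns None when there is no 'type' key.
        return parsed.get("type")
    # Assumption: JSON that is not an object counts as "no type".
    return None
```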
The only way I can think of to do this is with iterrows, but this is very slow on large files.
import json

for index, row in df.iterrows():
    try:
        df.loc[index, 'Col3'] = json.loads(row['Col3'])['type']
    except (ValueError, TypeError, KeyError):
        # Not valid JSON, not a string, or no 'type' key
        df.loc[index, 'Col3'] = None
Any suggestions on a quicker approach?