I have a DataFrame containing JSON strings in records format (a list of objects), as follows:

In[9]: pd.DataFrame( {'col1': ['A','B'], 'col2': ['[{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"25.0"}]', 
                                                '[{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"15.0"}]']})

Out[9]: 
  col1                                               col2
0    A  [{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"2...
1    B  [{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"1...

I would like to parse the JSON and produce one row per record, keeping the matching col1 value:

    col1 t           v
0   A   05:15:00    20
1   A   05:20:00    25
2   B   05:15:00    10
3   B   05:20:00    15
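The target table shows t as 05:15:00 and v as a plain number, while the JSON stores both as strings ("05:15", "20.0"). Assuming t is meant as a time of day and v as a float, here is a minimal sketch of that final conversion step, applied to a hypothetical frame (the name flat is illustrative) that has already been flattened to one row per record:

```python
import pandas as pd

# Hypothetical frame already flattened to one row per record, still holding
# the raw strings exactly as they appear in the JSON.
flat = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                     't': ['05:15', '05:20', '05:15', '05:20'],
                     'v': ['20.0', '25.0', '10.0', '15.0']})

# "05:15" -> datetime.time(5, 15), which displays as 05:15:00
flat['t'] = pd.to_datetime(flat['t'], format='%H:%M').dt.time
# "20.0" -> 20.0 (float64)
flat['v'] = pd.to_numeric(flat['v'])
print(flat)
```

If a duration is more appropriate than a time of day, pd.to_timedelta could be used on the strings instead.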

I've been experimenting with the following code:

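import pandas as pd
from io import StringIO

def json_to_df(x):
    # Literal JSON strings should be wrapped in StringIO; passing them directly is deprecated since pandas 2.1
    df2 = pd.read_json(StringIO(x.col2))
    return df2

df.apply(json_to_df, axis=1)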

but the resulting dataframes are assigned as tuples, rather than creating new rows. Any advice?


The problem with apply is that each call is expected to return a single row, but here you need to return multiple rows. A possible solution:

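from io import StringIO
import pandas as pd

def json_to_df(row):
    _, row = row                                  # iterrows() yields (index, row) pairs
    df_json = pd.read_json(StringIO(row.col2))    # one row per JSON record
    col1 = pd.Series([row.col1]*len(df_json), name='col1')  # repeat col1 once per record
    return pd.concat([col1, df_json], axis=1)

df = list(map(json_to_df, df.iterrows()))   # a list of DataFrames, one per original row
df = pd.concat(df)                          # glue them together (DataFrame.append was removed in pandas 2.0)
df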

col1    t   v
0   A   05:15   20
1   A   05:20   25
0   B   05:15   10
1   B   05:20   15
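For comparison, here is a sketch of a vectorized alternative that avoids building one DataFrame per row. It parses each string with json.loads, uses DataFrame.explode to get one row per record (which repeats col1 and any other columns automatically), then expands the record dicts into columns with pd.json_normalize. It assumes pandas 1.1 or later for explode's ignore_index argument. Unlike read_json, it leaves t and v as the original strings, so any type conversion is a separate step.

```python
import json
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B'],
                   'col2': ['[{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"25.0"}]',
                            '[{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"15.0"}]']})

# 1. Parse each JSON string into a Python list of dicts.
# 2. explode() gives one row per dict, repeating col1 (and any other columns).
exploded = df.assign(col2=df['col2'].map(json.loads)).explode('col2', ignore_index=True)

# 3. Turn each record dict into its own columns and glue them back on.
records = pd.json_normalize(exploded['col2'].tolist())
out = pd.concat([exploded.drop(columns='col2'), records], axis=1)
print(out)
```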

Thanks for the advice, hellpanderrr. I think what I needed to know is that a single apply call can't turn one row into several. I also needed a generic way to carry all the remaining columns over to the new rows. In the end I came up with the technique shown in my answer. Cheers

Ok, taking a little inspiration from hellpanderrr's answer above, I came up with the following:

In [92]:
pd.DataFrame({'X': ['A','B'], 'Y': ['fdsfds','fdsfds'],
              'json': ['[{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"25.0"}]',
                       '[{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"15.0"}]']})
Out[92]:
    X   Y       json
0   A   fdsfds  [{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"2...
1   B   fdsfds  [{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"1...

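In [93]:
from io import StringIO

dfs = []
def json_to_df(row, json_col):
    json_df = pd.read_json(StringIO(row[json_col]))   # parse this row's JSON records
    dfs.append(json_df.assign(**row.drop(json_col)))  # copy the other columns (X, Y) onto every record

_.apply(json_to_df, axis=1, json_col='json')   # _ is IPython's previous output, i.e. the Out[92] frame
pd.concat(dfs)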

Out[93]:
    t       v   X   Y
0   05:15   20  A   fdsfds
1   05:20   25  A   fdsfds
0   05:15   10  B   fdsfds
1   05:20   15  B   fdsfds
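The same idea can also be written without appending to a global list from inside apply, which is easy to get wrong (re-running the cell keeps growing dfs). Here is a minimal sketch, assuming the frame is bound to a name df rather than reached through IPython's _. It builds the per-row frames with a list comprehension over iterrows() and concatenates them once, with ignore_index=True giving a fresh 0..n-1 index.

```python
from io import StringIO
import pandas as pd

df = pd.DataFrame({'X': ['A', 'B'], 'Y': ['fdsfds', 'fdsfds'],
                   'json': ['[{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"25.0"}]',
                            '[{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"15.0"}]']})

def json_to_df(row, json_col):
    """Return the parsed JSON records of one row, with every other column
    of that row broadcast onto each record."""
    json_df = pd.read_json(StringIO(row[json_col]))
    return json_df.assign(**row.drop(json_col))

# Build one frame per input row, then concatenate them in a single call.
out = pd.concat([json_to_df(row, 'json') for _, row in df.iterrows()],
                ignore_index=True)
print(out)
```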
