
I have a dataframe that looks like this:

Title Ratings
Do schools kill creativity? [{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}, {'id': 9, 'name': 'Ingenious', 'count': 6073}, {'id': 3, 'name': 'Courageous', 'count': 3253}, {'id': 11, 'name': 'Longwinded', 'count': 387}, {'id': 2, 'name': 'Confusing', 'count': 242}, {'id': 8, 'name': 'Informative', 'count': 7346}, {'id': 22, 'name': 'Fascinating', 'count': 10581}, {'id': 21, 'name': 'Unconvincing', 'count': 300}, {'id': 24, 'name': 'Persuasive', 'count': 10704}, {'id': 23, 'name': 'Jaw-dropping', 'count': 4439}, {'id': 25, 'name': 'OK', 'count': 1174}, {'id': 26, 'name': 'Obnoxious', 'count': 209}, {'id': 10, 'name': 'Inspiring', 'count': 24924}]
Simple designs to save a life [{'id': 9, 'name': 'Ingenious', 'count': 269}, {'id': 3, 'name': 'Courageous', 'count': 92}, {'id': 7, 'name': 'Funny', 'count': 131}, {'id': 2, 'name': 'Confusing', 'count': 42}, {'id': 1, 'name': 'Beautiful', 'count': 91}, {'id': 8, 'name': 'Informative', 'count': 446}, {'id': 10, 'name': 'Inspiring', 'count': 397}, {'id': 22, 'name': 'Fascinating', 'count': 515}, {'id': 11, 'name': 'Longwinded', 'count': 45}, {'id': 21, 'name': 'Unconvincing', 'count': 49}, {'id': 24, 'name': 'Persuasive', 'count': 1234}, {'id': 25, 'name': 'OK', 'count': 73}, {'id': 23, 'name': 'Jaw-dropping', 'count': 139}, {'id': 26, 'name': 'Obnoxious', 'count': 21}]

I want to parse the data in Ratings so that it looks like this:

Title Rating Count
Do schools kill creativity? Funny 19645
Do schools kill creativity? Beautiful 4573

I've tried exploding the data using } as a delimiter:

#explode ratings by title
df['ratings'] = df['ratings'].str.split('}')
df_explode_ratings = df.explode('ratings').reset_index(drop=True)
cols = list(df_explode_ratings.columns)
cols.append(cols.pop(cols.index('title')))
df_explode_ratings = df_explode_ratings[cols]
df_explode_cols = ['title', 'ratings']
df_explode_ratings = df_explode_ratings.drop(columns=[col for col in df_explode_ratings if col not in df_explode_cols])

This works, but I still need to parse it further. I was going to split again on ',' but wound up with NaN values in the Ratings column.
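To see why splitting on } leaves work to do, here is a small sketch of what the split produces for a single Ratings string (the sample is cut down from the first row above). The pieces are fragments of text rather than parseable dicts: the brace is consumed, the leading ", " stays attached, and a trailing "]" is left over.

```python
# Two-entry sample taken from the first row of the question's data
raw = "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]"

# Splitting on '}' cuts the text between dicts, not into parseable pieces
parts = raw.split("}")
for p in parts:
    print(repr(p))
```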

  • What happens before you get this dataframe? It looks like the process resulting in this data structure could be re-engineered to provide you a much more usable file. If not and if you don't have a huge number of rows you might even be better off looping on rows and loading the strings in Ratings using the json module. Commented Apr 15, 2021 at 19:51
  • Hey, thanks. This is a .csv from Kaggle that looks like it was dumped from JSON, so I don't have control over the dataset's structure; I only have what is in the file. Commented Apr 15, 2021 at 19:59
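A minimal sketch of the row-loop idea from the first comment, assuming the column names Title and Ratings shown in the table above and a two-row sample cut down from the question's data. It uses ast.literal_eval rather than json.loads, because the strings use single quotes, which JSON does not accept.

```python
from ast import literal_eval

import pandas as pd

# Sample data cut down from the question; Ratings is stored as text
df = pd.DataFrame({
    "Title": ["Do schools kill creativity?", "Simple designs to save a life"],
    "Ratings": [
        "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]",
        "[{'id': 9, 'name': 'Ingenious', 'count': 269}]",
    ],
})

rows = []
for title, ratings in zip(df["Title"], df["Ratings"]):
    # literal_eval, not json.loads: single-quoted strings are not valid JSON
    for r in literal_eval(ratings):
        rows.append((title, r["name"], r["count"]))

# One row per (title, rating) pair
out = pd.DataFrame(rows, columns=["Title", "Rating", "Count"])
print(out)
```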

1 Answer


Is your Ratings column a string or a list of dictionaries? If it's a string, you can apply ast.literal_eval and then explode the column (if it's already a list of dictionaries, you can skip the literal_eval step):

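from ast import literal_eval

# Parse each string like "[{'id': 7, ...}]" into a real list of dicts
df.Ratings = df.Ratings.apply(literal_eval)
# One row per dict; the original index is repeated for each new row
df = df.explode("Ratings")
# Pull the fields out of each dict into their own columns
df["Rating"] = df.apply(lambda x: x["Ratings"]["name"], axis=1)
df["Count"] = df.apply(lambda x: x["Ratings"]["count"], axis=1)
df = df.drop(columns="Ratings")
print(df)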

Prints:

                           Title        Rating  Count
0    Do schools kill creativity?         Funny  19645
0    Do schools kill creativity?     Beautiful   4573
0    Do schools kill creativity?     Ingenious   6073
0    Do schools kill creativity?    Courageous   3253
0    Do schools kill creativity?    Longwinded    387
0    Do schools kill creativity?     Confusing    242
0    Do schools kill creativity?   Informative   7346
0    Do schools kill creativity?   Fascinating  10581
0    Do schools kill creativity?  Unconvincing    300
0    Do schools kill creativity?    Persuasive  10704
0    Do schools kill creativity?  Jaw-dropping   4439
0    Do schools kill creativity?            OK   1174
0    Do schools kill creativity?     Obnoxious    209
0    Do schools kill creativity?     Inspiring  24924
1  Simple designs to save a life     Ingenious    269
1  Simple designs to save a life    Courageous     92
1  Simple designs to save a life         Funny    131
1  Simple designs to save a life     Confusing     42
1  Simple designs to save a life     Beautiful     91
1  Simple designs to save a life   Informative    446
1  Simple designs to save a life     Inspiring    397
1  Simple designs to save a life   Fascinating    515
1  Simple designs to save a life    Longwinded     45
1  Simple designs to save a life  Unconvincing     49
1  Simple designs to save a life    Persuasive   1234
1  Simple designs to save a life            OK     73
1  Simple designs to save a life  Jaw-dropping    139
1  Simple designs to save a life     Obnoxious     21
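The two df.apply(..., axis=1) calls run a Python function per row over the whole frame. A possible variant, sketched here on a two-row sample from the question, reads the fields with list comprehensions instead. Plain lists are assigned by position, so the repeated index that explode() leaves behind does not cause alignment problems.

```python
from ast import literal_eval

import pandas as pd

# Sample data cut down from the question
df = pd.DataFrame({
    "Title": ["Do schools kill creativity?", "Simple designs to save a life"],
    "Ratings": [
        "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]",
        "[{'id': 9, 'name': 'Ingenious', 'count': 269}]",
    ],
})

df["Ratings"] = df["Ratings"].apply(literal_eval)
df = df.explode("Ratings")
# Lists assign by position, so the duplicated index from explode() is harmless
df["Rating"] = [d["name"] for d in df["Ratings"]]
df["Count"] = [d["count"] for d in df["Ratings"]]
df = df.drop(columns="Ratings")
print(df)
```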

But, as suggested in the comments, it's better to parse the data before creating the DataFrame.
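One way to parse while reading, sketched under assumptions: the CSV text below is a hypothetical stand-in for the Kaggle file, and the real file's column names may differ (the question's own code uses lowercase title and ratings). read_csv's converters= applies literal_eval to each Ratings cell as it is read, and pd.json_normalize with record_path/meta then produces one row per rating dict, carrying Title along.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV text standing in for the real file
csv_text = '''Title,Ratings
Do schools kill creativity?,"[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]"
Simple designs to save a life,"[{'id': 9, 'name': 'Ingenious', 'count': 269}]"
'''

# converters= turns each Ratings cell into a list of dicts during the read
raw = pd.read_csv(io.StringIO(csv_text), converters={"Ratings": literal_eval})

# One output row per rating dict, with Title copied in as metadata
out = pd.json_normalize(raw.to_dict("records"), record_path="Ratings", meta=["Title"])
out = out.rename(columns={"name": "Rating", "count": "Count"})[["Title", "Rating", "Count"]]
print(out)
```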


1 Comment

Very nice answer, didn't think about applying literal_eval!
