How to extract certain parts of a string from column to create other columns in Pandas

Question

I have a DataFrame that looks like this:

Title	Ratings
Do schools kill creativity?	[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}, {'id': 9, 'name': 'Ingenious', 'count': 6073}, {'id': 3, 'name': 'Courageous', 'count': 3253}, {'id': 11, 'name': 'Longwinded', 'count': 387}, {'id': 2, 'name': 'Confusing', 'count': 242}, {'id': 8, 'name': 'Informative', 'count': 7346}, {'id': 22, 'name': 'Fascinating', 'count': 10581}, {'id': 21, 'name': 'Unconvincing', 'count': 300}, {'id': 24, 'name': 'Persuasive', 'count': 10704}, {'id': 23, 'name': 'Jaw-dropping', 'count': 4439}, {'id': 25, 'name': 'OK', 'count': 1174}, {'id': 26, 'name': 'Obnoxious', 'count': 209}, {'id': 10, 'name': 'Inspiring', 'count': 24924}]
Simple designs to save a life	[{'id': 9, 'name': 'Ingenious', 'count': 269}, {'id': 3, 'name': 'Courageous', 'count': 92}, {'id': 7, 'name': 'Funny', 'count': 131}, {'id': 2, 'name': 'Confusing', 'count': 42}, {'id': 1, 'name': 'Beautiful', 'count': 91}, {'id': 8, 'name': 'Informative', 'count': 446}, {'id': 10, 'name': 'Inspiring', 'count': 397}, {'id': 22, 'name': 'Fascinating', 'count': 515}, {'id': 11, 'name': 'Longwinded', 'count': 45}, {'id': 21, 'name': 'Unconvincing', 'count': 49}, {'id': 24, 'name': 'Persuasive', 'count': 1234}, {'id': 25, 'name': 'OK', 'count': 73}, {'id': 23, 'name': 'Jaw-dropping', 'count': 139}, {'id': 26, 'name': 'Obnoxious', 'count': 21}]

I want to parse the data in Ratings so that the result looks like this:

Title	Rating	Count
Do schools kill creativity?	Funny	19645
Do schools kill creativity?	Beautiful	4573

I've tried exploding the data, using '}' as a delimiter:

# explode ratings by title: split each string on '}' so every rating gets its own row
df['ratings'] = df['ratings'].str.split('}')
df_explode_ratings = df.explode('ratings').reset_index(drop=True)
# move the 'title' column to the end
cols = list(df_explode_ratings.columns)
cols.append(cols.pop(cols.index('title')))
df_explode_ratings = df_explode_ratings[cols]
# keep only the 'title' and 'ratings' columns
df_explode_cols = ['title', 'ratings']
df_explode_ratings = df_explode_ratings.drop(columns=[col for col in df_explode_ratings if col not in df_explode_cols])

This works, but I still need to parse it further. I was going to split again on ',' but wound up with NaN values in the Ratings column.
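If you prefer to stay with string processing rather than parsing the lists, one option (not from the original post) is to pull each name/count pair out with a regular expression via Series.str.extractall. This is a sketch that assumes the keys always appear in the order 'name' then 'count' and that rating names contain no apostrophes; the two-row sample DataFrame is hypothetical and uses the lowercase column names from the code above.

```python
import pandas as pd

# Hypothetical two-row sample in the same string format as the question
df = pd.DataFrame({
    "title": ["Do schools kill creativity?", "Simple designs to save a life"],
    "ratings": [
        "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]",
        "[{'id': 9, 'name': 'Ingenious', 'count': 269}]",
    ],
})

# Capture every name/count pair; extractall returns one row per match,
# indexed by (original row label, match number), with columns named after the groups
pairs = df["ratings"].str.extractall(r"'name': '(?P<Rating>[^']*)', 'count': (?P<Count>\d+)")

# Drop the match level so each match lines up with its source row, then attach the title
pairs = pairs.droplevel("match")
result = df[["title"]].join(pairs)

# The regex captures text, so convert the counts to integers
result["Count"] = result["Count"].astype(int)
print(result)
```

Because the pattern depends on the exact textual layout, parsing the strings (as in the answer below) is the more robust route if the format varies.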

What happens before you get this dataframe? It looks like the process that produced this data structure could be re-engineered to give you a much more usable file. If not, and if you don't have a huge number of rows, you might even be better off looping over the rows and loading the strings in Ratings using the json module.
– Guillaume Ansanay-Alex, Commented Apr 15, 2021 at 19:51
Hey, thanks. This is a .csv from Kaggle that looks like it was dumped from JSON, so I don't have control over the dataset structure, only over what is in the file.
– JGH_PC, Commented Apr 15, 2021 at 19:59
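The row-looping idea from the first comment could be sketched as follows. One caveat: the Ratings strings use Python's single-quoted repr rather than JSON, so json.loads rejects them; this sketch therefore swaps in ast.literal_eval for the per-row parse. The two-row DataFrame is a hypothetical sample trimmed from the question's data.

```python
import ast
import json

import pandas as pd

# Hypothetical sample shaped like the question's data
df = pd.DataFrame({
    "Title": ["Do schools kill creativity?", "Simple designs to save a life"],
    "Ratings": [
        "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]",
        "[{'id': 9, 'name': 'Ingenious', 'count': 269}]",
    ],
})

# json.loads fails here: JSON requires double-quoted keys and strings
try:
    json.loads(df.loc[0, "Ratings"])
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# Loop over the rows and parse each string as a Python literal instead
records = []
for title, ratings in zip(df["Title"], df["Ratings"]):
    for rating in ast.literal_eval(ratings):
        records.append({"Title": title, "Rating": rating["name"], "Count": rating["count"]})

result = pd.DataFrame(records)
print(json_ok)  # False
print(result)
```

Collecting plain dicts in a list and building the DataFrame once at the end avoids growing a DataFrame row by row.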

Andrej Kesely · Accepted Answer · 2021-04-15 20:02:11Z

Is your Ratings column a string or a list of dictionaries? If it is a string, you can apply ast.literal_eval and then explode the column (if it is already a list of dictionaries, you can omit the literal_eval step):

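from ast import literal_eval

# parse each string into an actual list of dicts
df.Ratings = df.Ratings.apply(literal_eval)
# one row per dict; the original index label is repeated
df = df.explode("Ratings")
# pull the name and count out of each dict
df["Rating"] = df.apply(lambda x: x["Ratings"]["name"], axis=1)
df["Count"] = df.apply(lambda x: x["Ratings"]["count"], axis=1)
df = df.drop(columns="Ratings")
print(df)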

Prints:

                           Title        Rating  Count
0    Do schools kill creativity?         Funny  19645
0    Do schools kill creativity?     Beautiful   4573
0    Do schools kill creativity?     Ingenious   6073
0    Do schools kill creativity?    Courageous   3253
0    Do schools kill creativity?    Longwinded    387
0    Do schools kill creativity?     Confusing    242
0    Do schools kill creativity?   Informative   7346
0    Do schools kill creativity?   Fascinating  10581
0    Do schools kill creativity?  Unconvincing    300
0    Do schools kill creativity?    Persuasive  10704
0    Do schools kill creativity?  Jaw-dropping   4439
0    Do schools kill creativity?            OK   1174
0    Do schools kill creativity?     Obnoxious    209
0    Do schools kill creativity?     Inspiring  24924
1  Simple designs to save a life     Ingenious    269
1  Simple designs to save a life    Courageous     92
1  Simple designs to save a life         Funny    131
1  Simple designs to save a life     Confusing     42
1  Simple designs to save a life     Beautiful     91
1  Simple designs to save a life   Informative    446
1  Simple designs to save a life     Inspiring    397
1  Simple designs to save a life   Fascinating    515
1  Simple designs to save a life    Longwinded     45
1  Simple designs to save a life  Unconvincing     49
1  Simple designs to save a life    Persuasive   1234
1  Simple designs to save a life            OK     73
1  Simple designs to save a life  Jaw-dropping    139
1  Simple designs to save a life     Obnoxious     21
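The two row-wise df.apply(..., axis=1) calls can also be written column-wise. The sketch below, on a hypothetical two-row sample, uses Series.str.get, which looks up a key in each dict element; it should produce the same Title/Rating/Count layout with the repeated index.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical sample shaped like the question's data
df = pd.DataFrame({
    "Title": ["Do schools kill creativity?", "Simple designs to save a life"],
    "Ratings": [
        "[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]",
        "[{'id': 9, 'name': 'Ingenious', 'count': 269}]",
    ],
})

df["Ratings"] = df["Ratings"].apply(literal_eval)
df = df.explode("Ratings")

# Series.str.get works element-wise on dicts, so no row-wise apply is needed
df["Rating"] = df["Ratings"].str.get("name")
df["Count"] = df["Ratings"].str.get("count")
df = df.drop(columns="Ratings")
print(df)
```

Operating on a single column avoids building a full row object for every call, which tends to matter on larger files.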

But, as suggested in the comments, it is better to parse the data before creating the DataFrame.
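Since the data comes from a CSV file, one way to parse it before building the DataFrame is to pass literal_eval to pd.read_csv through its converters parameter, then build the long table from plain Python lists. This is a sketch: the inline CSV text is a hypothetical stand-in for the Kaggle file, and the lowercase column names are taken from the question's code.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV text standing in for the Kaggle file
csv_text = '''title,ratings
Do schools kill creativity?,"[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}]"
Simple designs to save a life,"[{'id': 9, 'name': 'Ingenious', 'count': 269}]"
'''

# converters= runs literal_eval on each 'ratings' cell while the file is read
df = pd.read_csv(io.StringIO(csv_text), converters={"ratings": literal_eval})

# Build the long table directly from the parsed lists of dicts
long_df = pd.DataFrame(
    [
        {"Title": title, "Rating": r["name"], "Count": r["count"]}
        for title, ratings in zip(df["title"], df["ratings"])
        for r in ratings
    ]
)
print(long_df)
```

Unlike explode, this gives the result a fresh 0..n-1 index.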

Very nice answer, didn't think about applying literal_eval!
