I have a column of ingredients in a pandas DataFrame. I need to remove everything except the name of the ingredient (for example, "1/3 cup cashews" should become "cashews").
Input
recipe_name ingredient
0 Truvani Chocolate Turmeric Caramel Cups ⅓ cup cashews
1 Truvani Chocolate Turmeric Caramel Cups 4 dates
2 Truvani Chocolate Turmeric Caramel Cups 1 tablespoon almond butter
3 Truvani Chocolate Turmeric Caramel Cups 3 tablespoons coconut milk
4 Truvani Chocolate Turmeric Caramel Cups ½ teaspoon vanilla extract
Expected Output
recipe_name ingredient
0 Truvani Chocolate Turmeric Caramel Cups cashews
1 Truvani Chocolate Turmeric Caramel Cups dates
2 Truvani Chocolate Turmeric Caramel Cups almond butter
3 Truvani Chocolate Turmeric Caramel Cups coconut milk
4 Truvani Chocolate Turmeric Caramel Cups vanilla extract
I've tried using a dictionary, with common words mapped to empty strings like so:
remove_list = {'\d+': '', 'ounces': '', 'ounce': '', 'tablespoons': '', 'tablespoon': '', 'teaspoons': '', 'teaspoon': '', 'cup': '', 'cups': ''}
column = df['ingredient']
column.apply(lambda column: [remove_list[y] if y in remove_list else y for y in column])
This didn't change the data at all.
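A likely reason, sketched under the assumption that each cell is a plain string: iterating over a string yields single characters, so no element ever equals a key such as 'cup', and '\d+' is looked up as literal text rather than as a pattern. The result of apply is also never assigned back to the DataFrame. The shortened `remove_list` and the `text` value below are illustrative only.

```python
# Shortened version of the dictionary from the question (illustrative).
remove_list = {"cup": "", "cups": ""}

# One cell value from the 'ingredient' column.
text = "⅓ cup cashews"

# Same comprehension as in the lambda: 'y' is one character at a time,
# so 'y in remove_list' is never true for multi-letter keys.
pieces = [remove_list[y] if y in remove_list else y for y in text]

print(pieces[:5])
print("".join(pieces) == text)
```

Because every character is passed through unchanged, joining the pieces gives back the original string.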
I've also tried using regex:
df['ingredient'] = re.sub(r'|'.join(map(re.escape, remove_list)), '', df['ingredient'])
But that just raises an error: "TypeError: expected string or buffer".
I'm very new to Python. I think this is possible with regex, but I'm not sure how to do it.
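One possible approach, as a minimal sketch rather than a definitive solution: `re.sub` expects a single string, not a whole Series, whereas pandas' `Series.str.replace` with `regex=True` applies a pattern to each row. The `units` list and the `pattern` below are assumptions based only on the five sample rows; real ingredient data will probably need more units and more cases. `re.escape` is applied to the unit words only, because escaping `\d+` would turn it into literal text.

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "recipe_name": ["Truvani Chocolate Turmeric Caramel Cups"] * 5,
    "ingredient": [
        "⅓ cup cashews",
        "4 dates",
        "1 tablespoon almond butter",
        "3 tablespoons coconut milk",
        "½ teaspoon vanilla extract",
    ],
})

# Assumed unit vocabulary; extend it for real data.
units = ["ounces", "ounce", "tablespoons", "tablespoon",
         "teaspoons", "teaspoon", "cups", "cup"]

# Longest words first, so plurals are tried before singulars.
unit_pattern = "|".join(sorted(map(re.escape, units), key=len, reverse=True))

# Leading quantity: digits, spaces, '/', '.', and Unicode fractions
# such as ½ and ⅓, followed by an optional whole-word unit.
pattern = rf"^\s*[\d\s/.\u00BC-\u00BE\u2150-\u215E]*(?:\b(?:{unit_pattern})\b)?\s*"

# str.replace works row by row; assign the result back to keep it.
df["ingredient"] = df["ingredient"].str.replace(pattern, "", regex=True)

print(df["ingredient"].tolist())
```

The pattern is anchored at the start of the string, so a unit word appearing later in an ingredient name is left alone, and the word boundaries keep "cup" from matching inside a longer word.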