I'm trying to iterate through a dataframe column to extract a certain set of words. I'm mapping these as key-value pairs in a dictionary and have, with some help, managed to set one key per row so far.

Now I would like to return multiple keys in the same row when values for more than one key are present in the string, separated by a pipe (|).

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Red and Blue Lace Midi Dress', 'Long Armed Sweater Azure and Ruby',
                            'High Top Ruby Sneakers', 'Tight Indigo Jeans',
                            'T-Shirt Navy and Rose']})

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetchColours(x):
    for key, values in colour.items():
        for value in values:
            if value in x.lower():
                return key
    else:
        return np.nan

df['Colour'] = df['Name'].apply(fetchColours)

Output:

        Name                            Colour
0       Red and Blue Lace Midi Dress    red
1  Long Armed Sweater Azure and Ruby    blue
2    High Top Ruby Sneakers             red
3        Tight Indigo Jeans             blue
4              T-Shirt Navy and Rose    blue

Expected result:

        Name                                 Colour
0       Red and Blue Lace Midi Dress         red
1       Long Armed Sweater Azure and Ruby    blue|red
2       High Top Ruby Sneakers               red
3       Tight Indigo Jeans                   blue
4       T-Shirt Navy and Rose                blue|red

2 Answers

The problem is that you return as soon as the first key is found, whereas you should keep searching until all matches have been collected:

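def fetchColours(x):
    name = x.lower()
    keys = []
    for key, values in colour.items():
        # any() adds each key at most once, even if several of its values match
        if any(value in name for value in values):
            keys.append(key)
    # Join every matching key with a pipe; NaN when nothing matched
    return '|'.join(keys) if keys else np.nan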

For this to work you have to change:

 colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

to

 colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'blue', 'indigo', 'navy')}

Otherwise the function never searches for the term 'blue' itself, so it cannot add the key 'blue' for the first row ('Red and Blue Lace Midi Dress').
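
Putting the two changes together, a minimal self-contained sketch might look like the following. It uses the extended `colour` mapping and the data from the question. The name `fetch_colours` and the alphabetical sort are my assumptions; the sort is there only to reproduce the `blue|red` ordering shown in the expected result.

```python
import numpy as np
import pandas as pd

# Extended mapping: 'blue' is now also one of the values of key 'blue'.
colour = {'red': ('red', 'rose', 'ruby'),
          'blue': ('azure', 'blue', 'indigo', 'navy')}

def fetch_colours(name):
    """Return all matching colour keys joined by '|', or NaN if none match."""
    lowered = name.lower()
    # any() records each key at most once, even if several values match.
    keys = [key for key, values in colour.items()
            if any(value in lowered for value in values)]
    # Sorting is an assumption, made to get the 'blue|red' ordering.
    return '|'.join(sorted(keys)) if keys else np.nan

df = pd.DataFrame({'Name': ['Red and Blue Lace Midi Dress',
                            'Long Armed Sweater Azure and Ruby',
                            'High Top Ruby Sneakers', 'Tight Indigo Jeans',
                            'T-Shirt Navy and Rose']})

df['Colour'] = df['Name'].apply(fetch_colours)
print(df)
```

With this data, rows 0, 1 and 4 come out as `blue|red`, row 2 as `red` and row 3 as `blue`. Note that row 0 now contains both keys, because its name mentions both "Red" and "Blue".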

6 Comments

This code should work but somehow it is not, first row should be red|blue but it is only setting red.
I tried this solution and got the same result as @RehanAzher unfortunately.
@BobHarris yes and I am unable to find the reason why :)
The problem is that 'blue' is not one of the values of key 'blue'
@Nathan your solution is correct, actually i had it ready but was trying to find the reason why my output can not match OP expected output.
How about this:

def fetchColors(x):
    color_keys = []
    for key, values in colour.items():  # the dictionary in the question is named `colour`
        for value in values:
            if value in x.lower():
                color_keys.append(key)
    if color_keys:
        return '|'.join(color_keys)
    else:
        return np.nan
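
One caveat with the `value in x.lower()` test: it matches substrings, so a name such as 'Tapered Trousers' would be tagged `red`. If whole-word matching is wanted, a sketch using the standard `re` module might look like this. The name `fetch_colours_whole_word` and the 'Tapered Trousers' sample are hypothetical, and the mapping is assumed to include 'blue' as a value of key 'blue'.

```python
import re
import numpy as np

colour = {'red': ('red', 'rose', 'ruby'),
          'blue': ('azure', 'blue', 'indigo', 'navy')}

# One compiled pattern per key; \b restricts matches to whole words.
patterns = {
    key: re.compile(r'\b(?:' + '|'.join(map(re.escape, values)) + r')\b',
                    re.IGNORECASE)
    for key, values in colour.items()
}

def fetch_colours_whole_word(name):
    """Return keys whose values appear as whole words, joined by '|'."""
    keys = [key for key, pattern in patterns.items() if pattern.search(name)]
    return '|'.join(keys) if keys else np.nan

print(fetch_colours_whole_word('Tapered Trousers'))
print(fetch_colours_whole_word('Red and Blue Lace Midi Dress'))
```

Each key appears at most once in the result because it is tested with a single pattern, and the keys come out in dictionary order.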
