I have the following sample DataFrame and List (extracted from df).

import pandas as pd
color_list = ['green','blue','red','yellow','black']

df = pd.DataFrame({'object': ['car','plane','tree','house','phone'],
                   'colors': ['red, blue',
                              'red, yellow, black',
                              'black',
                              'green, blue',
                              'yellow, green, blue']})

I managed to create a for loop that correctly checks if a certain colour (from color_list) is present in the colors column. It then creates a new column that evaluates that condition and assigns True or False to each row. The code is the following:

idx = df.columns.get_loc('colors') + 1

for i, color in enumerate(color_list):
    df.insert(loc=idx + i,
              column='has ' + color,
              value=df['colors'].str.contains(color, case=False, na=False))

It outputs the following:

  object               colors  has green  ...  has red  has yellow  has black
0    car            red, blue      False  ...     True       False      False
1  plane   red, yellow, black      False  ...     True        True       True
2   tree                black      False  ...    False       False       True
3  house          green, blue       True  ...    False       False      False
4  phone  yellow, green, blue       True  ...    False        True      False

This is correct. My question is: can I pass a condition to the value argument of DataFrame.insert?

I want to replace the True values with 'has' + (name of the column) and the False values with something like "doesn't have" + (name of the column). Can this be done in the same loop?

Thanks,

2 Answers

According to the documentation, the pandas.DataFrame.insert method does not have this functionality.

You can use pandas.Series.apply on the colors column to do so.

for color in color_list:
    col_name = 'has ' + color
    df[col_name] = df['colors'].apply(lambda x: f'has {color}' if color in x else f'doesn\'t have {color}')
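
As an alternative that stays closer to the loop in the question, the boolean mask from str.contains can be turned into labels with numpy.where before it is passed to insert. This is only a sketch under the assumption that the label text should be 'has <color>' / "doesn't have <color>"; the variable name mask is made up for illustration.

```python
import numpy as np
import pandas as pd

color_list = ['green', 'blue', 'red', 'yellow', 'black']

df = pd.DataFrame({'object': ['car', 'plane', 'tree', 'house', 'phone'],
                   'colors': ['red, blue',
                              'red, yellow, black',
                              'black',
                              'green, blue',
                              'yellow, green, blue']})

idx = df.columns.get_loc('colors') + 1

for i, color in enumerate(color_list):
    # Same True/False test as in the question (substring match, NaN counts as False)
    mask = df['colors'].str.contains(color, case=False, na=False)
    # np.where picks the first label where mask is True, the second otherwise
    df.insert(loc=idx + i,
              column='has ' + color,
              value=np.where(mask, f'has {color}', f"doesn't have {color}"))
```

The value argument of insert accepts any array-like of the right length, so the condition is evaluated first and only its result is inserted.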

2 Comments

For some reason this works fine in Spyder, but in Jupyter Notebook it raises an error: "TypeError: argument of type 'float' is not iterable".
You are probably assigning a float to the color_list variable somewhere. Make sure that color_list is a list.
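
Another possible cause worth checking: the expression color in x raises exactly this TypeError when x is a float, which is what apply passes for a missing value (NaN) in the colors column. The sketch below assumes that is the situation; df_nan and the helper label are hypothetical names used only for this illustration.

```python
import numpy as np
import pandas as pd

color_list = ['green', 'blue']

# Hypothetical data with a missing entry in the colors column
df_nan = pd.DataFrame({'colors': ['red, blue', np.nan, 'green']})

def label(cell, color):
    # Non-string cells (such as NaN) are treated as not containing the color
    if isinstance(cell, str) and color in cell:
        return f'has {color}'
    return f"doesn't have {color}"

for color in color_list:
    # args passes the current color as the second positional argument of label
    df_nan['has ' + color] = df_nan['colors'].apply(label, args=(color,))
```

If the real data has no missing values, the cause lies elsewhere and this guard changes nothing.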

Not sure if this is what you are after; it looks similar to pulling out 0s and 1s with the pandas Series.str.get_dummies function:

repl = {0: False, 1: True}

# get a DataFrame of 0s and 1s,
# depending on the presence/absence of each word
true_false = (df
              .colors
              .str
              .get_dummies(', ')
              .add_prefix('has ')
              # replace the 0s and 1s with False/True
              .replace(repl))

# combine the data back with the original DataFrame
pd.concat([df, true_false], axis=1)


  object               colors  has black  has blue  has green  has red  has yellow
0    car            red, blue      False      True      False     True       False
1  plane   red, yellow, black       True     False      False     True        True
2   tree                black       True     False      False    False       False
3  house          green, blue      False      True       True    False       False
4  phone  yellow, green, blue      False      True       True    False        True
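
If text labels are wanted instead of False/True, the same 0/1 table could be mapped column by column with a small dictionary. This is a sketch, assuming the labels 'has <color>' and "doesn't have <color>" from the question; the names dummies, labels and result are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'object': ['car', 'plane', 'tree', 'house', 'phone'],
                   'colors': ['red, blue',
                              'red, yellow, black',
                              'black',
                              'green, blue',
                              'yellow, green, blue']})

# One 0/1 column per distinct color, split on ', '
dummies = df['colors'].str.get_dummies(', ')

# Map each column's 1/0 to a label that mentions that column's color
labels = pd.DataFrame({
    'has ' + color: dummies[color].map({1: f'has {color}',
                                        0: f"doesn't have {color}"})
    for color in dummies.columns
})

# Combine the labels with the original DataFrame
result = pd.concat([df, labels], axis=1)
```

Unlike str.contains, get_dummies matches whole tokens between separators, so a color such as 'red' would not match a longer entry that merely contains it.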

1 Comment

Not exactly what I want, but using a dictionary with .replace gave me added flexibility. Thanks
