I have a pandas DataFrame similar to the following, but much larger and more complicated.

import pandas as pd
d = {'weight': [70, 10, 65, 1], 'String1': ['Labrador is a dog',
'Abyssinian is a cat',
'German Shepard is a dog',
'pigeon is a bird']}
df = pd.DataFrame(data=d)
df

Output

   weight                  String1
0      70        Labrador is a dog
1      10      Abyssinian is a cat
2      65  German Shepard is a dog
3       1         pigeon is a bird

I want to create a new column, 'animal', based on the column 'String1'.

search_list = ['dog','cat']

If 'String1' contains a value from search_list, populate 'animal' with that value; otherwise populate 'other'. Expected result:

   weight                  String1 animal
0      70        Labrador is a dog    dog
1      10      Abyssinian is a cat    cat
2      65  German Shepard is a dog    dog
3       1         pigeon is a bird  other

Please suggest how to do this. Thank you.

3 Answers

2

Here is one way to do it which leverages the built-in next function and its default argument:

In [7]: df["animal"] = df["String1"].map(lambda s: next((animal for animal in search_list if animal in s), "other"))
   ...:

In [8]: df
Out[8]:
   weight                  String1 animal
0      70        Labrador is a dog    dog
1      10      Abyssinian is a cat    cat
2      65  German Shepard is a dog    dog
3       1         pigeon is a bird  other

Note that if String1 is something like "I have a dog and a cat", then this will return whichever animal appears first in the search_list.
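
To illustrate that note, here is a minimal sketch with a row that mentions both animals. The helper name first_match and the sample sentences are made up for this example; only search_list and the next(...) idiom come from the answer above.

```python
import pandas as pd

search_list = ["dog", "cat"]

def first_match(text, candidates, default="other"):
    # Hypothetical helper: return the first candidate (in list order)
    # that occurs as a substring of text, or the default if none does.
    return next((c for c in candidates if c in text), default)

# Sample data invented for illustration.
df = pd.DataFrame({"String1": ["I have a cat and a dog", "pigeon is a bird"]})
df["animal"] = df["String1"].map(lambda s: first_match(s, search_list))
print(df)
```

The first row is labelled "dog" even though "cat" comes earlier in the sentence, because the order of search_list decides the winner, not the position in the string.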

2

You can use str.extract()+fillna():

df['animal']=df['String1'].str.extract(pat='(dog|cat)',expand=False).fillna('other')

Or, if the search list is long, build the pattern from it:

pat='('+'|'.join(search_list)+')'
df['animal']=df['String1'].str.extract(pat=pat,expand=False).fillna('other')

output of df:

    weight  String1                     animal
0   70      Labrador is a dog           dog
1   10      Abyssinian is a cat         cat
2   65      German Shepard is a dog     dog
3   1       pigeon is a bird            other
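
One possible refinement, offered as a sketch rather than part of the answer above: if the search terms might contain regex metacharacters, or should only match whole words, the pattern can be built with re.escape and \b word boundaries. The "concatenate" row is invented here to show the difference.

```python
import re
import pandas as pd

search_list = ["dog", "cat"]

# Sample data invented for illustration; "concatenate" contains "cat"
# only as part of a longer word.
df = pd.DataFrame({"String1": ["Labrador is a dog",
                               "Abyssinian is a cat",
                               "concatenate the lists",
                               "pigeon is a bird"]})

# Escape each term so it is matched literally, and wrap the alternation
# in word boundaries so only whole words match.
pat = r"\b(" + "|".join(re.escape(term) for term in search_list) + r")\b"
df["animal"] = df["String1"].str.extract(pat, expand=False).fillna("other")
print(df)
```

Unlike the next(...) approach, a regex alternation returns the match that occurs earliest in the string, so a row mentioning both animals gets whichever one is written first.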

0
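Set a default value first, then overwrite the rows whose 'String1' contains each keyword:

df["animal"] = "other"  # default for rows that match nothing
# The column is named "String1"; str.contains is case-sensitive by default.
df.loc[df["String1"].str.contains("dog"), "animal"] = "dog"
df.loc[df["String1"].str.contains("cat"), "animal"] = "cat"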

Hope this helps. Thanks.
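
For a longer search_list, the same default-then-overwrite idea can be written as a loop. This is a sketch under the question's sample data; regex=False is an added assumption so that each term is treated as a plain substring.

```python
import pandas as pd

search_list = ["dog", "cat"]
df = pd.DataFrame({"weight": [70, 10, 65, 1],
                   "String1": ["Labrador is a dog",
                               "Abyssinian is a cat",
                               "German Shepard is a dog",
                               "pigeon is a bird"]})

df["animal"] = "other"  # default for rows that match nothing
for term in search_list:
    # Overwrite the default wherever the term occurs as a literal substring.
    mask = df["String1"].str.contains(term, regex=False)
    df.loc[mask, "animal"] = term
print(df)
```

Because later assignments overwrite earlier ones, a row containing several terms ends up with the one that comes last in search_list.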
