I have a pandas column with some string values like:

White bear
Brown Bear
Brown Bear 100 Kg
White bear 200 cm             

How can I check whether each string contains the sequence 'White bear' and, if it does, replace the entire value (not only the sequence) with a string like 'White_bear'?

df['Species'] = df['Species'].str.replace('White bear', 'White_bear')   

did not work for me because it replaces only the matched sequence, not the whole value.

1 Answer

You can use boolean indexing:

In [173]: df.loc[df.Species.str.contains(r'\bWhite\s+bear\b'), 'Species'] = 'White_bear'

In [174]: df
Out[174]:
             Species
0         White_bear
1         Brown Bear
2  Brown Bear 100 Kg
3         White_bear
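
For readers who want to reproduce this outside an IPython session, here is a minimal self-contained sketch of the same boolean-indexing idea. The sample frame is rebuilt from the question; the extra missing value and the `case=False` and `na=False` arguments are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data from the question, plus one missing value (an added assumption).
df = pd.DataFrame(
    {"Species": ["White bear", "Brown Bear", "Brown Bear 100 Kg", "White bear 200 cm", None]}
)

# Boolean mask: True where the value contains "white bear" as whole words.
# case=False ignores letter case; na=False treats missing values as non-matches.
mask = df["Species"].str.contains(r"\bwhite\s+bear\b", case=False, na=False)

# Overwrite the entire cell, not just the matched substring.
df.loc[mask, "Species"] = "White_bear"

print(df)
```

Without `na=False`, a missing value would produce a missing entry in the mask, which `.loc` cannot use as a boolean indexer.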

Or a slightly more general solution:

In [204]: df
Out[204]:
             Species
0         White bear
1         Brown Bear
2  Brown Bear 100 Kg
3  White bear 200 cm

In [205]: from_re = [r'.*?\bwhite\b\s+\bbear\b.*',r'.*?\bbrown\b\s+\bbear\b.*']

In [206]: to_re = ['White_bear','Brown_bear']

In [207]: df.Species = df.Species.str.lower().replace(from_re, to_re, regex=True)

In [208]: df
Out[208]:
      Species
0  White_bear
1  Brown_bear
2  Brown_bear
3  White_bear
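
Note that `str.lower()` in the session above also lower-cases values that match none of the patterns. One possible variation, sketched here under the assumption that non-matching values should keep their original case, is to put the inline flag `(?i)` in each pattern instead. The `mapping` dict and the extra "Panda" row are hypothetical names and data added for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Species": ["White bear", "Brown Bear", "Brown Bear 100 Kg", "White bear 200 cm", "Panda"]}
)

# Each key is a regex that matches the whole cell when it contains the species name;
# (?i) makes the match case-insensitive without modifying the data first.
mapping = {
    r"(?i)^.*\bwhite\s+bear\b.*$": "White_bear",
    r"(?i)^.*\bbrown\s+bear\b.*$": "Brown_bear",
}

# With regex=True, the dict keys are treated as patterns and the values as replacements.
df["Species"] = df["Species"].replace(mapping, regex=True)

print(df)
```

Because each pattern spans the whole string, the replacement overwrites the entire value, while "Panda" is left untouched.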

Comments

Thanks! Why do we need the 'r', \b and \s? It also works without them.
@ИонСынкетру, those are RegEx special symbols: \s matches any whitespace character (a space or a tab, for example), \b matches a word boundary, etc. The r prefix makes the pattern a raw string, so Python passes the backslashes to the regex engine unchanged.
from_re = [r'.*?\bwhite\s+\bbear\b.*', r'.*?\btiger\s+\bbear\b.*', r'.*?\bbull\s+\bear\b.*', r'.*?\blue\s+\bear\b.*', r'.*?\blacktip\s+\bear\b.*'] I've tried to add other types of bears, but for them it did not work. Why?
I found my error: I had left out a 'b' before blacktip and blue.
Try this: r'.*\d+.*'
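
To make the comment about 'r', \b and \s concrete, here is a small sketch using Python's standard `re` module directly. The test strings are made up for illustration.

```python
import re

pattern = re.compile(r"\bWhite\s+bear\b")

# \s+ accepts any run of whitespace (spaces, tabs) between the two words.
matches_extra_spaces = bool(pattern.search("White \t bear 200 cm"))

# The trailing \b requires a word boundary after "bear", so "bears" does not match.
matches_plural = bool(pattern.search("White bears"))

# The leading \b requires a word boundary before "White", so "OffWhite" does not match.
matches_prefix = bool(pattern.search("OffWhite bear"))

# Without the r prefix, "\b" is a single backspace character, not a regex word boundary.
raw_len = len(r"\b")    # two characters: backslash and "b"
plain_len = len("\b")   # one character: backspace

print(matches_extra_spaces, matches_plural, matches_prefix, raw_len, plain_len)
```

A plain pattern such as 'White bear' also works on the sample data, but it would match inside longer words and would not tolerate extra whitespace.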