
I have a DataFrame with many categories. Here is a list of some of them:

Bank 

(0827) ОСП                                  
(0283) Банк ВТБ (ПАО)                       
(0822) ОСИП_ПЕНСЫ                           
(0260) АО Тинькофф Банк                     
(0755) ПАО Совкомбанк

I want to filter the DataFrame by substring matching. Instead of passing the entire value, I want to pass something like ['Совкомбанк', 'Тинькофф']. The expected result is:

(0260) АО Тинькофф Банк                     
(0755) ПАО Совкомбанк

I tried df = df[df[column_name].isin(values)], but it didn't work.
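A minimal sketch reproducing the problem, using a hand-built copy of the Bank column listed above: .isin compares whole cell values for equality, so a bare name like Тинькофф never equals the full string (0260) АО Тинькофф Банк and every row is dropped.

```python
import pandas as pd

# Hand-built sample of the "Bank" column from the question
df = pd.DataFrame({"Bank": [
    "(0827) ОСП",
    "(0283) Банк ВТБ (ПАО)",
    "(0822) ОСИП_ПЕНСЫ",
    "(0260) АО Тинькофф Банк",
    "(0755) ПАО Совкомбанк",
]})
values = ["Совкомбанк", "Тинькофф"]

# isin tests each cell for exact equality with an element of `values`,
# so no full bank string matches a bare name and every row is filtered out
exact = df[df["Bank"].isin(values)]
print(len(exact))  # 0
```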

2 Answers


.isin checks for exact matches. What you are looking for is .str.contains:

match_strs = ['Совкомбанк', 'Тинькофф']
# Build the regex "(Совкомбанк|Тинькофф)" and keep rows whose value contains either name
df = df[df[column_name].str.contains("(" + "|".join(match_strs) + ")")]

You can pass any custom regular expression to str.contains(...) to search for whatever pattern you need.
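A slightly more defensive variant of the same idea, offered as a sketch: each search term is passed through re.escape so characters such as ( or . in a name are matched literally rather than as regex syntax, and na=False treats missing values as non-matches instead of putting NaN into the boolean mask. The helper name filter_by_substrings is hypothetical.

```python
import re
import pandas as pd

def filter_by_substrings(df: pd.DataFrame, column: str, terms: list) -> pd.DataFrame:
    """Keep rows whose `column` contains any of `terms` (hypothetical helper)."""
    # Escape each term so regex metacharacters are matched literally,
    # then join them into a single alternation such as "a|b"
    pattern = "|".join(map(re.escape, terms))
    # na=False: rows with missing values count as non-matches
    mask = df[column].str.contains(pattern, na=False)
    return df[mask]

df = pd.DataFrame({"Bank": [
    "(0827) ОСП",
    "(0283) Банк ВТБ (ПАО)",
    "(0822) ОСИП_ПЕНСЫ",
    "(0260) АО Тинькофф Банк",
    "(0755) ПАО Совкомбанк",
    None,  # a missing value, to show the effect of na=False
]})

result = filter_by_substrings(df, "Bank", ["Совкомбанк", "Тинькофф"])
print(result["Bank"].tolist())
# ['(0260) АО Тинькофф Банк', '(0755) ПАО Совкомбанк']
```

Because of the escaping, a term like "(ПАО)" matches only the literal text with parentheses, not every value that merely contains ПАО.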


4 Comments

You don't even need the parentheses around the pattern for .contains; they're mostly useful for .extract with multiple possible values. +1.
@ALollz That's true, but having them won't hurt, and they can be handy if the search term is a phrase rather than a single word.
Well, no, I don't think the parentheses change anything about the pattern. But they do trigger a warning: "This pattern has match groups. To actually get the groups, use str.extract."
@ALollz Maybe I was using it wrong :P. I knew about the warning, but it at least works for my cases.
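A small sketch to check the point debated in these comments: a pattern wrapped in a capturing group, the bare alternation, and a non-capturing group (?:...) all select the same rows, and only the capturing version triggers pandas' match-groups UserWarning. The exact warning text differs between pandas versions, so the sketch only inspects the warning category.

```python
import warnings
import pandas as pd

s = pd.Series(["(0827) ОСП", "(0260) АО Тинькофф Банк", "(0755) ПАО Совкомбанк"])
terms = "Совкомбанк|Тинькофф"

def mask_and_warnings(pattern):
    # Record every warning raised while building the boolean mask
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        mask = s.str.contains(pattern)
    # Keep only UserWarnings (the category pandas uses for the match-groups notice)
    return mask, [w for w in caught if issubclass(w.category, UserWarning)]

capturing, w1 = mask_and_warnings("(" + terms + ")")
plain, w2 = mask_and_warnings(terms)
non_capturing, w3 = mask_and_warnings("(?:" + terms + ")")

# All three patterns select the same rows
print(capturing.equals(plain), plain.equals(non_capturing))
# Only the capturing pattern should produce a UserWarning
print(len(w1), len(w2), len(w3))
```

If you want the grouping for readability (e.g. around a multi-word phrase) without the warning, the non-capturing form is the usual choice.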

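If you want to pass just the names, you have to clean up the Bank column first. For the rows shown, splitting on whitespace puts the bank name at index 2 (after the code and a legal-form prefix such as АО or ПАО):

df[df['Bank'].str.split().str.get(2).isin(values)]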
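The position of the bank name varies from row to row (some values have a prefix like АО or ПАО, some have none), so here is a sketch of a swapped-in technique that does not depend on a fixed token index: split each value into whitespace-separated tokens, explode them to one token per row, and keep a row if any of its tokens is in values. It still requires whole-token matches, so Тинькофф matches but a partial name such as Тинь would not.

```python
import pandas as pd

df = pd.DataFrame({"Bank": [
    "(0827) ОСП",
    "(0283) Банк ВТБ (ПАО)",
    "(0822) ОСИП_ПЕНСЫ",
    "(0260) АО Тинькофф Банк",
    "(0755) ПАО Совкомбанк",
]})
values = ["Совкомбанк", "Тинькофф"]

# Split on any whitespace (this also ignores trailing spaces); explode yields
# one token per row while keeping the original index label of each token
tokens = df["Bank"].str.split().explode()

# True for an original row if at least one of its tokens is an exact match;
# groupby(level=0) collapses the repeated index labels back to one per row
mask = tokens.isin(values).groupby(level=0).any()

result = df[mask]
print(result["Bank"].tolist())
# ['(0260) АО Тинькофф Банк', '(0755) ПАО Совкомбанк']
```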
