
I have a list of strings, and I want to keep only the DataFrame rows that contain any of them and drop the rest.

The code below runs without errors.

However, it keeps only the matches for the last element of the search list.

I am trying to capture the matches in column 'l' for every element of the search list.

Please see below for input and expected output.

Code:

import pandas as pd

l = ['Testing','Goals are met','Mathematics subject','tesTed prototype','Some Test']
df = pd.DataFrame(l)
df.columns = ['l']

Input Data:

    l
0   Testing
1   Goals are met
2   Mathematics subject
3   tesTed prototype
4   Some Test

Code to select the rows that contain the search strings:

select_list = ["Math",'Test']

for s in select_list:
    # keeping into a dataframe
    df1 = df[df.l.str.contains(s,case=False)]

df1

Expected output (note that the code above did not select the row matching 'Math'):

    l
0   Testing
2   Mathematics subject
3   tesTed prototype
4   Some Test

2 Answers

The reason is that you are reassigning to df1 with every iteration of the for loop.
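To see the reassignment concretely, here is a minimal sketch that reruns the loop from the question with a print statement added (the print calls are an addition for illustration, not part of the original code). Each pass overwrites df1, so only the last term's matches remain after the loop:

```python
import pandas as pd

df = pd.DataFrame({'l': ['Testing', 'Goals are met', 'Mathematics subject',
                         'tesTed prototype', 'Some Test']})
select_list = ['Math', 'Test']

for s in select_list:
    # df1 is rebound on every pass; the previous result is discarded
    df1 = df[df['l'].str.contains(s, case=False)]
    print(s, '->', df1['l'].tolist())

# After the loop, df1 holds only the rows matched by the last term, 'Test'
print(df1['l'].tolist())
```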

Instead of doing so, you should use a regular expression:

filtered_df = df[df['l'].str.contains('|'.join(select_list), case=False)]

Output:

                     l
0              Testing
2  Mathematics subject
3     tesTed prototype
4            Some Test

The above .join call produces the string 'Math|Test', which, when passed to .str.contains, tells it to look for all rows which contain at least one of 'Math' and 'Test'. If you add more strings to select_list, then it will look for them too.

Note that in certain cases (say, if strings in select_list contain special characters like "."), this approach may require modification.
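One possible modification, sketched here under the assumption that the search terms should be matched literally, is to pass each term through re.escape before joining. The rows 'Version 1.0' and 'Version 1x0' and the term '1.0' are hypothetical and were added only to show the difference:

```python
import re
import pandas as pd

# Hypothetical data: the last two rows are not from the question
df = pd.DataFrame({'l': ['Testing', 'Mathematics subject',
                         'Version 1.0', 'Version 1x0']})

# Hypothetical search terms; '1.0' contains the regex metacharacter '.'
select_list = ['Math', '1.0']

# Unescaped: '.' matches any character, so 'Version 1x0' is selected too
unescaped = df[df['l'].str.contains('|'.join(select_list), case=False)]

# Escaped: each term is treated as a literal string
pattern = '|'.join(re.escape(s) for s in select_list)
escaped = df[df['l'].str.contains(pattern, case=False)]

print(unescaped['l'].tolist())
print(escaped['l'].tolist())
```

With escaping, only rows that contain the literal text '1.0' or 'Math' are kept.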


Please try this:

select_list = ["Math", 'Test']
# start from an empty frame and merge in the matches for each search string
df1 = pd.DataFrame([], columns=['l'])
for s in select_list:
    df1 = pd.merge(df1, df[df.l.str.contains(s, case=False)], how='outer')

Alternatively, instead of merging DataFrames inside the loop, you can collect the matching rows in a list and build the DataFrame from it afterwards:

l2 = []
for s in select_list:
    l2.extend(df[df.l.str.contains(s,case=False)].values.tolist())

df3 = pd.DataFrame(l2)
df3.columns = ['l']
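A further loop-based variant, offered as a sketch rather than as part of the answer above, is to accumulate a boolean mask and filter once at the end. This keeps the original index and selects a row only once even if it matches several search strings. The names mask and df4 are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'l': ['Testing', 'Goals are met', 'Mathematics subject',
                         'tesTed prototype', 'Some Test']})
select_list = ['Math', 'Test']

# Start with an all-False mask aligned to df's index
mask = pd.Series(False, index=df.index)
for s in select_list:
    # OR in the rows that match the current search string
    mask |= df['l'].str.contains(s, case=False)

# Filter once, after all search strings have been checked
df4 = df[mask]
print(df4)
```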
