How to find partial matches in a list of strings when modifying a DataFrame

Question

I have a DataFrame 'df':

        id value
0      ABC    hi
1      XYZ   hey

that I want to compare to a list of strings 'str_list':

str_list = ['abc_123', 'xyz_456']

If an 'id' partially matches an entry in str_list, I want to replace its 'value', to get something like this:

        id     value
0      ABC   new_val
1      XYZ   new_val

As of now I have this code:

df.loc[df['id'].isin(str_list), 'value'] = 'new_val'

but that only works on complete matches (the df 'id' values would have to be abc_123 and xyz_456 for new_val to be assigned).

How can I modify this to accept partial matches?

import pandas as pd
str_list = ['abc_123', 'xyz_456']
df = pd.DataFrame({'id':['ABC','XYZ'], 'value':['hi','hey']})
# this commented out df will trigger the matches correctly
#df = pd.DataFrame({'id':['abc_123','xyz_456'], 'value':['hi','hey']})
print(df)
df.loc[df['id'].isin(str_list), 'value'] = 'new_val'
print(df)

Please accept as solution by clicking the checkmark next to the best solution. – David Erickson, Commented Oct 7, 2020 at 17:09

David Erickson · Accepted Answer · 2020-10-06 22:23:03Z

1

You can use a list comprehension to check whether the lowercased 'id' is contained in any entry of str_list; min() then picks the alphabetically smallest of the matching entries.

import pandas as pd
str_list = ['abc_123', 'xyz_456']
df = pd.DataFrame({'id':['ABC','XYZ'], 'value':['hi','hey']})

# default=None avoids a ValueError from min() when an id matches nothing in str_list
df['match'] = df['id'].apply(lambda x: min((y for y in str_list if x.lower() in y), default=None))
df

Out[1]: 
    id value    match
0  ABC    hi  abc_123
1  XYZ   hey  xyz_456
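
If the goal is to overwrite 'value' rather than add a 'match' column, the same substring test can be turned into a boolean mask for df.loc. This is a minimal sketch, assuming a case-insensitive substring match is what is wanted. The helper name has_partial_match is hypothetical, and the extra non-matching row DEF is added here for illustration only.

```python
import pandas as pd

str_list = ['abc_123', 'xyz_456']
# 'DEF' is an extra row added for illustration; it matches nothing in str_list
df = pd.DataFrame({'id': ['ABC', 'XYZ', 'DEF'], 'value': ['hi', 'hey', 'yo']})


def has_partial_match(id_value, candidates):
    """Return True if id_value (case-insensitive) is a substring of any candidate."""
    needle = id_value.lower()
    return any(needle in candidate.lower() for candidate in candidates)


# Boolean Series: True where the id is contained in at least one list entry
mask = df['id'].apply(lambda x: has_partial_match(x, str_list))

# Overwrite 'value' only on the matching rows
df.loc[mask, 'value'] = 'new_val'
print(df)
```

Rows whose 'id' matches nothing keep their original 'value', which mirrors how the isin() version in the question behaves for exact matches.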

answered Oct 6, 2020 at 22:23

David Erickson


wwnde · Accepted Answer · 2020-10-06 23:19:28Z

1

# Create a new dataframe from the list

df2 = pd.DataFrame({'text': str_list})

# Compute df['value'] with map, using a dict built from the new dataframe

df['value'] = df.id.map(dict(zip(df2['text'].str.upper().str.split('_').str[0], df2['text'])))


   id    value
0  ABC  abc_123
1  XYZ  xyz_456

How it works

    # new dataframe
    df2 = pd.DataFrame({'text': str_list})
    # new column in the new dataframe: the upper-cased prefix before the underscore
    df2['new'] = df2['text'].str.upper().str.split('_').str[0]
    # dict mapping the new column to the original text
    d = dict(zip(df2['new'], df2['text']))
    # map the dict onto the initial dataframe
    df['value'] = df['id'].map(d)
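
One caveat with map: an 'id' that has no key in the dict comes back as NaN, which would overwrite the existing 'value'. Below is a minimal sketch of one way to keep the original value in that case, assuming (as in the answer above) that the prefix before the first underscore is the key. The DEF row is added for illustration only.

```python
import pandas as pd

str_list = ['abc_123', 'xyz_456']
# 'DEF' is an extra row added for illustration; it has no counterpart in str_list
df = pd.DataFrame({'id': ['ABC', 'XYZ', 'DEF'], 'value': ['hi', 'hey', 'yo']})

# Key each list entry by its upper-cased prefix before the first underscore
d = {s.split('_')[0].upper(): s for s in str_list}

# map() yields NaN where the id has no key; fillna() restores the original value
df['value'] = df['id'].map(d).fillna(df['value'])
print(df)
```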

edited Oct 6, 2020 at 23:19

answered Oct 6, 2020 at 23:12

wwnde
