I am trying to find and replace words in about 20K comments. The find/replace pairs are stored in one DataFrame (more than 20,000 rows), and the comments are stored in a different DataFrame (about 20K rows).
Below is a small example:
import pandas as pd
df1 = pd.DataFrame({'Data' : ["Hull Damage happened and its insured by maritime hull insurence company","Non Cash Entry and claims are blocked"]})
df2 = pd.DataFrame({ 'Find' : ["Insurence","Non cash entry"],
'Replace' : ["Insurance","Blocked"],
})
I expect the following output:
op = ["Hull Damage happened and its insured by maritime hull insurance company", "Blocked and claims are blocked"]
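A minimal sketch that reproduces the expected output on this example, assuming everything fits in memory: all find phrases are joined into one compiled, case-insensitive regex (longest phrases first, so a multi-word phrase wins over a shorter phrase it contains), so each comment is scanned once instead of once per phrase, and `Series.str.replace` calls a function for every match. The expected output has "insurance" in lower case even though the Replace column says "Insurance", so the hypothetical `repl` helper assumes the replacement should copy the case of the matched text's first letter; drop that part if the Replace values should be used verbatim.

```python
import re
import pandas as pd

df1 = pd.DataFrame({'Data': ["Hull Damage happened and its insured by maritime hull insurence company",
                             "Non Cash Entry and claims are blocked"]})
df2 = pd.DataFrame({'Find': ["Insurence", "Non cash entry"],
                    'Replace': ["Insurance", "Blocked"]})

# Case-insensitive lookup: lower-cased find phrase -> replacement text
lookup = {f.lower(): r for f, r in zip(df2['Find'], df2['Replace'])}

# One alternation of every phrase, longest first; \b keeps matches on word boundaries
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in sorted(lookup, key=len, reverse=True)) + r")\b",
    flags=re.IGNORECASE,
)

def repl(m):
    """Hypothetical helper: look up the replacement for one regex match."""
    new = lookup[m.group(0).lower()]
    if not new:
        return new
    # Assumption: copy the case of the matched text's first letter,
    # which is what the expected output implies.
    first = new[0].upper() if m.group(0)[0].isupper() else new[0].lower()
    return first + new[1:]

# A compiled pattern and a callable replacement both require regex=True
df1['Cleaned'] = df1['Data'].str.replace(pattern, repl, regex=True)
print(df1['Cleaned'].tolist())
```

With tens of thousands of phrases, Python's `re` still tries the alternatives one after another at each position, so this is one pass per comment rather than one pass per phrase, but it may still be slower than a lookup-based approach.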
Please help.
I am using a loop, but it takes more than 20 minutes: there are 20K records in the data and 30,000 words to be replaced.
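The loop is slow because each of the 30,000 patterns makes a full pass over the text. One alternative, sketched here as an illustration rather than a library feature, is to store the find phrases in a dict keyed by their lower-cased words and walk each comment once, looking up the word sequences that start at each word (longest first). The work per comment then depends on the comment length and the longest phrase, not on the number of phrases. The names `build_index` and `replace_phrases` are hypothetical; the sketch assumes phrases are whole words separated by whitespace (a phrase such as "non-cash" would never match), matches case-insensitively, and inserts the replacement text verbatim.

```python
import re

# Word runs and non-word runs; on any string these strictly alternate.
TOKEN = re.compile(r"\w+|\W+")
WORD = re.compile(r"\w")

def build_index(pairs):
    """Map each find phrase (as a tuple of lower-cased words) to its replacement."""
    index = {tuple(find.lower().split()): repl for find, repl in pairs}
    max_len = max((len(key) for key in index), default=0)
    return index, max_len

def replace_phrases(text, index, max_len):
    """Replace known phrases in one pass over the text, longest phrase first."""
    tokens = TOKEN.findall(text)
    out = []
    i = 0
    while i < len(tokens):
        hit = None
        if WORD.match(tokens[i]):
            # From a word token, tokens i, i+2, ... are words and
            # tokens i+1, i+3, ... are the separators between them.
            words_left = (len(tokens) - i + 1) // 2
            for n in range(min(max_len, words_left), 0, -1):
                seps = tokens[i + 1:i + 2 * n - 1:2]
                if not all(s.isspace() for s in seps):
                    continue  # phrase words must be separated by whitespace only
                key = tuple(w.lower() for w in tokens[i:i + 2 * n - 1:2])
                if key in index:
                    hit = (index[key], 2 * n - 1)  # (replacement, tokens consumed)
                    break
        if hit:
            out.append(hit[0])
            i += hit[1]
        else:
            out.append(tokens[i])
            i += 1
    return "".join(out)

pairs = [("Insurence", "Insurance"), ("Non cash entry", "Blocked")]
index, max_len = build_index(pairs)
comments = ["Hull Damage happened and its insured by maritime hull insurence company",
            "Non Cash Entry and claims are blocked"]
print([replace_phrases(c, index, max_len) for c in comments])
```

With the example DataFrames this would be something like `index, max_len = build_index(zip(df2['Find'], df2['Replace']))` followed by `df1['Data'].map(lambda s: replace_phrases(s, index, max_len))`.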
import re

# KeywordSynonym: DataFrame holding the find/replace pairs (loaded from SQL)
# backup: DataFrame holding the comments to be cleaned
backup = str(backup)  # flatten all comments into one string
TrainingClaimNotes_KwdSyn = []
for index, row in KeywordSynonym.iterrows():
    word = row['Synonym'].lower()    # text to find
    value = row['Keyword'].lower()   # replacement text
    my_regex = r"\b(?=\w)" + re.escape(word) + r"\b(?!\w)"
    if re.search(my_regex, backup):
        backup = re.sub(my_regex, value, backup)
TrainingClaimNotes_KwdSyn.append(backup)
TrainingClaimNotes_KwdSyn_Cmp = backup.split('\'", "\'')
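One likely reason this loop misses phrases such as "Non Cash Entry": it lower-cases the find word (`word = ...lower()`) but searches the original mixed-case text, and `re.search`/`re.sub` are case-sensitive by default. A minimal demonstration using a comment from the example data and the same regex construction as the loop:

```python
import re

text = "Non Cash Entry and claims are blocked"
word, value = "non cash entry", "blocked"  # lower-cased, as in the loop above
pat = r"\b(?=\w)" + re.escape(word) + r"\b(?!\w)"

before = re.sub(pat, value, text)                      # no match: case differs
after = re.sub(pat, value, text, flags=re.IGNORECASE)  # case-insensitive match
print(before)
print(after)
```

Passing `flags=re.IGNORECASE` (or lower-casing the comments first) fixes the matching, but not the speed, because the loop still makes one full pass over the text per keyword.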