1. Summarize the problem
I have a text file and a dictionary of words stored in a dataframe. The text file contains sentences (strings), one per line.
Only one column of the dictionary is relevant to me; it contains the keywords that I want to match against my text. I then want to write the best match (by best I mean the longest one) to a dataframe.
2. Describe what you’ve tried
I created two DataFrames: one for the output and one to hold the imported CSV dictionary:
import pandas as pd
Output = pd.DataFrame(columns=['stuff','Bestmatch'])
MyDictionary = pd.read_csv('mydic.csv', sep=r'\t', engine='python', encoding='utf-8')
3. Show some code
Then I tried to code the main function:
def fetchword():
    # Open for reading; "w+" would truncate the file before it is read.
    with open("mytext.txt", "r") as f:
        lines = f.readlines()
    for value in MyDictionary["substance_name"].values:
        # Here, I am not sure what I can do to finish the loop.
        pass
PS: if several entries in the MyDictionary column match, I want to choose the longest one and write it to a new dataframe.
Example of the CSV dictionary file loaded into MyDictionary:
substance_name    Quantity
Acetaminophen     3
ibuprofen         4
Levothyroxin      5
Metformin         7
An example of my text file:
Acetaminophen 3x/d for one week
ibuprofen 1/d for 3 days
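To make the goal concrete, here is a minimal sketch of the longest-match selection described above. It is not a definitive solution and rests on several assumptions: the sample data is built in memory instead of being read from mydic.csv and mytext.txt, a "match" is taken to mean a case-insensitive substring match, the 'stuff' column is assumed to hold the original line, and the helper name best_match is hypothetical.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for mydic.csv and mytext.txt.
my_dictionary = pd.DataFrame({
    "substance_name": ["Acetaminophen", "ibuprofen", "Levothyroxin", "Metformin"],
    "Quantity": [3, 4, 5, 7],
})
lines = ["Acetaminophen 3x/d for one week", "ibuprofen 1/d for 3 days"]


def best_match(line, keywords):
    """Return the longest keyword contained in line, or None if none match.

    Assumption: matching is a case-insensitive substring test.
    If several matches share the longest length, the first one wins.
    """
    lowered = line.lower()
    hits = [kw for kw in keywords if kw.lower() in lowered]
    return max(hits, key=len) if hits else None


# Only the substance_name column is used for matching.
keywords = my_dictionary["substance_name"].dropna().astype(str).tolist()

# Assumption: 'stuff' holds the original line, 'Bestmatch' the chosen keyword.
output = pd.DataFrame(
    [{"stuff": line, "Bestmatch": best_match(line, keywords)} for line in lines],
    columns=["stuff", "Bestmatch"],
)
print(output)
```

With real files, the lines list could instead come from reading mytext.txt in "r" mode and stripping the trailing newline from each line. A substring test will also match keywords inside longer words, so a word-boundary regular expression may be a better fit if that matters.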