Hi, I am trying to remap a DataFrame using a dictionary in Python pandas, but I need to use regex to make it work properly.
Here is a sample of the dict:
```python
di_cities = {
    "Ain Salah (town)": "Ain Salah",
    "Agadez town": "Agadez",
    "Bamako city": "Bamako",
    "Birnin Konni town": "Birni N Konni",
    "Konni": "Birni N Konni",
    "Kadunà": "Kaduna",
    "Kaduna (city)": "Kaduna",
    "Kano (city)": "Kano",
    "Matamey": "Matamey",
    "Mopti city": "Mopti",
    "N'guigmi": "Nguigmi",
    "Tunis": "Tunis",
    "Tunis (city)": "Tunis",
}
```
I am using this dict comprehension:
```python
di_cities = {rf"\b{k}\b": v for k, v in di_cities.items()}
df_cities_clean = df.replace(di_cities, regex=True)
```
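The comprehension inserts each key into the pattern verbatim, so `(` and `)` become a regex group instead of literal characters, and a trailing `\b` cannot match after a closing `)` at the end of a string. The sketch below checks this with plain `re` on a single key; the third pattern is the one suggested in the comments, and the variable names are only illustrative.

```python
import re

key = "Kano (city)"

# As in the question: the key goes in unescaped, so "(city)" is a capture group
# and the pattern really looks for "Kano city".
unescaped = rf"\b{key}\b"

# re.escape makes the parentheses literal, but the trailing \b still needs a
# word character on one side of the position right after ")".
escaped = rf"\b{re.escape(key)}\b"

# Pattern from the comments: require \b only when the key ends in a word char;
# otherwise just require that the last matched char is not a word char.
fixed = rf"\b{re.escape(key)}(?:(?<=\w)\b|(?<!\w))"

for name, pat in [("unescaped", unescaped), ("escaped", escaped), ("fixed", fixed)]:
    print(name, bool(re.search(pat, "Kano (city)")))
# unescaped False
# escaped False
# fixed True
```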
As you can see in the picture (final result), it works fine for Bamako, Agadez, Mopti and every single-word string. It doesn't work for any string containing parentheses, and in the case of Birnin Konni it messes things up a little.
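One plausible cause of the Birnin Konni problem is overlapping keys: if the patterns are applied one after another, `Konni` can match again inside the already replaced `Birni N Konni`. A common workaround, swapped in here and not taken from the question, is a single compiled alternation with escaped keys sorted longest first, applied in one pass with a callback. `clean` is a hypothetical helper name, and the lookarounds `(?<!\w)` / `(?!\w)` stand in for `\b` so that keys ending in `)` still match.

```python
import re

di_cities = {
    "Ain Salah (town)": "Ain Salah",
    "Agadez town": "Agadez",
    "Bamako city": "Bamako",
    "Birnin Konni town": "Birni N Konni",
    "Konni": "Birni N Konni",
    "Kadunà": "Kaduna",
    "Kaduna (city)": "Kaduna",
    "Kano (city)": "Kano",
    "Matamey": "Matamey",
    "Mopti city": "Mopti",
    "N'guigmi": "Nguigmi",
    "Tunis": "Tunis",
    "Tunis (city)": "Tunis",
}

# Alternation takes the first alternative that matches at a position,
# so "Tunis (city)" must be tried before "Tunis".
keys = sorted(di_cities, key=len, reverse=True)

# Escaped keys, bounded by "no word char before" and "no word char after".
pattern = re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, keys)) + r")(?!\w)")

def clean(text: str) -> str:
    # re.sub scans the text once and never re-scans inserted replacements.
    return pattern.sub(lambda m: di_cities[m.group(0)], text)

for raw in ["Birnin Konni town", "Konni", "Tunis (city)", "Kano (city)"]:
    print(f"{raw!r} -> {clean(raw)!r}")
```

With pandas, the same compiled pattern and callback can be passed to `Series.str.replace(pattern, repl, regex=True)` on a (hypothetical) `df["city"]` column. Note that a cell that already contains a clean value such as `Birni N Konni` would still have its `Konni` rewritten; if each cell holds only a city name, an exact lookup per cell is safer.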
I am using another dictionary in a similar way, but there every string is enclosed in parentheses, and `rf"\({k}\)"` works perfectly.
Can you help me?
re.escape.

`\b` won't help then. `di_cities = {rf"\b{re.escape(k)}(?:(?<=\w)\b|(?<!\w))": v for k, v in di_cities.items()}`. Note that it may not work if your dictionary has overlapping keys (those that are prefixes of other(s)). This also assumes your keys always start with a word char.

`split(" (")` and get first element
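The `split(" (")` suggestion can be combined with an exact-match table, which avoids overlapping-key issues entirely but assumes each cell holds only a city name. In this sketch, `aliases` is built from the question's entries whose keys have no parentheses, and `base_name` / `normalise` are hypothetical helper names.

```python
# Entries from the question's dict that the " (" split cannot handle on its own.
aliases = {
    "Agadez town": "Agadez",
    "Bamako city": "Bamako",
    "Birnin Konni town": "Birni N Konni",
    "Konni": "Birni N Konni",
    "Kadunà": "Kaduna",
    "Mopti city": "Mopti",
    "N'guigmi": "Nguigmi",
}

def base_name(name: str) -> str:
    # Keep the text before the first " (": "Kano (city)" -> "Kano".
    # Names without " (" are returned unchanged.
    return name.split(" (")[0]

def normalise(name: str) -> str:
    stripped = base_name(name)
    # Whole-string lookup, so "Konni" never matches inside another name;
    # unknown names fall back to the stripped value.
    return aliases.get(stripped, stripped)

print(normalise("Kano (city)"))        # Kano
print(normalise("Birnin Konni town"))  # Birni N Konni
```

Applied to a DataFrame column, this would be something like `df["city"].map(normalise)`, where `"city"` is a placeholder column name.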