Suppose I have a data frame:
import pandas as pd
df = pd.DataFrame({'Country': ['Aruba', 'lorem Andorra ipsum', 'Afgahnistan', 'Bla Yemen, Rep.', 'South Africa'],
'Geographic region': ['Latin America and Caribbean', 'Europe and Central Asia', 'South Asia', 'Middle East and North Africa', 'Sub Saharan Africa']})
How can I replace every cell that contains the pattern Yemen with just the string Yemen? The result should be:
df = pd.DataFrame({'Country': ['Aruba', 'lorem Andorra ipsum', 'Afgahnistan', 'Yemen', 'South Africa'],
'Geographic region': ['Latin America and Caribbean', 'Europe and Central Asia', 'South Asia', 'Middle East and North Africa', 'Sub Saharan Africa']})
As a next step, is it possible to replace all cells containing Andorra with Andorra and all cells containing Yemen with Yemen in a single call, using lists or dictionaries?
The result should be:
df = pd.DataFrame({'Country': ['Aruba', 'Andorra', 'Afgahnistan', 'Yemen', 'South Africa'],
'Geographic region': ['Latin America and Caribbean', 'Europe and Central Asia', 'South Asia', 'Middle East and North Africa', 'Sub Saharan Africa']})
I tried, for example,
df.replace(regex='lorem Andorra ipsum', value='Andorra')
which of course works, since it looks for the exact text lorem Andorra ipsum, but that approach is too specific. I also tried other regular expressions, such as
df.replace(regex=r'^Andorra.$', value='Andorra')
but that didn't work.
Thanks in advance for any help!
df.loc[df['Country'].str.contains(r"\bYemen\b", case=False), 'Country'] = 'Yemen'?
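For the one-step version with a dictionary, something along these lines might work. It is only a sketch: the names `patterns` and `result` are made up for illustration, and it assumes that only the Country column should be touched. The pattern r'^Andorra.$' from the question cannot match 'lorem Andorra ipsum' because it requires the cell to start with Andorra and be followed by exactly one more character; wrapping the keyword in `.*` on both sides lets the match cover the whole cell, so the whole cell is replaced.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Aruba', 'lorem Andorra ipsum', 'Afgahnistan',
                'Bla Yemen, Rep.', 'South Africa'],
    'Geographic region': ['Latin America and Caribbean', 'Europe and Central Asia',
                          'South Asia', 'Middle East and North Africa',
                          'Sub Saharan Africa'],
})

# Hypothetical mapping: regex pattern -> replacement string.
# The leading and trailing .* make each pattern span the entire cell,
# and \b keeps the keyword from matching inside a longer word.
patterns = {
    r'.*\bAndorra\b.*': 'Andorra',
    r'.*\bYemen\b.*': 'Yemen',
}

# Work on a copy and restrict the replacement to the Country column,
# so values in 'Geographic region' are left alone.
result = df.copy()
result['Country'] = result['Country'].replace(patterns, regex=True)

print(result['Country'].tolist())
```

Calling df.replace(regex=patterns) on the whole frame should behave the same way for this data, but it would also rewrite any other column whose cells happen to contain one of the keywords.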