How to replace a string using a dictionary containing multiple values for a key in python

Question

I have a dictionary that maps each word to its closest related words.

I want to replace the related words in a string with the original word. Currently I can replace words when a key has only one value, but I am not able to do it when a key has multiple values. How can this be done?

Example Input

North Indian Restaurant
South India  Hotel
Mexican Restrant
Italian  Hotpot
Cafe Bar
Irish Pub
Maggiee Baar
Jacky Craft Beer
Bristo 1889
Bristo 188
Bristo 188.

How the dictionary is made

y= list(word)
words = y
similar = [[item[0] for item in model.wv.most_similar(word) if item[1] > 0.7] for word in words]
similarity_matrix = pd.DataFrame({'Orginal_Word': words, 'Related_Words': similar})
similarity_matrix = similarity_matrix[['Orginal_Word', 'Related_Words']]

It's two columns inside a DataFrame, each holding lists:

Orginal_Word    Related_Words
[Indian]        [India,Ind,ind.]    
[Restaurant]    [Hotel,Restrant,Hotpot]   
[Pub]           [Bar,Baar, Beer]     
[1888]          [188, 188., 18]

Dictionary

similarity_matrix.set_index('Orginal_Word')['Related_Words'].to_dict()

{'Indian': 'India, Ind, ind.',
 'Restaurant': 'Hotel, Restrant, Hotpot',
 'Pub': 'Bar, Baar, Beer',
 '1888': '188, 188., 18'}
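
Since the goal is to map every related word back to its original word, it may help to invert the mapping first. This is only a minimal sketch, assuming the related words are kept as Python lists rather than comma-joined strings; the names `related` and `reverse_lookup` are made up for illustration.

```python
# Hypothetical data mirroring the table above, with list values.
related = {
    'Indian': ['India', 'Ind', 'ind.'],
    'Restaurant': ['Hotel', 'Restrant', 'Hotpot'],
    'Pub': ['Bar', 'Baar', 'Beer'],
    '1888': ['188', '188.', '18'],
}

# Invert the mapping: every related word points back to its original word.
# strip() guards against stray spaces such as 'Indian ' or ' Beer'.
reverse_lookup = {
    variant.strip(): original.strip()
    for original, variants in related.items()
    for variant in variants
}

print(reverse_lookup['Hotel'])
print(reverse_lookup['188.'])
```

With a flat lookup like this, a key with several values is no different from a key with one value: each related word simply becomes its own entry.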

Expected Output

North Indian Restaurant
South Indian  Restaurant
Mexican Restaurant
Italian  Restaurant
Cafe Pub
Irish Pub
Maggiee Pub
Jacky Craft Pub
Bristo 1888
Bristo 1888
Bristo 1888

Any help is appreciated

jezrael · Accepted Answer · 2018-01-05 12:24:56Z

I think you can build a new dict that maps a regex for each related word to its original word, and pass it to replace (the regex comes from this answer):

d = {'Indian': 'India, Ind, ind.',
 'Restaurant': 'Hotel, Restrant, Hotpot',
 'Pub': 'Bar, Baar, Beer',
 '1888': '188, 188., 18'}

d1 = {r'(?<!\S)'+ k.strip() + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}

df['col'] = df['col'].replace(d1, regex=True)
print(df)
                        col
0   North Indian Restaurant
1   South Indian Restaurant
2        Mexican Restaurant
3       Italian  Restaurant
4                  Cafe Pub
5                 Irish Pub
6               Maggiee Pub
7           Jacky Craft Pub
8               Bristo 1888
9               Bristo 1888
10              Bristo 1888
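
For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. It assumes the strings live in a column named `col` (as in the snippet above) and rebuilds the sample data from the question's example input; `result` is just an illustrative name.

```python
import pandas as pd

# Sample column rebuilt from the "Example Input" in the question.
df = pd.DataFrame({'col': [
    'North Indian Restaurant', 'South India  Hotel', 'Mexican Restrant',
    'Italian  Hotpot', 'Cafe Bar', 'Irish Pub', 'Maggiee Baar',
    'Jacky Craft Beer', 'Bristo 1889', 'Bristo 188', 'Bristo 188.',
]})

d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}

# One pattern per related word, mapped to its original word.
# (?<!\S) and (?!\S) require whitespace or the string edge on both sides,
# so 'India' does not match inside 'Indian'.
d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1
      for k1, v1 in d.items() for k in v1.split(',')}

result = df['col'].replace(d1, regex=True)
print(result.tolist())
```

Note that 'Bristo 1889' becomes 'Bristo 1888' here only because the unescaped '.' in '188.' is a regex wildcard and happens to match the '9'.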

EDIT (Function for the above code):

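def replace_words(d, col):
    # Map a regex for each related word to its original word.
    # Note: this works on the global df defined earlier.
    d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]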

df['col'] = replace_words(d, 'col')

EDIT1:

If you get an error like:

regex error- missing ), unterminated subpattern at position 7

it is necessary to escape the regex special characters in the keys:

import re

def replace_words(d, col):
    # re.escape makes characters such as '.' and '(' match literally.
    d1 = {r'(?<!\S)' + re.escape(k.strip()) + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]

df['col'] = replace_words(d, 'col')
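
To see what escaping changes, here is a small sketch using only the standard `re` module. The helper `boundary_pattern` and the related word '(Bar' are hypothetical, chosen to trigger the kind of error shown above.

```python
import re

def boundary_pattern(word, escape=True):
    # Hypothetical helper: wrap a related word in the same lookarounds as above.
    body = re.escape(word.strip()) if escape else word.strip()
    return r'(?<!\S)' + body + r'(?!\S)'

raw = boundary_pattern('188.', escape=False)   # '.' is a wildcard here
safe = boundary_pattern('188.')                # '.' is a literal dot here

wildcard_hit = re.sub(raw, '1888', 'Bristo 1889')    # the '.' matches '9'
literal_miss = re.sub(safe, '1888', 'Bristo 1889')   # left unchanged
literal_hit = re.sub(safe, '1888', 'Bristo 188.')    # the literal dot matches

# An unescaped '(' opens a group that is never closed, so compiling fails.
try:
    re.compile(boundary_pattern('(Bar', escape=False))
    compile_failed = False
except re.error:
    compile_failed = True

# Escaped, the same word compiles and matches literally.
paren_fixed = re.sub(boundary_pattern('(Bar'), 'Pub', 'Cafe (Bar')

print(wildcard_hit, literal_miss, literal_hit, compile_failed, paren_fixed)
```

One side effect worth knowing: with escaping, 'Bristo 1889' is no longer rewritten, because none of the related words for '1888' matches it literally. If that row should still be replaced, '1889' would have to be added to the related words.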

edited Jan 5, 2018 at 12:24

answered Jan 5, 2018 at 5:57

jezrael


11 Comments

Bharath M Shetty Over a year ago

This will replace multiple times. The dict is fine, but how do you replace only once if that is what is expected?

Bharath M Shetty Over a year ago

Nice use of boundaries. The last row still needs a more proper answer. You can make it

Rahul rajan Over a year ago

@Dark, updated the question. It's basically two columns inside a dataframe with lists

jezrael Over a year ago

@Rahulrajan - are the words to replace always last?

Rahul rajan Over a year ago

@jezrael, Bristo 188. should be Bristo 1888. Currently it's coming out as 1888.
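
On the concern about replacing multiple times: one possible way to touch each word only once is to combine all related words into a single alternation and look up the replacement in a callback. This is a sketch of an alternative, not the code from the answer; `lookup`, `pattern` and `replace_once` are illustrative names.

```python
import re

d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}

# Flat lookup: related word -> original word.
lookup = {k.strip(): k1.strip() for k1, v1 in d.items() for k in v1.split(',')}

# Longest words first, so '188.' is tried before '188' and '18'.
alternation = '|'.join(
    re.escape(word) for word in sorted(lookup, key=len, reverse=True)
)
pattern = re.compile(r'(?<!\S)(?:' + alternation + r')(?!\S)')

def replace_once(text):
    # Each match is replaced exactly once; replaced text is never rescanned.
    return pattern.sub(lambda m: lookup[m.group(0)], text)

print(replace_once('South India  Hotel'))
print(replace_once('Bristo 188.'))
```

On a DataFrame column this could be applied with `df['col'].map(replace_once)`.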
