How to replace specific words from entire csv file?

Question

I have a large CSV file that contains many short words, and I need to change them into full words. I found a few posts here, such as 1 and 2, but most of them either change the entire row or require doing the replacements manually, one by one.

My CSV file looks like:

infoID               messages
 111     we need to fix the car mag but we can't
 113         we need a shf to perform eng change
 115                      gr is needed to change
 116                            bat needs change
 117                    car towed for ext change 
 118                              car ml is high
  .
  .

I have another file that lists the full word for each short-form word, and I want to use it to apply the replacements to my document. It is in the form of:

shf:shaft
gr:gear
ml:mileage

It would be great if you could help with code that I can run on my side as well. Thanks

cs95 · Accepted Answer · 2019-06-07 00:35:33Z

4

Read your text file in as a Series, s, that looks like this:

s

0    mag:magnitude
1        shf:shaft
2          gr:gear
3      bat:battery
4      ext:exhaust
5       ml:mileage
Name: 0, dtype: object
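The answer doesn't show the reading step itself. Here is a minimal sketch. It assumes the abbreviations file has one short:full pair per line and contains no commas, so pd.read_csv with its default comma separator and header=None yields a single column labelled 0, matching the Name: 0 above. The io.StringIO buffer stands in for the real file, and the filename in the comment is hypothetical.

```python
import io

import pandas as pd

# Stand-in for the abbreviations file; with a real file you would pass its path,
# e.g. pd.read_csv("abbreviations.txt", header=None)  ("abbreviations.txt" is hypothetical)
abbrev_file = io.StringIO(
    "mag:magnitude\n"
    "shf:shaft\n"
    "gr:gear\n"
    "bat:battery\n"
    "ext:exhaust\n"
    "ml:mileage\n"
)

# header=None keeps the first line as data; with no commas in the file,
# every line lands in a single column labelled 0
s = pd.read_csv(abbrev_file, header=None)[0]
print(s)
```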

Split on the colon and convert the Series into a dictionary mapping each key to its replacement:

dict(s.str.split(':').tolist())

# {'bat': 'battery',
#  'ext': 'exhaust',
#  'gr': 'gear',
#  'mag': 'magnitude',
#  'ml': 'mileage',
#  'shf': 'shaft'}
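dict() needs every split to produce exactly two pieces. A line whose full word itself contains a colon would therefore raise a ValueError. The hedged variant below splits only on the first colon. The "ts:time:stamp" entry is a made-up example.

```python
import pandas as pd

# Hypothetical abbreviations; the second value contains a colon of its own
s = pd.Series(["ml:mileage", "ts:time:stamp"])

# n=1 splits only at the first colon, so each element is always [key, value]
mapping = dict(s.str.split(":", n=1).tolist())
print(mapping)  # {'ml': 'mileage', 'ts': 'time:stamp'}
```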

Use this to perform a replace operation with regex=True:

df['messages'].replace(dict(s.str.split(':').tolist()), regex=True)

0    we need to fix the car magnitude but we can't
1            we need a shaft to perform eng change
2                         gear is needed to change
3                             battery needs change
4                     car towed for exhaust change
5                              car mileage is high
Name: messages, dtype: object
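For a self-contained run of the steps so far, here is a minimal sketch. It rebuilds the sample data inline instead of reading it from the two files, then applies the same dict(...) and replace(..., regex=True) calls.

```python
import pandas as pd

# Sample messages from the question, typed in directly
df = pd.DataFrame({
    "infoID": [111, 113, 115, 116, 117, 118],
    "messages": [
        "we need to fix the car mag but we can't",
        "we need a shf to perform eng change",
        "gr is needed to change",
        "bat needs change",
        "car towed for ext change",
        "car ml is high",
    ],
})

# Abbreviation lines in the short:full form used above
s = pd.Series(["mag:magnitude", "shf:shaft", "gr:gear",
               "bat:battery", "ext:exhaust", "ml:mileage"])

# Each key is treated as a regex and replaced wherever it occurs inside a cell
result = df["messages"].replace(dict(s.str.split(":").tolist()), regex=True)
print(result)
```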

Note that if these must strictly be whole-word replacements, you can extend this solution by converting the key strings into regular expressions that use word boundaries (\b). For good measure, escape the keys as well:

import re

mapping = {fr'\b{re.escape(k)}\b': v for k, v in s.str.split(':').tolist()}
df['messages'].replace(mapping, regex=True)

0    we need to fix the car magnitude but we can't
1            we need a shaft to perform eng change
2                         gear is needed to change
3                             battery needs change
4                     car towed for exhaust change
5                              car mileage is high
Name: messages, dtype: object
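The sketch below makes the difference concrete. It compares the plain mapping with the word-boundary mapping on a made-up message containing "great". This is the case Chris raises in the comments.

```python
import re

import pandas as pd

pairs = [["gr", "gear"], ["ml", "mileage"]]
messages = pd.Series(["gr is needed to change", "a great ml"])  # second message is hypothetical

# Plain keys: "gr" also matches inside "great"
plain = messages.replace(dict(pairs), regex=True)

# \b anchors each escaped key to word boundaries, so only standalone words change
bounded = messages.replace({rf"\b{re.escape(k)}\b": v for k, v in pairs}, regex=True)

print(plain.tolist())    # ['gear is needed to change', 'a geareat mileage']
print(bounded.tolist())  # ['gear is needed to change', 'a great mileage']
```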

edited Jun 7, 2019 at 0:35

answered Jun 7, 2019 at 0:26

cs95



10 Comments

razdi Over a year ago

Just curious, why is the regex=True required?

Chris Over a year ago

Isn't this a bit error-prone if some other word contains one of the dictionary keys? E.g. something like great will be changed to geareat as well.

remeus Over a year ago

@razdi Without it, pandas looks for an exact match, so the whole value must match the searched text.
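Here is a quick sketch of that behaviour using two made-up cells. With the default regex=False, a key only replaces a cell whose entire value equals it. With regex=True, the key is substituted inside longer strings too.

```python
import pandas as pd

s = pd.Series(["gr is needed to change", "gr"])

# Default regex=False: only the cell that is exactly "gr" changes
exact = s.replace({"gr": "gear"})

# regex=True: "gr" is a pattern, substituted wherever it occurs in a cell
partial = s.replace({"gr": "gear"}, regex=True)

print(exact.tolist())    # ['gr is needed to change', 'gear']
print(partial.tolist())  # ['gear is needed to change', 'gear']
```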

cs95 Over a year ago

@Chris That's true, but without any context this is the simplest solution. If whole word replacements are required, then the solution can be extended with word boundaries.

Chris Over a year ago

@cs95 I see. I too love the simplicity of your post. Thanks for the response :)


Chris · Accepted Answer · 2019-06-07 00:28:09Z

3

Another way using pd.Series.apply:

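# here d starts out as the raw text of the abbreviations file, e.g. 'shf:shaft\ngr:gear\nml:mileage'
d = dict(i.split(':') for i in d.splitlines())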
#{'bat': 'battery',
# 'ext': 'exhaust',
# 'gr': 'gear',
# 'mag': 'magnitude',
# 'ml': 'mileage',
# 'shf': 'shaft'}

df['messages'].apply(lambda x: ' '.join(d.get(i, i) for i in x.split()))

Output:

0    we need to fix the car magnitude but we can't
1            we need a shaft to perform eng change
2                         gear is needed to change
3                             battery needs change
4                     car towed for exhaust change
5                              car mileage is high
Name: messages, dtype: object
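Here is a self-contained sketch of this token-based approach. It assumes the abbreviation text is already in a string, typed inline here as abbrev_text (a hypothetical name), and it passes apply only the function. Because it looks up whole whitespace-separated tokens, it avoids the great/geareat problem. However, a token with attached punctuation such as "gr." is left unchanged, and runs of spaces collapse to one.

```python
import pandas as pd

abbrev_text = "shf:shaft\ngr:gear\nml:mileage"  # hypothetical inline copy of the file

# One short:full pair per line, split at the first colon only
d = dict(line.split(":", 1) for line in abbrev_text.splitlines())

messages = pd.Series(["gr is needed to change", "a great ml", "check the gr."])

# Replace each whitespace-separated token that is a known key; leave the others as-is
expanded = messages.apply(lambda x: " ".join(d.get(w, w) for w in x.split()))
print(expanded.tolist())
# ['gear is needed to change', 'a great mileage', 'check the gr.']
```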

answered Jun 7, 2019 at 0:28

Chris

