How to replace specific digits in a column using Python pandas

Question

How can I replace specific digits in a column of a pandas DataFrame without affecting the other characters? I have a large CSV file that looks something like this:

data = pd.read_csv("meter.csv")
data.head()
Out[10]:
     value  temp1  temp2
0   34 02:0   16.0     17 
1   36 06:0    8.0     27
2   28 10:0   18.0     21
3   34 02:0   16.0     17 
4   36 06:0    8.0     27
5   28 10:0   18.0     21
6   34 02:0   16.0     17 
7   36 06:0    8.0     27
8   28 10:0   18.0     21

I want to change the value column wherever value.str[3:5] == '10', replacing those two characters with 00.

Output that I need:

     value  temp1  temp2
0   34 02:0   16.0     17 
1   36 06:0    8.0     27
2   28 00:0   18.0     21
3   34 02:0   16.0     17 
4   36 06:0    8.0     27
5   28 00:0   18.0     21
6   34 02:0   16.0     17 
7   36 06:0    8.0     27
8   28 00:0   18.0     21

I tried pandas.Series.str.replace but could not get it to work.

My code: data['value'] = data['value'].str[3:5].replace('10','00') gives this output:

   value  temp1  temp2
0   02   16.0     17 
1   06    8.0     27
2   00   18.0     21
3   02   16.0     17 
4   06    8.0     27
5   00   18.0     21

It replaces each entire value with the new two-character value. Could anyone help me solve this? Thanks!

Tomas Farias · Accepted Answer · 2018-07-12 01:25:44Z

2

data['value'].str[3:5].replace('10','00') returns a pd.Series that contains only the [3:5] slice of each string, with replace applied to that slice. What you want instead is the whole string, with the replacement applied only in the rows that match your condition. That can be achieved like this:

import pandas as pd

data = pd.DataFrame({ # small part of your DF
    'value': ['34 02:0', '36 06:0', '28 10:0'], # Third row should be changed
    'temp1': [16.0, 8.0, 18.0],
    'temp2': [17, 27, 21] 
})

mask = data['value'].str[3:5] == '10'
data.loc[mask, 'value'] = data.loc[mask, 'value'].str.replace('10', '00')

>>> print(data)
     value  temp1  temp2
0  34 02:0   16.0     17
1  36 06:0    8.0     27
2  28 00:0   18.0     21 # Third row changes, yay!

This code introduces a bug if a value contains '10' more than once, for example 10 10:0, because every occurrence is replaced. You can avoid this by calling .str.replace('10:', '00:') instead.
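To make the pitfall concrete, here is a minimal sketch using a made-up value, '10 10:0', that is not in the original data. It compares the plain '10' replacement with the narrower '10:' pattern. regex=False is passed explicitly so the result does not depend on the default of the installed pandas version.

```python
import pandas as pd

# Hypothetical sample: the first string contains '10' twice.
s = pd.Series(['10 10:0', '28 10:0', '34 02:0'])

# Replacing the bare '10' changes every occurrence in each string.
too_broad = s.str.replace('10', '00', regex=False)

# Including the colon restricts the match to the digits right before ':'.
narrow = s.str.replace('10:', '00:', regex=False)

print(too_broad.tolist())  # ['00 00:0', '28 00:0', '34 02:0']
print(narrow.tolist())     # ['10 00:0', '28 00:0', '34 02:0']
```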

You can also just use a regex that matches something like r'\s10\:' and call .str.replace(re.compile(r'\s10\:'), ' 00:').

import re

r = re.compile(r'\s10\:')
data['value'] = data['value'].str.replace(r, ' 00:', regex=True) # no need to define a condition at all

>>> print(data)
     value  temp1  temp2
0  34 02:0   16.0     17
1  36 06:0    8.0     27
2  28 00:0   18.0     21

This last solution is less explicit than the first one, because your condition on positions 3 to 5 no longer appears in the code.
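If you want the regex route but still want the position to be explicit, one option is to anchor the pattern at the start of the string and capture the first three characters. This is only a sketch and not part of the original answer; the pattern and the sample value '10 10:0' are assumptions for illustration.

```python
import pandas as pd

s = pd.Series(['34 02:0', '28 10:0', '10 10:0'])  # last value is hypothetical

# ^(.{3}) captures exactly the first three characters, so '10' must sit at
# positions 3-4, mirroring the str[3:5] == '10' condition.
# \g<1> re-inserts the captured prefix; writing \1 directly before '00'
# would not be read as "group 1 followed by 00".
anchored = s.str.replace(r'^(.{3})10', r'\g<1>00', regex=True)

print(anchored.tolist())  # ['34 02:0', '28 00:0', '10 00:0']
```

Because the match is tied to the string start, a leading '10' is left alone without needing a separate mask.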

2 Comments

AbJ Over a year ago

Awesome and clarifying explanation! Thank you very much!

BENY · Accepted Answer · 2018-07-12 01:33:41Z

2

You can use np.where:

import numpy as np

df.value = np.where(df.value.str[3:5] == '10', df.value.str[:3] + '00' + df.value.str[5:], df.value)
df
Out[21]: 
     value  temp1  temp2
0  34 02:0   16.0     17
1  36 06:0    8.0     27
2  28 00:0   18.0     21
3  34 02:0   16.0     17
4  36 06:0    8.0     27
5  28 00:0   18.0     21
6  34 02:0   16.0     17
7  36 06:0    8.0     27
8  28 00:0   18.0     21

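Or, inspired by Tomas (regex=True is required in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default):

df.value.str.replace(r'\s10\:', ' 00:', regex=True)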

edited Jul 12, 2018 at 1:33

answered Jul 12, 2018 at 1:10

1 Comment

AbJ Over a year ago

Thank you very much for alternative solution.!

rafaelc · Accepted Answer · 2018-07-12 01:15:43Z

1

Using str.slice:

mask = df.value.str.slice(3, 5) == '10'

df.loc[mask, 'value'] = df.loc[mask].value.str.slice(0, 3) + '00' + df.loc[mask].value.str.slice(5)
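The slice-and-concatenate step can also be written with str.slice_replace, a pandas string method that swaps out a positional slice in one call. The sketch below wraps it in a helper named replace_at; that name and its parameters are hypothetical and only serve the illustration.

```python
import pandas as pd

def replace_at(series, start, stop, old, new):
    """Hypothetical helper: where series.str[start:stop] equals `old`,
    replace that slice with `new`; other rows are returned unchanged."""
    out = series.copy()
    mask = out.str.slice(start, stop) == old
    # slice_replace rebuilds prefix + new + suffix for the matching rows only.
    out[mask] = out[mask].str.slice_replace(start, stop, new)
    return out

df = pd.DataFrame({'value': ['34 02:0', '36 06:0', '28 10:0']})
df['value'] = replace_at(df['value'], 3, 5, '10', '00')

print(df['value'].tolist())  # ['34 02:0', '36 06:0', '28 00:0']
```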

Vishvas Chauhan · Accepted Answer · 2021-04-13 15:15:43Z

0

If there are multiple conditions and choices, I prefer np.select:

condition = [df.value.str[3:5] == '10']
choice = [df.value.str[:3] + '00' + df.value.str[5:]]
df.value = np.select(condition, choice, default=df.value)

# inspired by Beny

Output

     value  temp1  temp2
0  34 02:0   16.0     17
1  36 06:0    8.0     27
2  28 00:0   18.0     21
3  34 02:0   16.0     17
4  36 06:0    8.0     27
5  28 00:0   18.0     21
6  34 02:0   16.0     17
7  36 06:0    8.0     27
8  28 00:0   18.0     21
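np.select pays off once there is more than one rule. The sketch below adds a second rule, turning '06' into '18', purely for illustration; that rule is an assumption and does not come from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': ['34 02:0', '36 06:0', '28 10:0']})

digits = df['value'].str[3:5]
prefix, suffix = df['value'].str[:3], df['value'].str[5:]

# Conditions are checked in order; the first True one selects its choice.
conditions = [digits == '10', digits == '06']
choices = [prefix + '00' + suffix,   # rule from the question
           prefix + '18' + suffix]   # hypothetical second rule

# Rows that match no condition keep their original value via `default`.
df['value'] = np.select(conditions, choices, default=df['value'])

print(df['value'].tolist())  # ['34 02:0', '36 18:0', '28 00:0']
```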
