
I have the following pandas DataFrame. Say it has two columns, id and search_term:

id       search_term
37651    inline switch

I do:

train['search_term'] = train['search_term'].str.replace("in."," in. ")

expecting the row above to be unaffected, but instead I get:

id       search_term
37651    in.  in.  switch

which means that "inl" and "ine" are each replaced by " in. ", as if I were using a regular expression in which the dot matches any character.

How do I rewrite the first command so that only a literal in. is replaced, and any in that is not followed by a dot is left untouched, as in:

>>> a = 'inline switch'
>>> a = a.replace('in.', 'in. ')
>>> a
'inline switch'
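A minimal way to reproduce the difference outside pandas, using only Python's standard re module (an illustration, not part of the original question): the built-in str.replace matches in. literally, while re.sub treats the dot as "any character", which is what the pandas call above did.

```python
import re

text = 'inline switch'

# Built-in str.replace: 'in.' is a literal three-character substring,
# which does not occur in 'inline switch', so nothing changes.
literal = text.replace('in.', ' in. ')

# re.sub: '.' matches any character, so 'inl' and 'ine' both match.
as_regex = re.sub('in.', ' in. ', text)

print(repr(literal))   # 'inline switch'
print(repr(as_regex))  # ' in.  in.  switch'
```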
  • What is your actual desired output? Commented Mar 29, 2016 at 23:36
  • Sorry, I want to replace the dot literally. I posted an answer below, as I found a good post on the regular expression for a dot. The problem is that str.replace() on a DataFrame column uses regex. Commented Mar 29, 2016 at 23:39

3 Answers


In pandas 0.23 or newer, str.replace() has a regex parameter that switches regex matching on or off. The following simply turns it off:

df.search_term.str.replace('in.', 'in. ', regex=False)

This results in:

0    inline switch
1         in. here
Name: search_term, dtype: object
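A self-contained sketch of this answer, assuming a pandas version recent enough to have the regex parameter (0.23 or newer, per the answer above). It passes regex explicitly both ways, so the result does not depend on the default, which changed from True to False in pandas 2.0. The two sample rows are the ones used in the other answers on this page.

```python
import pandas as pd

# Same two sample rows used elsewhere on this page.
df = pd.DataFrame({'search_term': ['inline switch', 'in.here']})

# regex=False: 'in.' is matched as a literal three-character string.
literal = df['search_term'].str.replace('in.', 'in. ', regex=False)

# regex=True: '.' matches any character, reproducing the question's problem.
as_regex = df['search_term'].str.replace('in.', 'in. ', regex=True)

print(literal.tolist())   # ['inline switch', 'in. here']
print(as_regex.tolist())  # ['in. in.  switch', 'in. here']
```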


And here is the answer, based on the question "Regular expression to match a dot".

In pandas, str.replace() indeed treats the pattern as a regex by default (this default changed to regex=False in pandas 2.0), so that:

df['a'] = df['a'].str.replace('in.', ' in. ')

is not equivalent to:

a.replace('in.', ' in. ')

The latter does not use regex. So if you really mean a literal dot and not any character, use '\.' instead of '.' in any statement that uses regex.
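The same rule can be checked with Python's standard re module (a sketch, not from the original answer). Writing the pattern as the raw string r'in\.' is equivalent to 'in\\.' and avoids doubling the backslash; the sample strings are the two used elsewhere on this page.

```python
import re

samples = ['inline switch', 'in.here']

# r'in\.' is a raw string, so the regex engine receives in\. and the
# backslash makes the dot match only a literal '.'.
pattern = r'in\.'

results = [re.sub(pattern, 'in. ', s) for s in samples]
print(results)  # ['inline switch', 'in. here']
```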


1 Comment

Note, however, that you can still use regular expressions while specifying that a dot has no special meaning, by escaping it.
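One way to apply this comment without escaping by hand is the standard library's re.escape, which backslash-escapes regex metacharacters in a literal string. This is a sketch under the assumption of a pandas version that accepts regex=True; the Series contents are the sample strings used elsewhere on this page.

```python
import re

import pandas as pd

s = pd.Series(['inline switch', 'in.here'])

# re.escape backslash-escapes regex metacharacters such as '.',
# turning 'in.' into the pattern in\.
pattern = re.escape('in.')

# regex=True is passed explicitly; the escaped pattern still only
# matches a literal 'in.'.
result = s.str.replace(pattern, 'in. ', regex=True)

print(pattern)          # in\.
print(result.tolist())  # ['inline switch', 'in. here']
```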

Try escaping the .:

>>> import pandas as pd
>>> df = pd.DataFrame({'search_term': ['inline switch', 'in.here']})
>>> df.search_term.str.replace('in\\.', 'in. ', regex=True)
0    inline switch
1         in. here
Name: search_term, dtype: object

2 Comments

Thanks, Ami. I see you escaped the . in the first argument, but what about the second? If you want to literally replace 'in.' by 'in. ', should you then use str.replace('in\\.', 'in\\. ') or str.replace('in\\.', 'in. ')?
@AlejandroSimkievich It would seem logical, but no. See the updated example above. Only the dot in the first string is interpreted as a regex character (which must be escaped).
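A small check with Python's re module (a sketch, not from the original thread) shows why: in the replacement string a dot is already literal, and an unnecessary backslash before it is kept in the output rather than removed. pandas' str.replace with regex=True is built on Python's re, so it is expected to behave the same way.

```python
import re

text = 'in.here'
pattern = r'in\.'

# Only the pattern needs the escaped dot; in the replacement string
# a dot is already literal.
plain = re.sub(pattern, 'in. ', text)

# Escaping the dot in the replacement is unnecessary, and re.sub keeps
# the backslash, so it ends up in the output.
escaped = re.sub(pattern, 'in\\. ', text)

print(repr(plain))    # 'in. here'
print(repr(escaped))  # 'in\\. here'
```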
