A pandas DataFrame contains a column with descriptions and placeholders in curly braces:

descr                        replacement
This: {should be replaced}   with this

The task is to replace the text in the curly braces with text from another column in the same row. It's unfortunately not as easy as:

df["descr"] = df["descr"].str.replace(r"{*?}", df["replacement"])

~/anaconda3/lib/python3.6/site-packages/pandas/core/strings.py in replace(self, pat, repl, n, case, flags, regex)
   2532     def replace(self, pat, repl, n=-1, case=None, flags=0, regex=True):
   2533         result = str_replace(self._parent, pat, repl, n=n, case=case,
-> 2534                              flags=flags, regex=regex)
   2535         return self._wrap_result(result)
   2536 

~/anaconda3/lib/python3.6/site-packages/pandas/core/strings.py in str_replace(arr, pat, repl, n, case, flags, regex)
    548     # Check whether repl is valid (GH 13438, GH 15055)
    549     if not (is_string_like(repl) or callable(repl)):
--> 550         raise TypeError("repl must be a string or callable")
    551 
    552     is_compiled_re = is_re(pat)

TypeError: repl must be a string or callable

2 Answers

Your code calls pandas.Series.str.replace(), which expects the replacement (repl) to be a single string or a callable, but you are passing a Series as the second argument.

Series.str.replace(pat, repl, n=-1, case=None, flags=0, regex=True)

Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub(). Parameters:

pat : string or compiled regex

repl : string or callable ...
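To make the signature above concrete, here is a minimal sketch of what `repl` accepts. The sample Series and the variable names are made up for illustration, and the exact error text may differ between pandas versions.

```python
import pandas as pd

s = pd.Series(["This: {should be replaced}", "This: {data}"])

# A single string is applied to every row.
same_for_all = s.str.replace(r"\{.+?\}", "X", regex=True)

# A callable receives each regex match object, not the row, so it
# cannot look up a value from another column.
upper = s.str.replace(r"\{.+?\}", lambda m: m.group(0).upper(), regex=True)

# A Series is neither a string nor a callable, hence the TypeError.
try:
    s.str.replace(r"\{.+?\}", pd.Series(["with this", "aaa"]), regex=True)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(same_for_all.tolist())
print(upper.tolist())
print(error_message)
```

Neither accepted form gives access to the other column, which is why a row-wise approach is needed for this task.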

You can fix it by calling the pandas.Series.replace() method directly:

>>> import pandas as pd
>>> df = pd.DataFrame({'descr': ['This: {should be replaced}'],
...                    'replacement': ['with this']})
>>> df["descr"].replace(r"{.+?}", df["replacement"], regex=True)
0    This: with this

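Observation:

I changed your regexp slightly: the original {*?} only matches a run of zero or more "{" characters followed by "}", not the text between the braces.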
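As a rough illustration of why the pattern matters, the sketch below runs the three patterns mentioned on this page through plain `re.sub` on the sample string from the question. The variable names are only for illustration.

```python
import re

text = "This: {should be replaced}"

# Original pattern: "{*?" is a lazy run of zero or more "{" characters,
# so only the closing brace ends up being matched.
original = re.sub(r"{*?}", "with this", text)

# Lazy match of everything between one pair of braces.
lazy = re.sub(r"{.+?}", "with this", text)

# Negated character class: never runs across another brace.
negated = re.sub(r"{[^{}]*}", "with this", text)

print(original)  # This: {should be replacedwith this
print(lazy)      # This: with this
print(negated)   # This: with this
```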

2 Comments

You'd better use a r"{.+?}" or r"{[^{}]*}" pattern.
Thanks, @WiktorStribiżew, you are correct! I haven't given so much emphasis on the regexp part. Just edited.

Use a list comprehension with re.sub, especially if performance is important:

import re

df['new'] = [re.sub(r"{.*?}", b, a) for a, b in zip(df['descr'], df['replacement'])]
print(df)
                        descr replacement              new
0  This: {should be replaced}   with this  This: with this
1                This: {data}         aaa        This: aaa
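One caveat with this approach: `re.sub` processes backslash escapes in a string `repl`, so a replacement value such as a Windows path can raise an error or be altered. A possible variation, sketched here with made-up sample data, passes a function as `repl` so the text is inserted literally.

```python
import re
import pandas as pd

# Hypothetical sample data; the second replacement contains a backslash.
df = pd.DataFrame({
    "descr": ["This: {should be replaced}", "Path: {dir}"],
    "replacement": ["with this", r"C:\data"],
})

pattern = re.compile(r"\{[^{}]*\}")

# Passing a function as repl makes re.sub insert the text literally;
# a plain string repl would have its backslash escapes processed.
df["new"] = [
    pattern.sub(lambda _match: b, a)
    for a, b in zip(df["descr"], df["replacement"])
]

print(df["new"].tolist())
```

If the replacement column never contains backslashes, passing the string directly to `re.sub` behaves the same.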

2 Comments

Does leaving pandas for a list comprehension perform better than using pandas.Series.replace?
@clstaudt - Sure, but it's best to test it; string operations in pandas are slow.
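A simple way to test this yourself is a small `timeit` comparison. The sketch below is only an assumption about how one might set it up: it uses synthetic data of an arbitrary size and compares the list comprehension with a row-wise `DataFrame.apply` (swapped in here as the pandas-side baseline, not the `Series.replace` call from the comment). No timings are claimed; they depend on your machine and pandas version.

```python
import re
import timeit
import pandas as pd

# Synthetic data; the size is an arbitrary assumption.
df = pd.DataFrame({
    "descr": ["This: {should be replaced}"] * 1_000,
    "replacement": ["with this"] * 1_000,
})

def with_list_comprehension(frame):
    return [re.sub(r"{.*?}", b, a)
            for a, b in zip(frame["descr"], frame["replacement"])]

def with_apply(frame):
    return frame.apply(
        lambda row: re.sub(r"{.*?}", row["replacement"], row["descr"]),
        axis=1,
    ).tolist()

# Both approaches should give the same result before timing them.
assert with_list_comprehension(df) == with_apply(df)

for func in (with_list_comprehension, with_apply):
    seconds = timeit.timeit(lambda: func(df), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```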
