Here is my regex function:

def parse_repl(df_item):
    for pattern, replacement in d_comp.items():
        df_item = pattern.sub(replacement, df_item)
    return df_item

d_comp is a dictionary that maps compiled regex patterns to their replacement strings.

I'm calling it like this:

df.apply(parse_repl)
df.to_csv(...)

I also tried apply with axis=0 and axis=1 and neither worked.

The error is this:

TypeError: ('expected string or bytes-like object', 'occurred at index myField')

Error happens in this line of the parse_repl function:

df_item = pattern.sub(replacement, df_item)

Presumably because sub expects a byte array.

The question is: how can I convert df_item into something that works within the sub call (i.e., change the item's data) and then write the changes back into the main DataFrame intact?

Thanks!

1 Answer

First, I tried to reproduce your issue with my own data, since no example was given.

import pandas as pd
import re

df = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': ['q', 'w', 'e']})

d_comp = {
    re.compile('a'): 'new_a',
    re.compile('q'): 'new_q',
}

def parse_repl(df_item):
    for pattern, replacement in d_comp.items():
        df_item = pattern.sub(replacement, df_item)
    return df_item

df.apply(parse_repl)  # raises the TypeError from the question

When you use df.apply, the function you pass receives a Series as its argument (one column by default, or one row with axis=1), not a single cell. So pattern.sub(replacement, df_item) cannot work: df_item here is neither a string nor a bytes object, it is a Series.
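To make that concrete, here is a minimal sketch of one way to keep parse_repl close to its original form. It assumes the same toy data as above, and the isinstance guard is my own addition (not something from the question) so that NaN or numeric cells do not trigger the same TypeError.

```python
import re

import pandas as pd

df = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': ['q', 'w', 'e']})

d_comp = {
    re.compile('a'): 'new_a',
    re.compile('q'): 'new_q',
}


def parse_repl(item):
    # Assumption: non-string cells (NaN, numbers) should pass through unchanged,
    # because pattern.sub only accepts str or bytes-like objects.
    if not isinstance(item, str):
        return item
    for pattern, replacement in d_comp.items():
        item = pattern.sub(replacement, item)
    return item


# df.apply hands each column to the function as a pandas Series.
seen_types = df.apply(lambda col: type(col).__name__)
print(seen_types)

# Series.map calls parse_repl once per cell, so each call receives one value.
# apply returns a new DataFrame, so the result has to be assigned to a name.
result = df.apply(lambda col: col.map(parse_repl))
print(result)
```

Note that apply does not modify df in place, which is why calling df.apply(...) and then df.to_csv(...) without assigning the result would write the unchanged data even once the TypeError is gone.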

You could fix parse_repl to achieve your goal, but I recommend using pandas' vectorized string methods instead, something like the following:

In [1]:     import pandas as pd
   ...:     from IPython.display import display
   ...: 
   ...:     df = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': ['q', 'w', 'e']})
   ...:     display('Original')
   ...:     display(df)
   ...: 
   ...:     regex_to_replace = {
   ...:         'a': 'new_a',
   ...:         'q': 'new_q',
   ...:     }
   ...: 
   ...:     for column_name in df:
   ...:         column = df[column_name]
   ...:         for regex_pattern, replacement in regex_to_replace.items():
   ...:             # regex=True is explicit: pandas 2.0+ treats the pattern as a literal by default
   ...:             column = column.str.replace(regex_pattern, replacement, regex=True)
   ...: 
   ...:         df[column_name] = column
   ...: 
   ...:     display('Replaced')
   ...:     display(df)
   ...: 
   ...: 
'Original'
   x  y
0  a  q
1  b  w
2  c  e
'Replaced'
       x      y
0  new_a  new_q
1      b      w
2      c      e
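As a possible shortcut, the same replacement can probably be done without an explicit loop by passing the dictionary to DataFrame.replace through its regex parameter. This is a sketch on the same toy data, not a drop-in for your real DataFrame; the variable name replaced is mine.

```python
import pandas as pd

df = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': ['q', 'w', 'e']})

# Keys are regex patterns given as plain strings, values are the replacements.
regex_to_replace = {
    'a': 'new_a',
    'q': 'new_q',
}

# DataFrame.replace applies every pattern to every string cell and
# returns a new DataFrame; the original df is left unchanged.
replaced = df.replace(regex=regex_to_replace)
print(replaced)
```

One difference from the .str.replace loop: DataFrame.replace leaves non-string cells as they are, whereas the .str accessor raises an error on a column that does not hold strings.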