Pandas full Dataframe Apply to regex function throws error: TypeError: 'expected string or bytes-like object'

Question

Here is my regex function:

def parse_repl(df_item):
    for pattern, replacement in d_comp.items():
        df_item = pattern.sub(replacement, df_item)
    return df_item

d_comp is a dictionary that maps compiled regex patterns to their replacement strings.

I'm calling it like this:

df.apply(parse_repl)
df.to_csv(...)

I also tried apply with axis=0 and axis=1 and neither worked.

The error is:

TypeError: ('expected string or bytes-like object', 'occurred at index myField')

The error is raised on this line of parse_repl:

df_item = pattern.sub(replacement, df_item)

Presumably this is because sub expects a string or bytes-like object.

The question is: how can I convert df_item into something the sub call accepts (i.e., change the item's data), and then write the changes back into the main DF intact?

Thanks!

Ryan Tam · Accepted Answer · 2018-05-16 22:29:56Z

Since no sample data was given, I first tried to reproduce the issue with a small example of my own:

import pandas as pd
import re

df = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': ['q', 'w', 'e']})

d_comp = {
    re.compile('a'): 'new_a',
    re.compile('q'): 'new_q',
}

def parse_repl(df_item):
    for pattern, replacement in d_comp.items():
        df_item = pattern.sub(replacement, df_item)
    return df_item

df.apply(parse_repl)  # raises TypeError, as in the question

DataFrame.apply does not pass individual cells to the function. It passes a whole column (or, with axis=1, a whole row) as a Series. So pattern.sub(replacement, df_item) cannot work: df_item is a Series, not a string or bytes object. This is also why switching between axis=0 and axis=1 makes no difference.
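You can check this directly. The following minimal sketch uses the same small df as above. It records the type of the argument that apply passes in, once with the default axis and once with axis=1. It then shows that pattern.sub rejects a Series. The exact wording of the TypeError varies between Python versions, so the error text is printed rather than quoted.

```python
import re

import pandas as pd

df = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': ['q', 'w', 'e']})

# Record the type of whatever DataFrame.apply hands to the function
seen_by_column = df.apply(lambda item: type(item).__name__)       # one call per column
seen_by_row = df.apply(lambda item: type(item).__name__, axis=1)  # one call per row
print(seen_by_column.tolist())  # ['Series', 'Series']
print(seen_by_row.tolist())     # ['Series', 'Series', 'Series']

# Handing a Series to pattern.sub fails just like in the question
raised = False
try:
    re.compile('a').sub('new_a', df['x'])
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```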

You could fix parse_repl to work on individual cells instead, but I recommend the vectorized Series.str.replace method, applied column by column as below:

In [1]:     import pandas as pd
   ...:     from IPython.display import display
   ...: 
   ...:     df = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': ['q', 'w', 'e']})
   ...:     display('Original')
   ...:     display(df)
   ...: 
   ...:     regex_to_replace = {
   ...:         'a': 'new_a',
   ...:         'q': 'new_q',
   ...:     }
   ...: 
   ...:     for column_name in df:
   ...:         column = df[column_name]
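   ...:         for regex_pattern, replacement in regex_to_replace.items():
   ...:             # regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching
   ...:             column = column.str.replace(regex_pattern, replacement, regex=True)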
   ...: 
   ...:         df[column_name] = column
   ...: 
   ...:     display('Replaced')
   ...:     display(df)
   ...: 
   ...: 
'Original'
   x  y
0  a  q
1  b  w
2  c  e
'Replaced'
       x      y
0  new_a  new_q
1      b      w
2      c      e
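For completeness, here is a sketch of the parse_repl fix mentioned above. It keeps the compiled d_comp patterns and applies parse_repl to each cell with Series.map instead of to each column. It assumes the real data may contain missing values (the question does not say), so non-string cells are passed through unchanged. It also shows DataFrame.replace with regex=True as a second option. That variant uses the patterns' source strings, so it only matches d_comp when the patterns were compiled without flags.

```python
import re

import pandas as pd

df = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': ['q', 'w', None]})

# Compiled pattern -> replacement string, as in the question
d_comp = {
    re.compile('a'): 'new_a',
    re.compile('q'): 'new_q',
}


def parse_repl(df_item):
    # Assumption: non-string cells (None, NaN, numbers) should pass through
    # unchanged rather than raise TypeError inside pattern.sub
    if not isinstance(df_item, str):
        return df_item
    for pattern, replacement in d_comp.items():
        df_item = pattern.sub(replacement, df_item)
    return df_item


# Option 1: apply still hands each column (a Series) to the lambda, and
# Series.map then calls parse_repl once per cell with a scalar value.
mapped = df.apply(lambda column: column.map(parse_repl))

# Option 2: DataFrame.replace with regex=True performs the substitutions
# itself; the dict keys here are the patterns' source strings.
replaced = df.replace({p.pattern: r for p, r in d_comp.items()}, regex=True)

print(mapped)
print(replaced)
```

Either way, the result is a new DataFrame. Assign it (for example, back to df) before calling df.to_csv(...). The df.apply(parse_repl) call in the question discards its return value.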
