Python + dataframe : AttributeError: 'float' object has no attribute 'replace'

Question

I am trying to write a function to do some text processing on the specified columns (description, event_name) of a Pandas dataframe. I wrote this code:

#removal of unreadable chars, unwanted spaces, words of at most length two from 'description' column and lowercase the 'description' column

def data_preprocessing(source):

    return source.replace('[^A-Za-z]',' ')
    #data['description'] = data['description'].str.replace('\W+',' ')
    return source.lower()
    return source.replace("\s\s+" , " ")
    return source.replace('\s+[a-z]{1,2}(?!\S)',' ')
    return source.replace("\s\s+" , " ")

data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))

It is giving the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-94-cb5ec147833f> in <module>()
----> 1 data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
      2 data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
      3 
      4 #df['words']=df['words'].apply(lambda row: eliminate_space(row))
      5 

~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2549             else:
   2550                 values = self.asobject
-> 2551                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2552 
   2553         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-94-cb5ec147833f> in <lambda>(row)
----> 1 data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
      2 data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
<ipython-input-93-fdfec5f52a06> in data_preprocessing(source)
      3 def data_preprocessing(source):
      4 
----> 5     return source.replace('[^A-Za-z]',' ')
      6     #data['description'] = data['description'].str.replace('\W+',' ')
      7     source = source.lower()

AttributeError: 'float' object has no attribute 'replace'

If I write the code in following way, without function, it works perfectly:

data['description'] = data['description'].str.replace('[^A-Za-z]',' ')

Try data['description'] = data['description'].astype(str).apply(lambda row: data_preprocessing(row)) ?
– Chris Adams, Commented Oct 1, 2018 at 18:00

Peter Leimbigler · Accepted Answer · 2018-10-01 18:12:05Z


Two things to fix:

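First, when you apply a lambda function to a pandas Series, the function is called once per element of the Series. A missing value (NaN) in the column is a float, and floats have no replace method, which is most likely where the AttributeError comes from. Note also that Python's built-in str.replace does plain substring replacement, not regex matching, so even on real strings your patterns would not behave as intended. What I think you need is to apply your function to the entire Series in a vectorized manner through the .str accessor, which understands regex patterns and leaves missing values alone.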
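To make the first point concrete, here is a minimal sketch. The sample Series is made up for illustration (I am assuming the real column contains at least one missing value), and the substitution pattern is arbitrary:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one real string and one missing value.
# dtype=object mirrors a typical text column loaded from a file.
s = pd.Series(["Hello, World!", np.nan], dtype=object)

# Element-wise view of the column: the missing value is a plain float.
element_types = [type(v).__name__ for v in s]
print(element_types)

# Series.apply calls the function once per element, so the float NaN
# eventually reaches .replace and raises the error from the question.
error_message = None
try:
    s.apply(lambda v: v.replace("o", "0"))
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# The .str accessor works on the whole Series and skips missing values.
replaced = s.str.replace("o", "0", regex=False)
print(replaced.isna().tolist())
```

The try/except is only there to show the message without stopping the script; in real code you would simply use the .str version.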

Second, your function has multiple return statements. As a result, only the first statement, return source.replace('[^A-Za-z]',' '), will ever run. What you need to do is make your preprocessing changes on the variable source inside your function, and finally return the modified source (or an intermediate variable) at the very end.
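A small sketch of the difference, using two hypothetical helper functions (first_return_only and chained are names I made up, not part of your code):

```python
def first_return_only(text):
    # Execution leaves the function at the first return...
    return text.lower()
    # ...so this line is unreachable and never runs.
    return text.strip()


def chained(text):
    # Each step reassigns the variable, so every change is kept.
    text = text.lower()
    text = text.strip()
    return text


only_lowered = first_return_only("  ABC  ")
lowered_and_stripped = chained("  ABC  ")
print(repr(only_lowered))
print(repr(lowered_and_stripped))
```

The first function only lowercases its input; the second one applies both steps because it returns once, at the end.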

To rewrite your function to operate on an entire pandas Series, replace every occurrence of source. with source.str. in the function body, and assign each result back to source. The new function definition:

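def data_preprocessing(source):
    # regex=True is needed on pandas 2.0+, where str.replace defaults to literal matching
    # replace every non-letter character with a space
    source = source.str.replace(r'[^A-Za-z]', ' ', regex=True)
    source = source.str.lower()
    # collapse runs of whitespace
    source = source.str.replace(r'\s\s+', ' ', regex=True)
    # drop words of one or two letters that follow whitespace
    source = source.str.replace(r'\s+[a-z]{1,2}(?!\S)', ' ', regex=True)
    # collapse the whitespace left behind by the previous step
    source = source.str.replace(r'\s\s+', ' ', regex=True)
    return source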

Then, instead of this:

data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))

Try this:

data['description'] = data_preprocessing(data['description'])
data['event_name'] = data_preprocessing(data['event_name'])
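For reference, here is a self-contained sketch of the whole approach that you can run as-is. The DataFrame contents are invented for the example, and I am assuming a pandas version that accepts the regex keyword in str.replace (on pandas 2.0 and later it must be set explicitly, because the default is literal matching):

```python
import numpy as np
import pandas as pd


def data_preprocessing(source):
    """Clean a whole Series of text in a vectorized way."""
    source = source.str.replace(r"[^A-Za-z]", " ", regex=True)  # non-letters -> space
    source = source.str.lower()
    source = source.str.replace(r"\s\s+", " ", regex=True)  # collapse whitespace
    source = source.str.replace(r"\s+[a-z]{1,2}(?!\S)", " ", regex=True)  # short words
    source = source.str.replace(r"\s\s+", " ", regex=True)  # collapse again
    return source


# Hypothetical sample data, including a missing description.
# dtype=object keeps the columns as ordinary Python strings plus NaN.
data = pd.DataFrame({
    "description": pd.Series(["Join us at the AI & ML Expo 2018!!", np.nan], dtype=object),
    "event_name": pd.Series(["PyCon 2018", "Meetup #3"], dtype=object),
})

data["description"] = data_preprocessing(data["description"])
data["event_name"] = data_preprocessing(data["event_name"])
print(data)
```

The missing description stays missing instead of raising an error. The patterns can leave a leading or trailing space behind, so you may want to finish with source.str.strip() if that matters for your data.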


Debbie Over a year ago

Thanks Man. Your answer is perfect. :)
