Exception Handling in Pandas .apply() function

Question

If I have a DataFrame:

from pandas import DataFrame

myDF = DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

This gives the following DataFrame:

   | A  | B  |
0  | 11 | 11 |
1  | 22 | 2A |
2  | 33 | 33 |

If I want to convert column B to int values and drop the values that can't be converted, I have to do:

def convertToInt(cell):
    try:
        return int(cell)
    except (ValueError, TypeError):
        # Not convertible to int: return None so the row can be dropped later.
        return None

myDF['B'] = myDF['B'].apply(convertToInt)

If I only do:

myDF['B'].apply(int)

the error obviously is:

C:\WinPython-32bit-2.7.5.3\python-2.7.5\lib\site-packages\pandas\lib.pyd in pandas.lib.map_infer (pandas\lib.c:42840)()

ValueError: invalid literal for int() with base 10: '2A'

Is there a way to add exception handling to myDF['B'].apply()?

Thank you in advance!

atkat12 · Answer · 2017-02-22 15:39:01Z

66

I had the same question, but for a more general case where it was hard to tell if the function would generate an exception (i.e. you couldn't explicitly check this condition with something as straightforward as isdigit).

After thinking about it for a while, I came up with the solution of embedding the try/except syntax in a separate function. I'm posting a toy example in case it helps anyone.

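import pandas as pd
import numpy as np

# np.array stores mixed input as strings, so the second row holds '1' and '2'.
x = pd.DataFrame(np.array([['a', 'a'], [1, 2]]))

def augment(value):
    try:
        return int(value) + 1
    except Exception:
        # Catch any ordinary exception and report the offending value instead.
        return 'error:' + str(value)

x[0].apply(lambda x: augment(x))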
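
The same idea can be made reusable. What follows is only a sketch: the helper name catching and its parameters are made up for illustration and are not part of pandas. It wraps any one-argument function so that the listed exceptions produce a default value instead of stopping .apply().

```python
import numpy as np
import pandas as pd


def catching(func, default=np.nan, exceptions=(Exception,)):
    """Hypothetical helper: wrap func so the listed exceptions yield `default`."""
    def wrapper(value):
        try:
            return func(value)
        except exceptions:
            # Swallow only the exception types the caller asked for.
            return default
    return wrapper


s = pd.Series([11, '2A', 33])

# int() raises ValueError for '2A'; the wrapper turns that into NaN.
converted = s.apply(catching(int, exceptions=(ValueError, TypeError)))
print(converted)
```

Passing a narrow tuple of exceptions keeps unrelated bugs visible; the broad default (Exception,) fits the case where the failure mode really is unknown.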

2 Comments

janh Over a year ago

I think this answers the question, whilst the accepted answer solves the problem in a different way.

Noumenon Over a year ago

I got an unnecessary lambda warning from Pylint for this, so I just used x[0].apply(augment) and it gets passed what it needs.

Amit · Accepted Answer · 2014-04-03 19:54:33Z

21

A way to achieve that with lambda:

myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)

For your input:

>>> myDF
    A   B
0  11  11
1  22  2A
2  33  33

[3 rows x 2 columns]

>>> myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)
0    11
1   NaN
2    33
Name: B, dtype: float64
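
One caveat worth checking before relying on isdigit(): it is a test on characters, not on what int() accepts. The sketch below (sample values and the helper name to_int_or_none are made up for illustration) compares the isdigit() filter with a try/except conversion on a few edge cases.

```python
import pandas as pd


def to_int_or_none(cell):
    """Illustrative helper: let int() itself decide what is convertible."""
    try:
        return int(cell)
    except (ValueError, TypeError):
        return None


s = pd.Series(['11', '-5', ' 7 ', '2A'])

# isdigit() rejects the sign and the surrounding spaces, although int() accepts both.
via_isdigit = s.apply(lambda x: int(x) if str(x).isdigit() else None)
via_try = s.apply(to_int_or_none)

print(pd.DataFrame({'input': s, 'isdigit': via_isdigit, 'try_except': via_try}))

# The opposite mismatch: superscript two passes isdigit() but int() rejects it,
# so the isdigit() lambda above would raise ValueError on this value.
superscript_two = '\u00b2'
print(superscript_two.isdigit())
print(to_int_or_none(superscript_two))
```

For plain non-negative integers like those in the question the two approaches agree; the difference only matters when the column may contain signs, padding, or unusual Unicode digits.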

4 Comments

Paul H Over a year ago

@RukTech: just want to clarify that the dtype is float64 b/c there is no integer version of NaN

Amit Over a year ago

Or use 'None' instead of None in the else clause.

RukTech Over a year ago

@Paul: It is float64, my main aim is to convert from object to numeric type. Good catch though

June Over a year ago

If I don't know what the error is, how could I handle the exception?
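
One possible way to handle an error whose type is not known in advance is to catch Exception and record what was raised next to the value. This is a sketch only; the function name try_int, the sample data and the column names are invented for the example.

```python
import pandas as pd


def try_int(cell):
    """Return a (value, error_name) pair; one of the two is always None."""
    try:
        return int(cell), None
    except Exception as exc:
        # Broad on purpose: the error type is not known up front,
        # so keep its name for later inspection.
        return None, type(exc).__name__


s = pd.Series([11, '2A', None, 33])

# Split the pairs into two columns so failures can be filtered afterwards.
outcome = pd.DataFrame(s.apply(try_int).tolist(),
                       columns=['value', 'error'], index=s.index)

print(outcome)
print(outcome[outcome['error'].notna()])  # only the rows that failed
```

Once the recorded names show which exceptions actually occur, the except clause can be narrowed to just those types.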

Jeff · Accepted Answer · 2014-04-03 22:59:56Z

15

It is much better and faster to do:

In [1]: myDF = DataFrame(data=[[11,11],[22,'2A'],[33,33]], columns = ['A','B'])

In [2]: myDF.convert_objects(convert_numeric=True)
Out[2]: 
    A   B
0  11  11
1  22 NaN
2  33  33

[3 rows x 2 columns]

In [3]: myDF.convert_objects(convert_numeric=True).dtypes
Out[3]: 
A      int64
B    float64
dtype: object

This is a vectorized way of doing the conversion. The convert_numeric=True flag coerces the values: anything that cannot be converted to a number is marked as NaN.

You can of course do this to a single column if you'd like.
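
For a single column, a sketch of the same coercion using pd.to_numeric with errors='coerce' is shown here. It assumes a pandas version that provides pd.to_numeric (0.17.0 or later, to my knowledge) and is offered as an equivalent for releases where convert_objects is no longer available.

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# errors='coerce' replaces anything that cannot be parsed as a number with NaN.
myDF['B'] = pd.to_numeric(myDF['B'], errors='coerce')

print(myDF)
print(myDF.dtypes)
```

Column B comes out as float64 because NaN has no representation in a plain integer column.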

edited Apr 3, 2014 at 22:59

answered Apr 3, 2014 at 20:20

1 Comment

Ram Narasimhan Over a year ago

Please note that convert_objects() is deprecated from Pandas 0.21.0
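
As a possible replacement on current pandas, the whole frame can be coerced by applying pd.to_numeric column by column. The sketch below then shows two ways to end up with integers: dropping the rows that failed, or keeping them with the nullable 'Int64' dtype (which assumes pandas 0.24 or later). The variable names are chosen for the example.

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# Extra keyword arguments to DataFrame.apply are forwarded to pd.to_numeric.
numeric = myDF.apply(pd.to_numeric, errors='coerce')

# Option 1: drop the rows that could not be converted, then cast to int.
cleaned = numeric.dropna().astype(int)

# Option 2: keep every row and use the nullable integer dtype for the gaps.
nullable = numeric.astype('Int64')

print(cleaned)
print(nullable)
```

Option 1 matches the original goal of dropping unconvertible values; option 2 keeps the row count unchanged while still giving an integer column.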
