Pandas manipulating data frames and exception handling

Question

I am new to pandas in Jupyter and have inherited some very strange code. I have a data frame object with arbitrarily named columns, most of which contain integers. One of the notebook cells contains:

df = df/100

This seemingly divides every entry in the data frame by 100. Unfortunately, some entries can be strings, and this raises an error because a string cannot be divided by 100. Does anyone know of a way to catch such an exception and move on? If an entry is an integer or float, I would like the division to occur; if it is a string, nothing should happen. I was thinking of something like

    for (lambda x in df.columns):
        if x.type != "str":
           df[x] = df[x]/100

I probably need to add a loop over the rows and use df.iloc or something, but I am not sure of the best way to do this. I am sure there is some neat way of accessing this information.

Code Different · Accepted Answer · 2020-04-06 22:02:21Z

1

Your description of "doing nothing" is somewhat vague: do you want to keep the original values or mark them as NA? Also, does each column have a single data type, or are there mixed types?

Here's one solution:

import pandas as pd

# Mock data
df = pd.DataFrame({
    'col1': [1, 'Two', 3, 'Four'],
    'col2': ['Five', 6, 'Seven', 8]
})

# Try converting every column to numeric before the division
# Values that cannot be converted become NaN
tmp = df.apply(pd.to_numeric, errors='coerce') / 100

# Replace NaN cells with the original values from df
result = tmp.where(tmp.notnull(), df)
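One caveat with the approach above: pd.to_numeric also converts numeric-looking strings such as '400', so those get divided as well. If strings should always be left untouched, a per-cell check is one possible alternative. The sketch below is only an illustration; the helper name divide_if_number and the mock data are assumptions, not part of the original code.

```python
import numbers

import pandas as pd


def divide_if_number(value, divisor=100):
    """Hypothetical helper: divide real numbers, return anything else unchanged."""
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(value, numbers.Real) and not isinstance(value, bool):
        return value / divisor
    return value


# Mock data: '400' is a string and should stay a string
df = pd.DataFrame({
    'col1': [1, 'Two', 3, '400'],
    'col2': ['Five', 6.0, 'Seven', 8],
})

# Series.map applies the helper to every cell of each column
result = df.apply(lambda col: col.map(divide_if_number))
```

This runs a Python function on every cell, so it will likely be slower than the vectorized version on large frames.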


whege · Accepted Answer · 2020-04-06 22:02:00Z

0

Use a try/except statement. This lets you attempt an operation and specify what to do if an error is raised. E.g.:

for col in df.columns:
    try:
        df[col] = df[col] / 100
    except TypeError:
        # Column contains at least one non-numeric value; leave it unchanged
        pass
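Note that a column-wise try/except works per column, not per cell: if a column contains even one string, the division raises for the whole column and every value in it is left as-is. A minimal sketch with made-up data (the column names nums and mixed are only for illustration):

```python
import pandas as pd

# Mock data: one purely numeric column, one column mixing numbers and strings
df = pd.DataFrame({
    'nums': [100, 200],
    'mixed': [100, 'a'],
})

for col in df.columns:
    try:
        df[col] = df[col] / 100
    except TypeError:
        # The whole column is skipped, including its numeric entries
        pass
```

After the loop, nums has been divided, while mixed still holds its original values, including the 100.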


mechanical_meat · Accepted Answer · 2020-04-06 22:10:44Z

0

You could have a function to operate on each cell in a row:

def f(*row):
    # Divide each cell by 100; keep cells that cannot be divided (e.g. strings)
    to_return = []
    for cell in row:
        try:
            to_return.append(cell / 100)
        except TypeError:
            to_return.append(cell)
    return to_return

Then to apply that function to each row:

new_df = pd.DataFrame([f(*row) for row in df.values],
                      columns=df.columns)

edited Apr 6, 2020 at 22:10

answered Apr 6, 2020 at 21:51
