I am new to pandas in Jupyter and have inherited some very strange code. I have a data frame with arbitrarily named columns, most of which contain integers. One of the notebook cells contains:

df = df/100

This appears to divide every entry in the data frame by 100. Unfortunately, some entries can be strings, which raises an error because a string can't be divided by 100. Is there a way to catch such an exception and move on? If a cell is an integer or float, I would like the division to happen; if it is a string, nothing should be done. I was thinking of something like:

    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col] / 100

I probably need to add a loop over the rows and use df.iloc or something similar, but I am not sure of the best way to do this. I suspect there is some neat way of accessing this information.

3 Answers

Your description of "doing nothing" is somewhat vague: do you want to keep the original values or mark them as NA? Also, does each column have a single data type, or are types mixed within a column?

Here's one solution:

import pandas as pd

# Mock data: each column mixes integers and strings
df = pd.DataFrame({
    'col1': [1, 'Two', 3, 'Four'],
    'col2': ['Five', 6, 'Seven', 8]
})

# Try converting every column to numeric before the division.
# Cells that cannot be converted become NaN.
tmp = df.apply(pd.to_numeric, errors='coerce') / 100

# Replace NaN cells with the original values from df
result = tmp.where(tmp.notnull(), df)
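
To make the outcome concrete, here is a minimal sketch that wraps the same two steps in a helper. The function name `divide_numeric_cells` and its `divisor` parameter are made up for illustration; they are not part of pandas.

```python
import pandas as pd

def divide_numeric_cells(df, divisor=100):
    """Divide cells that parse as numbers; leave every other cell as it was."""
    # Non-numeric cells become NaN here, numeric ones are divided
    numeric = df.apply(pd.to_numeric, errors="coerce") / divisor
    # Wherever the conversion produced NaN, fall back to the original value
    return numeric.where(numeric.notna(), df)

df = pd.DataFrame({
    "col1": [1, "Two", 3, "Four"],
    "col2": ["Five", 6, "Seven", 8],
})
result = divide_numeric_cells(df)
# col1 now holds 0.01, 'Two', 0.03, 'Four'
```

One caveat if the data may contain numeric-looking strings: `pd.to_numeric` parses a string such as `'200'` as a number, so that cell would be divided as well instead of being left alone.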

Use a try/except statement. This lets you attempt an operation and specify what to do if an error is raised. For example:

for col in df.columns:
    try:
        df[col] = df[col] / 100
    except TypeError:
        # The column contains values that cannot be divided; leave it unchanged
        pass
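
If every column has a single type, a possible alternative to catching the exception is to pick out the numeric columns up front with `select_dtypes`. The sketch below uses made-up column names and assumes no column mixes numbers and strings. A mixed column has `object` dtype and would be skipped entirely, just as the try/except loop skips it.

```python
import pandas as pd

# Hypothetical example data: two numeric columns and one string column
df = pd.DataFrame({
    "a": [100, 200, 300],
    "b": [1.5, 2.5, 3.5],
    "label": ["x", "y", "z"],
})

# Names of the columns whose dtype is numeric (int, float, ...)
numeric_cols = df.select_dtypes(include="number").columns

# Divide only those columns; string columns are never touched
for col in numeric_cols:
    df[col] = df[col] / 100
```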


You could have a function to operate on each cell in a row:

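def f(*row):
    # Divide each cell by 100 where possible; keep the cell unchanged otherwise
    to_return = []
    for cell in row:
        try:
            to_return.append(cell / 100)
        except TypeError:
            # Strings (and other non-numeric values) end up here
            to_return.append(cell)
    return to_return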

Then to apply that function to each row:

# Assumes `import pandas as pd` and an existing DataFrame `df`
new_df = pd.DataFrame([f(*row) for row in df.values],
                      columns=df.columns)
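
The same cell-by-cell idea can also be sketched with `Series.map` applied to each column, which keeps the original index instead of rebuilding the frame from raw values. The helper name `scale_cell` and the sample data are hypothetical, and the explicit type check is just one way to decide what counts as a number.

```python
import numbers
import pandas as pd

def scale_cell(value, divisor=100):
    """Divide real numbers by `divisor`; return anything else unchanged."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, numbers.Number) and not isinstance(value, bool):
        return value / divisor
    return value

# Hypothetical data mixing ints, floats, strings and a missing value
df = pd.DataFrame({
    "col1": [1, "Two", 3.5],
    "col2": ["Five", 6, None],
})

# Apply the helper to every cell, one column at a time
new_df = df.apply(lambda column: column.map(scale_cell))
```

This visits every cell in Python, so it is likely to be slower than a vectorized approach on large frames.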
