how to improve for loop in python

Question

I have this code:

    for row in range(len(df[col])):
        df[col][row] = int(df[col][row].replace(',','')) 
    df[col] = df[col].astype(int)
    df[col] = np.round(df[col]/500)*500  #rounds the numbers to the closest 500 multiple.
    df[col] = df[col].astype(int) #round returns a float, this turns it back to int after rounding

In the for loop, df[col][row].replace(',','') removes the commas from numbers that are stored as strings (object dtype), such as 1,430, and int() then converts the result to an integer, such as 1430.

I then have to add df[col] = df[col].astype(int), because otherwise the following np.round() raises the error: 'float' object has no attribute 'rint'

After np.round() I have to add .astype(int) again, because np.round() returns floats and I want ints.

Execution takes considerably long, even though my dataframe is only 32 x 17.

Is there any way I could improve it?

Hi there, welcome to SO. Please see How to Ask and minimal reproducible example. Also, you don't need the loop; try df[col].str.replace(',','').astype(int). I'm unsure what you're trying to do overall, though.
– Umar.H, Commented Aug 6, 2020 at 15:10
Does this answer your question? Convert Pandas Dataframe to Float with commas and negative numbers. After that, just use astype(int).
– MrNobody33, Commented Aug 6, 2020 at 15:20

Jonathan · Accepted Answer · 2020-08-06 15:17:33Z

0

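Would a more general replace using a lambda function, df[col].apply(lambda x: x.replace(',','')), be more suitable and time efficient?

And, once the column is numeric, would a one-liner like this not yield what you are after?

    df['col'] = (df['col'] / 500).round().astype(int) * 500

Note the round() before astype(int): astype(int) on its own truncates toward zero instead of rounding to the closest multiple of 500.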
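As a minimal sketch of the vectorized approach discussed here, the whole loop can be replaced by column-wide operations. It assumes the column holds comma-formatted strings; the sample data and the column name price are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column name "price" is an assumption.
df = pd.DataFrame({"price": ["1,430", "2,760", "260", "12,249"]})
col = "price"

# Strip the thousands separators for the whole column at once, then convert to int.
cleaned = df[col].str.replace(",", "", regex=False).astype(int)

# Round to the nearest multiple of 500 and cast the float result back to int.
df[col] = (cleaned / 500).round().astype(int) * 500

print(df[col].tolist())  # [1500, 3000, 500, 12000]
```

Because the strings are converted to a numeric dtype before rounding, the 'rint' error from the question should not come up, and only one final astype(int) is needed.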


MrNobody33 Over a year ago

It's not necessary to use apply when specifying the column; you can just use df[col].str.replace(',','') ;)

StefanMZ · Accepted Answer · 2020-08-06 15:20:02Z

0

Don't loop over indices with for row in range(len(df[col])):. Iterate over the values directly instead: for row in df[col]

Or, instead of a for loop, use one of these:

To replace one string with another, use DataFrame.replace (pass regex=True to replace substrings rather than whole cell values).

Or, better, use a lambda with DataFrame.apply (Example here).
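A rough sketch of the DataFrame.replace route, applied to several columns at once. It assumes every column in the frame holds comma-formatted integer strings; the column names a and b and the values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with two comma-formatted string columns.
df = pd.DataFrame({"a": ["1,430", "980"], "b": ["10,260", "3,749"]})

# regex=True makes replace() substitute inside each string;
# without it, only cells that are exactly "," would be replaced.
cleaned = df.replace(",", "", regex=True).astype(int)

# Round every column to the nearest multiple of 500 and cast back to int.
rounded = (cleaned / 500).round().astype(int) * 500

print(rounded)
```

This avoids both the explicit loop and a per-element lambda, since replace, round, and astype all operate on the whole frame.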
