I have a huge CSV file (~2 GB) that I have imported using Dask. Now I want to normalize this dataframe. The dataframe contains about 70k columns. I have written this Python function to do the normalization:
from tqdm import tqdm

def normalize(df):
    result = df.copy()
    for col in tqdm(df.columns):
        if col != 'name':  # skip columns named "name" so they are not normalized
            max_value = df[col].max()
            min_value = df[col].min()
            result[col] = (df[col] - min_value) / (max_value - min_value)
    return result
It works, but it takes a lot of time. I started it running and it is showing that it will take approximately 88 hours to complete. I tried switching to sklearn's MinMaxScaler, but it doesn't show any progress of the normalization and I am afraid it will also take quite a lot of time. Is there any other way to normalize all the columns (and ignore a few, like I did with that if condition)?
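For reference, my MinMaxScaler attempt looked roughly like this (written from memory, so the exact column handling may differ; it also assumes the data is in a pandas DataFrame, e.g. after calling .compute() on the Dask frame):

from sklearn.preprocessing import MinMaxScaler

def normalize_sklearn(df):
    # df is assumed to be a pandas DataFrame here (e.g. the Dask frame after .compute()).
    cols = [c for c in df.columns if c != 'name']  # skip the "name" column, as in my loop version
    result = df.copy()
    # fit_transform scales every selected column to [0, 1] in one call,
    # but it gives no per-column progress output.
    result[cols] = MinMaxScaler().fit_transform(result[cols])
    return result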