Scaling down high-dimensional pandas DataFrame data using sklearn

Question

I am trying to scale down the values in a pandas data frame. The problem is that I have 291 dimensions, so scaling the values one column at a time is time-consuming if we do it as follows:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# StandardScaler expects 2D input, so select the column with double brackets
scaler = scaler.fit(dataframe[['dimension_1']])
dataframe[['dimension_1']] = scaler.transform(dataframe[['dimension_1']])

Problem: This handles only one dimension. How can we do this for all 291 dimensions in one shot?

@yudhiesh Thank you. No, I mean scaling the values with StandardScaler. I am not looking for dimensionality reduction (DR).
– Avv, Commented Jul 3, 2021 at 13:07
You can pass in a list of the columns that you want to apply scaling to.
– yudhiesh, Commented Jul 3, 2021 at 13:08

yudhiesh · Accepted Answer · 2021-07-03 13:57:00Z

2

You can pass in a list of the columns that you want to scale instead of individually scaling each column.

from sklearn.preprocessing import StandardScaler

# Convert 0 and 1 values to booleans so binary columns can be excluded below.
# Note that this replaces every 0 and 1 in the dataframe, not only those in binary columns.
df.replace({0: False, 1: True}, inplace=True)

# Make a copy of the dataframe
scaled_features = df.copy()

# Take the numeric columns, i.e. those which are not of type object or bool
col_names = df.select_dtypes(exclude=['object', 'bool']).columns.to_list()
features = scaled_features[col_names]

# Use the scaler of your choice; here StandardScaler is used
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)

# Write the scaled values back into the copy
scaled_features[col_names] = features
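
As a minimal sketch of the same idea on a toy dataframe (the column names and values are made up for illustration), the numeric columns can be selected with `select_dtypes` and scaled in a single call. Here binary 0/1 columns are detected by their values rather than by converting them to booleans, which is an assumption about how you want to treat them:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two continuous columns, one binary flag, one text column
df = pd.DataFrame({
    "dimension_1": [1.0, 2.0, 3.0, 4.0],
    "dimension_2": [10.0, 20.0, 30.0, 40.0],
    "is_active": [0, 1, 0, 1],
    "label": ["a", "b", "a", "b"],
})

# Numeric columns only (object and bool columns are skipped)
numeric_cols = df.select_dtypes(include="number").columns

# Leave out columns whose only values are 0 and 1
binary_cols = [c for c in numeric_cols if df[c].dropna().isin([0, 1]).all()]
cols_to_scale = [c for c in numeric_cols if c not in binary_cols]

# Scale all remaining columns at once and write them back into a copy
scaled_df = df.copy()
scaled_df[cols_to_scale] = StandardScaler().fit_transform(df[cols_to_scale])
```

The same pattern should carry over to 291 columns unchanged, since `cols_to_scale` is built from the dtypes and values rather than typed out by hand.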

edited Jul 3, 2021 at 13:57

answered Jul 3, 2021 at 13:11

yudhiesh


12 Comments

yudhiesh Over a year ago

Are they all side by side? Like from column 1 to 100 are the columns that you need? df.columns.tolist() will give you all the columns in the dataframe, so you could just filter the ones you do not want from it.

yudhiesh Over a year ago

You wouldn't want to standardize the columns with 0s and 1s as those are binary classes. I added in an updated line that gets all the numeric columns.

yudhiesh Over a year ago

No, I am not aware of that. Maybe a managed service from AWS would do it for you, or AutoML.

yudhiesh Over a year ago

@Abraheem This gets all the numeric columns; columns of type object are categorical.

yudhiesh Over a year ago

@Abraheem OK, the updated answer should solve it: I converted the 0 and 1 values to booleans and then filtered to exclude columns of type bool and object.


Peerasak Intarapaiboon · Accepted Answer · 2021-07-03 21:51:20Z

1

I normally use a Pipeline, since it can chain multi-step transformations.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
num_pipeline = Pipeline([('std_scale', StandardScaler())])
transformed_dataframe = num_pipeline.fit_transform(dataframe)

If you need more transformation steps, e.g. filling NA values, just add them to the list passed to Pipeline (line 3 of the code).
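
For instance, a NA-filling step could be sketched as follows. The choice of `SimpleImputer` with a median strategy, the step name `fill_na`, and the toy data are assumptions for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical all-numeric data with one missing value
dataframe = pd.DataFrame({
    "dimension_1": [1.0, 2.0, np.nan, 4.0],
    "dimension_2": [10.0, 20.0, 30.0, 40.0],
})

num_pipeline = Pipeline([
    ("fill_na", SimpleImputer(strategy="median")),  # replace NaN with the column median
    ("std_scale", StandardScaler()),                # then standardize each column
])

transformed = num_pipeline.fit_transform(dataframe)

# fit_transform returns a NumPy array here; wrap it to keep column names and index
transformed_dataframe = pd.DataFrame(
    transformed, columns=dataframe.columns, index=dataframe.index
)
```

The steps run in list order, so the imputer sees the raw values and the scaler sees the filled ones.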

Note: The code above works only if all columns are numeric. If not, we need to:

1. select only the numeric columns,
2. pass them into the pipeline, then
3. put the result back into the original dataframe.

Here is the code for the 3 steps:

num_col = dataframe.select_dtypes(exclude=['object', 'bool']).columns.to_list()
df_num = dataframe[num_col] #1
transformed_df = num_pipeline.fit_transform(df_num) #2
dataframe[num_col] = transformed_df #3

edited Jul 3, 2021 at 21:51

answered Jul 3, 2021 at 13:20

Peerasak Intarapaiboon


4 Comments

Avv Over a year ago

Thanks. Should we pass the dataframe values or the whole dataframe to fit_transform? This did not work on my end; I got the error ValueError: could not convert string to float: '[01]'

Peerasak Intarapaiboon Over a year ago

That problem may come from null values or from some cells containing strings.
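
One way to locate such cells, sketched under the assumption that the offending values are strings that cannot be parsed as numbers (the toy data below is hypothetical and mimics the '[01]' value from the error message):

```python
import pandas as pd

# Hypothetical data: one column contains a stray non-numeric string
dataframe = pd.DataFrame({
    "dimension_1": ["1.5", "2.0", "[01]"],
    "dimension_2": [1.0, 2.0, 3.0],
})

# Columns that are not already numeric are the candidates for stray strings
candidate_cols = dataframe.select_dtypes(exclude="number").columns

bad_cells = {}
for col in candidate_cols:
    # Unparseable values become NaN instead of raising an error
    converted = pd.to_numeric(dataframe[col], errors="coerce")
    # Keep values that were present but could not be parsed as numbers
    mask = converted.isna() & dataframe[col].notna()
    if mask.any():
        bad_cells[col] = dataframe.loc[mask, col].tolist()
```

`bad_cells` then maps each problematic column to the values that would make the scaler fail.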

Avv Over a year ago

You are right! I had string values and removed all of them. You did not show how to assign the scaled values back to the dataframe; can you please add that to your code? Also, does your code exclude categorical values?

Peerasak Intarapaiboon Over a year ago

I have added the code to exclude object columns.
