I am trying to scale the values in a pandas DataFrame. The problem is that I have 291 dimensions (columns), so scaling them one at a time is time-consuming if we do it as follows:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# StandardScaler expects 2-D input, so select the column with double brackets
scaler = scaler.fit(dataframe[['dimension_1']])
dataframe[['dimension_1']] = scaler.transform(dataframe[['dimension_1']])

Problem: this handles only one dimension. How can we do this for all 291 dimensions in one shot?

  • I think you mean dimensionality reduction? Look at PCA for this. Commented Jul 3, 2021 at 13:06
  • @yudhiesh. Thank you. No, I mean scaling the values using StandardScaler. I am not looking for dimensionality reduction. Commented Jul 3, 2021 at 13:07
  • You can pass in a list of the columns that you want to apply scaling to. Commented Jul 3, 2021 at 13:08
  • @yudhiesh. Can you please post an answer? Commented Jul 3, 2021 at 13:08

2 Answers


You can pass in a list of the columns that you want to scale instead of individually scaling each column.

from sklearn.preprocessing import StandardScaler

# convert the values 0 and 1 to booleans so that binary columns are not scaled
# (this replaces every 0 and 1 in the frame, so it is only safe when those
# values occur solely in binary columns)
df.replace({0: False, 1: True}, inplace=True)

# make a copy of the dataframe
scaled_features = df.copy()

# take the numeric columns, i.e. those which are not of type object or bool
col_names = df.select_dtypes(exclude=['object', 'bool']).columns.to_list()
features = scaled_features[col_names]

# use the scaler of your choice; here StandardScaler is used
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)

# write the scaled values back into the copy
scaled_features[col_names] = features
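
Because the frame-wide replace above touches every 0 and 1, including those in continuous columns, a per-column check may be safer. The following is only a sketch under stated assumptions: the toy frame and its column names (`dimension_1`, `dimension_2`, `flag`, `label`) are hypothetical, and "binary" is assumed to mean that every non-null value in the column is 0 or 1.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame standing in for the real 291-column data
df = pd.DataFrame({
    "dimension_1": [0.0, 1.0, 2.5, 4.0],      # continuous, but contains 0 and 1
    "dimension_2": [10.0, 20.0, 30.0, 40.0],  # continuous
    "flag": [0, 1, 1, 0],                     # binary indicator stored as integers
    "label": ["a", "b", "a", "b"],            # categorical
})

numeric_cols = df.select_dtypes(include="number").columns

# Assumption: a column is binary when all of its non-null values are 0 or 1
binary_cols = [c for c in numeric_cols if df[c].dropna().isin([0, 1]).all()]
scale_cols = [c for c in numeric_cols if c not in binary_cols]

# Scale every remaining numeric column in one call and write the result to a copy
scaled_features = df.copy()
scaled_features[scale_cols] = StandardScaler().fit_transform(df[scale_cols])
```

Here `flag` and `label` are left as they were, and `dimension_1` is still scaled even though it happens to contain the values 0 and 1.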

12 Comments

Are they all side by side? For example, are columns 1 to 100 the ones you need? df.columns.tolist() gives you all the columns in the dataframe, so you could filter out the ones you do not want.
You wouldn't want to standardize the columns of 0s and 1s, as those are binary classes. I added an updated line that gets all the numeric columns.
No, I am not aware of that. Maybe a managed service from AWS, or AutoML, would do it for you.
@Abraheem this gets all the numeric columns; columns of type object are categorical.
@Abraheem ok, the updated answer should solve it: I converted the 0 and 1 values to booleans and then filtered to exclude columns of type bool and object.

I normally use a Pipeline, since it can chain multi-step transformations.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
num_pipeline = Pipeline([('std_scale', StandardScaler())])
transformed_dataframe = num_pipeline.fit_transform(dataframe)

If you need more transformation steps, e.g. filling NA values, just add them to the list passed to Pipeline (line 3 of the code).
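
As an illustration of adding a step, the sketch below puts a fill-NA step in front of the scaler. It assumes scikit-learn's `SimpleImputer` with a median strategy as the fill method; the step name `fill_na` and the toy frame are hypothetical, and a different imputer or strategy may suit your data better.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical all-numeric frame with a few missing values
dataframe = pd.DataFrame({
    "dimension_1": [1.0, np.nan, 3.0, 4.0],
    "dimension_2": [10.0, 20.0, np.nan, 40.0],
})

# Steps run in order: fill missing values first, then standardize
num_pipeline = Pipeline([
    ("fill_na", SimpleImputer(strategy="median")),
    ("std_scale", StandardScaler()),
])

# fit_transform returns a NumPy array, so wrap it to keep column names and index
transformed = num_pipeline.fit_transform(dataframe)
transformed_dataframe = pd.DataFrame(
    transformed, columns=dataframe.columns, index=dataframe.index
)
```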

Note: The above code works only if all columns are numeric. If they are not, we need to

  1. select only numeric columns
  2. pass into the pipeline, then
  3. put the result back to the original dataframe.

Here is the code for the 3 steps:

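# 1: select only the numeric columns (exclude object and bool)
num_col = dataframe.select_dtypes(exclude=['object', 'bool']).columns.to_list()
df_num = dataframe[num_col]
# 2: pass only the numeric columns into the pipeline
transformed_df = num_pipeline.fit_transform(df_num)
# 3: put the result back into the original dataframe
dataframe[num_col] = transformed_df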
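
If you would rather not modify the original dataframe, the three steps can be wrapped in a small helper that returns a scaled copy. This is only a sketch: the function name `scale_numeric_columns` and the toy frame are hypothetical, and it assumes the frame has at least one numeric column.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def scale_numeric_columns(frame: pd.DataFrame, pipeline: Pipeline) -> pd.DataFrame:
    """Return a copy of `frame` with its numeric columns run through `pipeline`."""
    # Step 1: select only the numeric columns
    num_col = frame.select_dtypes(include="number").columns.to_list()
    result = frame.copy()
    # Steps 2 and 3: transform the numeric part and write it into the copy
    result[num_col] = pipeline.fit_transform(frame[num_col])
    return result


# Hypothetical frame with two numeric columns and one string column
dataframe = pd.DataFrame({
    "dimension_1": [1.0, 2.0, 3.0, 4.0],
    "dimension_2": [5.0, 5.0, 7.0, 7.0],
    "code": ["01", "02", "01", "02"],
})
num_pipeline = Pipeline([("std_scale", StandardScaler())])
scaled = scale_numeric_columns(dataframe, num_pipeline)
```

The string column `code` is carried over unchanged, and `dataframe` itself keeps its original values.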

4 Comments

Thanks. Should we pass the dataframe values or the whole dataframe to fit_transform? This did not work on my end; I got the error ValueError: could not convert string to float: '[01]'
That problem may come from null values, or from some cells containing strings.
You are right! I had string values. I removed all of them. You did not show how to assign the scaled values back to the dataframe; can you please add that to your code? Also, does your code exclude categorical values?
I have added the code to exclude object columns.
