
I am quite new to data science/Python, and I am currently working on some deep learning algorithms where I would like to use one variable for both the input and the output data. I have 4 inputs and 1 output, and I use the following structure:

 samples = np.zeros(nb_samples, dtype=[('input', float, 4), ('output', float, 1)])

and get the following warning when I apply StandardScaler to the array:

 DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will
 raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if
 your data has a single feature or X.reshape(1, -1) if it contains a single sample.

I think the problem is that my structure looks like this:

 [ [x0,x1,x2,x3], y0 ]

And it should look something like this:

 [ [x0,x1,x2,x3], [y0] ]
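
For reference, this is a minimal way to check the shapes of the two fields (at least on my NumPy version, the shape-1 field seems to be collapsed, so 'output' comes out 1-D):

import numpy as np

nb_samples = 3
samples = np.zeros(nb_samples, dtype=[('input', float, 4), ('output', float, 1)])

print(samples['input'].shape)   # (3, 4): 2-D, fine for StandardScaler
print(samples['output'].shape)  # (3,):   1-D, which triggers the warning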

I found some similar questions but none of the answers worked for me.

How can I get rid of this warning, and what exactly is the problem?


1 Answer


I'm using scikit-learn 0.19.1, and indeed I get an error when I try to scale this array. Here's the transformation that works for me:

import numpy as np
from sklearn.preprocessing import StandardScaler

samples = np.zeros(nb_samples, dtype=[('input', float, 4), ('output', float, 1)])
x = samples['input']     # shape=(nb_samples, 4)
y = samples['output']    # shape=(nb_samples,)

scaler = StandardScaler()
scaler.fit_transform(x, y)  # works the same with or without `y`

Separating the input from the output is better, particularly for StandardScaler, because it scales only x and does nothing to y. In fact, y is merely a "passthrough argument for Pipeline compatibility" (see the sketch below). If you ignored the warning and transformed samples directly, the output would be modified too, and that's not what you want.
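
Here's a minimal sketch of why that signature exists, reusing `x` and `y` from the snippet above; the LinearRegression step is just an arbitrary example estimator:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Inside a Pipeline every step is called with (X, y), so StandardScaler
# accepts y in fit/fit_transform and simply passes it through untouched.
model = Pipeline([
    ('scale', StandardScaler()),
    ('regress', LinearRegression()),
])
model.fit(x, y)  # y reaches LinearRegression; StandardScaler ignores it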


2 Comments

Thank you, I was looking for a solution like that. :) I definitely want to scale the 'output' as well, since I am working on a car value prediction application and there are prices like 50,000,000 HUF and even higher. The error would be much larger if I didn't scale the output, right? Currently I scale like this: 'input_scaler = preprocessing.StandardScaler().fit(samples_train['input'])' ... 'output_scaler = preprocessing.StandardScaler().fit(samples_train['output'])' ... and after the prediction I use the inverse scaler to transform back. Should I change this?
The absolute value of the error doesn't play a role here. The problem with X scaling (the lack of scaling, to be exact) is that it makes certain algorithms and methods work poorly (SVM, KNN, NN), so it is a necessary step. Just to be clear: there is nothing wrong with additionally scaling y, but it doesn't add much value, which is why StandardScaler ignores it. See the sketch after these comments for the two-scaler workflow.
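
A minimal sketch of that two-scaler workflow, assuming the structured array from the question; LinearRegression is only a placeholder model, and the reshape(-1, 1) on the output is exactly what the original warning asks for:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

nb_samples = 100
samples_train = np.zeros(nb_samples, dtype=[('input', float, 4), ('output', float, 1)])
# ... fill samples_train with real training data ...

input_scaler = StandardScaler().fit(samples_train['input'])
# reshape the 1-D output into a column vector to avoid the deprecation warning
output_scaler = StandardScaler().fit(samples_train['output'].reshape(-1, 1))

x_train = input_scaler.transform(samples_train['input'])
y_train = output_scaler.transform(samples_train['output'].reshape(-1, 1))

model = LinearRegression().fit(x_train, y_train)  # placeholder estimator

# map predictions back to the original scale (e.g. HUF prices)
y_pred = output_scaler.inverse_transform(model.predict(x_train))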
