
I am quite new to data science/Python, and I am currently working on some deep learning algorithms where I would like to use one variable for both the input and the output data. I have 4 inputs and 1 output, and I use the following structure:

 samples = np.zeros(nb_samples, dtype=[('input', float, 4), ('output', float, 1)])

and get the following warning when I apply StandardScaler to the array:

 DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will
 raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if
 your data has a single feature or X.reshape(1, -1) if it contains a single sample.

I think the problem is that my structure looks like this:

 [ [x0,x1,x2,x3], y0 ]

And it should look something like this:

 [ [x0,x1,x2,x3], [y0] ]
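
For reference, this is a minimal way to check the shapes of the two fields (at least on my NumPy version, the shape-1 field seems to be collapsed, so 'output' comes out 1-D):

import numpy as np

nb_samples = 3
samples = np.zeros(nb_samples, dtype=[('input', float, 4), ('output', float, 1)])

print(samples['input'].shape)   # (3, 4): 2-D, fine for StandardScaler
print(samples['output'].shape)  # (3,):   1-D, which triggers the warning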

I found some similar questions but none of the answers worked for me.

How can I get rid of this warning, and what exactly is the problem?


1 Answer


I'm using scikit-learn 0.19.1, and indeed I get an error when I try to scale this array. Here's the transformation that works for me:

import numpy as np
from sklearn.preprocessing import StandardScaler

samples = np.zeros(nb_samples, dtype=[('input', float, 4), ('output', float, 1)])
x = samples['input']     # shape=(nb_samples, 4)
y = samples['output']    # shape=(nb_samples,)

scaler = StandardScaler()
scaler.fit_transform(x, y)  # works the same with or without `y`

Separating the input from the output is better, particularly for StandardScaler, because it scales only x and does nothing to y. In fact, y is merely a "passthrough argument for Pipeline compatibility" (see the sketch below). If you ignored the warning and transformed samples directly, the output would be modified too, and that's not what you want.
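
Here's a minimal sketch of why that signature exists, reusing `x` and `y` from the snippet above; the LinearRegression step is just an arbitrary example estimator:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Inside a Pipeline every step is called with (X, y), so StandardScaler
# accepts y in fit/fit_transform and simply passes it through untouched.
model = Pipeline([
    ('scale', StandardScaler()),
    ('regress', LinearRegression()),
])
model.fit(x, y)  # y reaches LinearRegression; StandardScaler ignores it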


2 Comments

Thank you, I was looking for a solution like that. :) I definitely want to scale the 'output' as well, since I am working on a car value prediction application and there are prices like 50,000,000 HUF and even higher. The error would be much larger if I didn't scale the output, right? Currently I scale like this: 'input_scaler = preprocessing.StandardScaler().fit(samples_train['input'])' ... 'output_scaler = preprocessing.StandardScaler().fit(samples_train['output'])' ... and after the prediction I use the inverse scaler to transform back. Should I change this?
The absolute value of the error doesn't play a role here. The problem with X scaling (the lack of scaling, to be exact) is that it makes certain algorithms and methods work poorly (SVM, KNN, NN), so it is a necessary step. Just to be clear: there is nothing wrong with additionally scaling y, but it doesn't add much value, which is why StandardScaler ignores it. See the sketch after these comments for the two-scaler workflow.
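
A minimal sketch of that two-scaler workflow, assuming the structured array from the question; LinearRegression is only a placeholder model, and the reshape(-1, 1) on the output is exactly what the original warning asks for:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

nb_samples = 100
samples_train = np.zeros(nb_samples, dtype=[('input', float, 4), ('output', float, 1)])
# ... fill samples_train with real training data ...

input_scaler = StandardScaler().fit(samples_train['input'])
# reshape the 1-D output into a column vector to avoid the deprecation warning
output_scaler = StandardScaler().fit(samples_train['output'].reshape(-1, 1))

x_train = input_scaler.transform(samples_train['input'])
y_train = output_scaler.transform(samples_train['output'].reshape(-1, 1))

model = LinearRegression().fit(x_train, y_train)  # placeholder estimator

# map predictions back to the original scale (e.g. HUF prices)
y_pred = output_scaler.inverse_transform(model.predict(x_train))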
