
I need to implement GPR (Gaussian process regression) in Python using the scikit-learn library.

My input X has two features, e.g. X = [x1, x2], and the output is one-dimensional, y = [y1].

I want to use two kernels, RBF and Matern, such that RBF uses the x1 feature while Matern uses the x2 feature. I tried the following:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern as M, RBF as R

X = np.matrix([[1.,2], [3.,4], [5.,1], [6.,5],[4, 7.],[ 9,8.], [1.,2], [3.,4], [5.,1], [6.,5],[4, 7.],[ 9,8.],[1.,2], [3.,4], [5.,1], [6.,5],[4, 7.],[ 9,8.]]).T

y=[0.84147098,  0.42336002, -4.79462137, -1.67649299,  4.59890619,  7.91486597, 0.84147098,  0.42336002, -4.79462137, -1.67649299,  4.59890619,  7.91486597, 0.84147098,  0.42336002, -4.79462137, -1.67649299,  4.59890619,  7.91486597]

kernel = R(X[0]) * M(X[1])
gp = GaussianProcessRegressor(kernel=kernel)

gp.fit(X, y)

But this gives an error:

ValueError: Found input variables with inconsistent numbers of samples: [2, 18]

I tried several methods but could not find a solution. I'd really appreciate it if someone could help.
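For what it's worth, inspecting the shapes shows where the 2 and the 18 in the error come from (a quick check using a shortened version of the same construction):

```python
import numpy as np

# Same construction as above, shortened to three samples:
X_bad = np.matrix([[1., 2], [3., 4], [5., 1]]).T
print(X_bad.shape)   # (2, 3) -- the transpose puts the samples in columns

# scikit-learn expects shape (n_samples, n_features):
X_ok = np.array([[1., 2], [3., 4], [5., 1]])
print(X_ok.shape)    # (3, 2)
```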

  • Please make your example fully reproducible by explicitly including the relevant imports Commented Jun 6, 2018 at 23:57
  • Thanks for the feedback. I edited the post. Appreciate if you can shed some light Commented Jun 7, 2018 at 0:12
  • What is the new x = np.atleast_2d(np.linspace(0, 10, 1000)).T?? Commented Jun 7, 2018 at 0:16
  • Just now deleted it. That's for the prediction part. Right now I am just trying to fit the data to GPR. Thank you Commented Jun 7, 2018 at 0:19

1 Answer


Your X should not be a np.matrix, and it should not be transposed; scikit-learn expects a plain 2D array of shape (n_samples, n_features):

X = np.array([[1.,2], [3.,4], [5.,1], [6.,5],[4, 7.],[ 9,8.], [1.,2], [3.,4], [5.,1], [6.,5],[4, 7.],[ 9,8.],[1.,2], [3.,4], [5.,1], [6.,5],[4, 7.],[ 9,8.]])

# rest of your code as is

gp.fit(X, y)

# result:

GaussianProcessRegressor(alpha=1e-10, copy_X_train=True,
             kernel=RBF(length_scale=[1, 2]) * Matern(length_scale=[3, 4], nu=1.5),
             n_restarts_optimizer=0, normalize_y=False,
             optimizer='fmin_l_bfgs_b', random_state=None)

That said, your kernel definition will not do what you want; most probably you have to change it to something like

kernel = R([1,0]) * M([0,1]) 

but I am not quite sure about that - be sure to check the documentation for the correct arguments of the RBF and Matern kernels...
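To genuinely restrict each kernel to one feature (which, as discussed in the comments below, length_scale alone does not do), one option is a small wrapper kernel that slices out the relevant columns before delegating to a base kernel. This is only a sketch, not part of the scikit-learn API; SubspaceKernel is a name I made up, and since it does not expose the wrapped kernels' hyperparameters, the GPR optimizer will leave them at their initial values:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Kernel, Matern, RBF


class SubspaceKernel(Kernel):
    """Evaluate `base_kernel` on a subset of the input columns.

    Note: the wrapped kernel's hyperparameters are not exposed, so
    GaussianProcessRegressor will not tune them during fitting.
    """

    def __init__(self, base_kernel, dims):
        self.base_kernel = base_kernel
        self.dims = dims  # list of column indices this kernel sees

    def __call__(self, X, Y=None, eval_gradient=False):
        Xs = np.asarray(X)[:, self.dims]
        Ys = None if Y is None else np.asarray(Y)[:, self.dims]
        return self.base_kernel(Xs, Ys, eval_gradient=eval_gradient)

    def diag(self, X):
        return self.base_kernel.diag(np.asarray(X)[:, self.dims])

    def is_stationary(self):
        return self.base_kernel.is_stationary()


# RBF sees only column 0 (x1), Matern only column 1 (x2):
kernel = SubspaceKernel(RBF(length_scale=1.0), [0]) * \
         SubspaceKernel(Matern(length_scale=1.0, nu=1.5), [1])

X = np.array([[1., 2], [3., 4], [5., 1], [6., 5], [4., 7], [9., 8]])
y = np.array([0.84147098, 0.42336002, -4.79462137,
              -1.67649299, 4.59890619, 7.91486597])

gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)
print(gp.predict(X))  # close to y, since GPR interpolates its training data
```

With this, the RBF factor only ever sees x1 and the Matern factor only x2. If tuning the hyperparameters per feature matters, GPy's kernels offer the same idea built in via an active_dims argument.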


10 Comments

Thanks a lot desertnaut. I am trying as you suggested; gp.fit works now.
My main issue is how to specify that RBF takes the first feature and Matern takes the second; that issue is still not solved. In the documentation of RBF and Matern, I can't find a place to specify which column of the input 2D array should be used.
@QuantumGirl you cannot specify that directly, you'll have to do it indirectly via the length_scale argument. The idea behind my approach is that [1,0] means the 1st column and [0,1] means the 2nd one (it's like indices); I am just not completely sure that this is the correct way...
Thanks desertnaut, I will try length_scale.
I understand, but as I understand it, length_scale specifies a hyperparameter, not the input dimension. This is my problem. Let's see; if scikit-learn is not flexible enough, I can probably move to GPML. Thanks anyway :)
