Gaussian Process scikit-learn - Exception

Question

I want to use Gaussian Processes to solve a regression task. My data is as follows: each X vector has a length of 37, and each Y vector has a length of 8.

I'm using the sklearn package in Python, but trying to use Gaussian processes leads to an Exception:

from sklearn import gaussian_process

print "x :", x__
print "y :", y__

gp = gaussian_process.GaussianProcess(theta0=1e-2, thetaL=1e-4, thetaU=1e-1)
gp.fit(x__, y__)

x : [[ 136. 137. 137. 132. 130. 130. 132. 133. 134. 135. 135. 134. 134. 1139. 1019. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 70. 24. 55. 0. 9. 0. 0.]
 [ 136. 137. 137. 132. 130. 130. 132. 133. 134. 135. 135. 134. 134. 1139. 1019. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 70. 24. 55. 0. 9. 0. 0.]
 [ 82. 76. 80. 103. 135. 155. 159. 156. 145. 138. 130. 122. 122. 689. 569. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [ 156. 145. 138. 130. 122. 118. 113. 111. 105. 101. 98. 95. 95. 759. 639. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [ 112. 111. 111. 114. 114. 113. 114. 114. 112. 111. 109. 109. 109. 1109. 989. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [ 133. 130. 125. 124. 124. 123. 103. 87. 96. 121. 122. 123. 123. 399. 279. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [ 104. 109. 111. 106. 91. 86. 117. 123. 123. 120. 121. 115. 115. 549. 429. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [ 144. 138. 126. 122. 119. 118. 116. 114. 107. 105. 106. 119. 119. 479. 359. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

y : [[ 7. 9. 13. 30. 34. 37. 36. 41. ]
 [ 7. 9. 13. 30. 34. 37. 36. 41. ]
 [ -4. -9. -17. -21. -27. -28. -28. -20. ]
 [ -1. -1. -4. -5. 20. 28. 31. 23. ]
 [ -1. -2. -3. -1. -4. -7. 8. 58. ]
 [ -1. -2. -14.33333333 -14. -13.66666667 -32. -26.66666667 -1. ]
 [ 1. 3.33333333 0. -0.66666667 3. 6. 22. 54. ]
 [ -2. -8. -11. -17. -17. -16. -16. -23. ]]

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
 in ()
     11 gp = gaussian_process.GaussianProcess(theta0=1e-2, thetaL=1e-4, thetaU=1e-1)
     12
---> 13 gp.fit(x__, y__)

/usr/local/lib/python2.7/site-packages/sklearn/gaussian_process/gaussian_process.pyc in fit(self, X, y)
    300         if (np.min(np.sum(D, axis=1)) == 0.
    301                 and self.corr != correlation.pure_nugget):
--> 302             raise Exception("Multiple input features cannot have the same"
    303                             " target value.")
    304

Exception: Multiple input features cannot have the same target value.

I've found some topics related to a scikit-learn issue, but my version is up-to-date.

As per the suggestion in the issue, did you try to comment out line 307 in gaussian_process.py? – erip, commented Jan 11, 2016 at 14:22

Farseer (edited by erip) · Accepted Answer · 2016-01-11 15:31:49Z

7

This is a known issue, and it still has not been resolved.

It happens because, if two of your input points are identical, the covariance matrix is singular (not invertible), meaning you cannot calculate A^-1, which is part of the GP solution. In your data, the first two rows of x are identical.

To work around it, add a small amount of Gaussian noise to your examples, or use another GP library.
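As a minimal sketch of the noise workaround (the helper names `has_duplicate_rows` and `jitter_inputs`, the noise scale, and the toy array are illustrative assumptions, not part of scikit-learn), one could check for repeated rows and perturb the inputs slightly before fitting:

```python
import numpy as np

def has_duplicate_rows(X):
    """Return True if at least two rows of X are exactly equal."""
    X = np.asarray(X, dtype=float)
    return np.unique(X, axis=0).shape[0] < X.shape[0]

def jitter_inputs(X, scale=1e-6, seed=0):
    """Add small zero-mean Gaussian noise to every entry of X.

    `scale` is an assumed value; it should be small relative to the
    spread of the features, and may need tuning for real data.
    """
    rng = np.random.RandomState(seed)
    X = np.asarray(X, dtype=float)
    return X + rng.normal(0.0, scale, size=X.shape)

# Toy stand-in for the question's data: rows 0 and 1 are identical.
X = np.array([[136., 137., 1139.],
              [136., 137., 1139.],
              [82., 76., 689.]])

print(has_duplicate_rows(X))                 # duplicates present before jitter
print(has_duplicate_rows(jitter_inputs(X)))  # rows differ after jitter
```

Note that a very small perturbation makes the rows distinct but can still leave the covariance matrix badly conditioned, so dropping exact duplicates (or averaging their targets) may be the simpler option when that is acceptable.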

You can always try to implement it yourself; it is actually not that hard. The most important part of a GP is the kernel function, for example a Gaussian kernel:

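import numpy as np
# Gaussian (squared-exponential) kernel: params[0] is the amplitude, params[1] the inverse squared length scale
exponential_kernel = lambda x, y, params: params[0] * \
    np.exp( -0.5 * params[1] * np.sum((x - y)**2) )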

Now we need to build the covariance matrix, like this:

covariance = lambda kernel, x, y, params: \
    np.array([[kernel(xi, yi, params) for xi in x] for yi in y])
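To see why repeated points are a problem, here is a small sketch under assumed values (the toy points, `theta`, and the `nugget` size are illustrative, and `rbf_kernel` is just a named version of the kernel above). Two identical inputs give two identical rows in the covariance matrix, so it loses rank; adding a small constant to the diagonal restores full rank:

```python
import numpy as np

def rbf_kernel(a, b, params):
    # amplitude * exp(-0.5 * scale * squared distance)
    return params[0] * np.exp(-0.5 * params[1] * np.sum((a - b) ** 2))

def covariance(kernel, x, y, params):
    # Pairwise kernel values between the points in x and the points in y
    return np.array([[kernel(xi, yi, params) for xi in x] for yi in y])

theta = [1.0, 0.1]                  # assumed kernel parameters
x = np.array([[0.0, 1.0],
              [0.0, 1.0],           # exact duplicate of the first point
              [2.0, 3.0]])

sigma = covariance(rbf_kernel, x, x, theta)
print(np.linalg.matrix_rank(sigma))      # rank deficient: fewer than 3

nugget = 1e-6                            # assumed small diagonal term
sigma_reg = sigma + nugget * np.eye(len(x))
print(np.linalg.matrix_rank(sigma_reg))  # full rank after regularization
```

The diagonal term plays the same role as observation noise in the model, which is why adding it is a common alternative to perturbing the inputs themselves.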

So, before predicting at a new point, compute the covariance matrix of the training inputs x (theta holds the kernel parameters):

sigma1 = covariance(exponential_kernel, x, x, theta)

and apply the following:

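def predict(x, data, kernel, params, sigma, t):
    # Covariance between the new point x and every training point
    k = [kernel(x, y, params) for y in data]
    # Inverse of the training covariance matrix (requires sigma to be non-singular)
    Sinv = np.linalg.inv(sigma)
    # Posterior mean: k^T * sigma^-1 * t
    y_pred = np.dot(k, Sinv).dot(t)
    # Posterior variance: k(x, x) - k^T * sigma^-1 * k
    sigma_new = kernel(x, x, params) - np.dot(k, Sinv).dot(k)
    return y_pred, sigma_new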

This is a very naive implementation, and for large datasets the runtime will be high. The most expensive step is Sinv = np.linalg.inv(sigma), which takes O(N^3) in the number of training points N.
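As a hedged sketch of how "fitting" and prediction fit together, the variant below swaps the explicit inverse for a Cholesky factorization and adds a small diagonal nugget so duplicated inputs do not break it. The names `fit_gp` and `predict_gp`, the nugget value, and the toy data are assumptions for illustration, not the answer's original code or a scikit-learn API:

```python
import numpy as np

def rbf_kernel(a, b, params):
    # amplitude * exp(-0.5 * scale * squared distance)
    return params[0] * np.exp(-0.5 * params[1] * np.sum((a - b) ** 2))

def fit_gp(X, t, params, nugget=1e-6):
    """'Training' is just factorizing the regularized covariance matrix."""
    n = len(X)
    K = np.array([[rbf_kernel(xi, xj, params) for xj in X] for xi in X])
    K = K + nugget * np.eye(n)          # keeps K positive definite
    L = np.linalg.cholesky(K)           # K = L L^T
    # alpha = K^-1 t, obtained from two linear solves instead of inv(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))
    return L, alpha

def predict_gp(x_new, X, params, L, alpha):
    """Posterior mean and variance at a single new point."""
    k = np.array([rbf_kernel(x_new, xi, params) for xi in X])
    mean = k.dot(alpha)                 # k^T K^-1 t
    v = np.linalg.solve(L, k)
    var = rbf_kernel(x_new, x_new, params) - v.dot(v)  # k(x,x) - k^T K^-1 k
    return mean, var

# Toy data: 2 input features, 2 outputs, rows 0 and 1 duplicated on purpose.
X_train = np.array([[0.0, 1.0], [0.0, 1.0], [2.0, 3.0], [4.0, 0.0]])
t_train = np.array([[7.0, 9.0], [7.0, 9.0], [-4.0, -9.0], [-1.0, -1.0]])
theta = [1.0, 0.1]                      # assumed kernel parameters

L, alpha = fit_gp(X_train, t_train, theta)
mean, var = predict_gp(np.array([2.0, 3.0]), X_train, theta, L, alpha)
print(mean, var)
```

The factorization is still O(N^3), but it is done once in `fit_gp` and reused for every prediction. The kernel parameters are fixed here; choosing them (for example by maximizing the marginal likelihood) is a separate step this sketch leaves out.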

answered Jan 11, 2016 at 14:27 by Farseer; edited Jan 11, 2016 at 15:31 by erip


4 Comments

Julian Over a year ago

Thanks! I've been trying to use GPy but I failed. With this trick it now works.

erip Over a year ago

A side note - this seems like a good suggestion for resolving the issue in the library. I recommend suggesting this if it hasn't been done yet. +1 for the mathematical suggestion instead of the kludge I suggested. :)

Irene Over a year ago

I am having exactly the same problem, but I am not really sure how to use this very nice trick in my case. I have the following 3 lines: gp = GaussianProcess(theta0=0.3, nugget=2.4); gp.fit(xtrain, ytrain); y_pred, sigma = gp.predict(xtest, eval_MSE=True). Then, which is which? How is the "fit" carried out using Farseer's suggestion? I calculate the "exponential_kernel", the "covariance" and "sigma" and call predict, but how is the fitting/training done? Thanks

Little Bobby Tables Over a year ago

It's helpful to see you explaining the heavy run time part and giving its order.
