I've found quite a few examples of fitting a linear regression with zero intercept.

However, I would like to fit a linear regression with a fixed x-intercept. In other words, the regression will start at a specific x.

I have the following code for plotting.

import numpy as np
import matplotlib.pyplot as plt

xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
              20.0, 40.0, 60.0, 80.0])


ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183,
              44.036117671229206, 54.009003143831116, 102.7077685684846,
              185.72880217806673, 256.12183145545811, 301.97120103079675])


def best_fit_slope_and_intercept(xs, ys):
    # m = xs.dot(ys)/xs.dot(xs)
    m = (((np.average(xs)*np.average(ys)) - np.average(xs*ys)) /
         ((np.average(xs)*np.average(xs)) - np.average(xs*xs)))
    b = np.average(ys) - m*np.average(xs)
    return m, b


def rSquaredValue(ys_orig, ys_line):
    def sqrdError(ys_orig, ys_line):
        return np.sum((ys_line - ys_orig) * (ys_line - ys_orig))
    yMeanLine = np.average(ys_orig)
    sqrtErrorRegr = sqrdError(ys_orig, ys_line)
    sqrtErrorYMean = sqrdError(ys_orig, yMeanLine)
    return 1 - (sqrtErrorRegr/sqrtErrorYMean)


m, b = best_fit_slope_and_intercept(xs, ys)
regression_line = m*xs+b

r_squared = rSquaredValue(ys, regression_line)
print(r_squared)

plt.plot(xs, ys, 'bo')
# Normal best fit
plt.plot(xs, m*xs+b, 'r-')
# Zero intercept
plt.plot(xs, m*xs, 'g-')
plt.show()

I want a fit where the regression line starts at (5, 0).

Thank you. Any and all help is appreciated.

4 Answers

1

I've been thinking about this for some time and have found a possible workaround.

If I understood correctly, you want to find the slope and intercept of a linear regression model with a fixed x-axis intercept.

Assuming that's the case (say you want the x-axis intercept to be forced_intercept), it's as if you shifted all the points by -forced_intercept along the x-axis and then forced scikit-learn to use a y-axis intercept of 0. That gives you the slope. To find the intercept, solve y = a*x + b for b at the point (forced_intercept, 0): you get b = -a*forced_intercept, where a is the slope. In code (note the reshaping of xs):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
              20.0, 40.0, 60.0, 80.0]).reshape((-1, 1))  # scikit-learn expects a 2-D feature array; a 1-D array raises a ValueError


ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183,
              44.036117671229206, 54.009003143831116, 102.7077685684846,
              185.72880217806673, 256.12183145545811, 301.97120103079675])

forced_intercept = 5  # x-intercept from your example point (5, 0)

new_xs = xs - forced_intercept  # shift all points so the forced intercept sits at x = 0
model = LinearRegression(fit_intercept=False).fit(new_xs, ys)  # force a y-intercept of 0
r = model.score(new_xs, ys)  # coefficient of determination (R^2)
a = model.coef_[0]  # slope

b = -1 * a * forced_intercept  # y-intercept chosen so the line passes through (forced_intercept, 0)

print(r, a, b)
plt.plot(xs, ys, 'o')
plt.plot(xs, a*xs + b)
plt.show()

Hope this is what you were looking for.
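For reference, the same shift-and-fit idea can be sketched in plain NumPy. With the y-intercept forced to 0 after the shift, the least-squares slope has the closed form sum((x - x0) * y) / sum((x - x0)**2), which mirrors the commented-out line in the question. The function name fit_through_x_intercept is only an illustrative choice, not part of any library.

```python
import numpy as np


def fit_through_x_intercept(xs, ys, x0):
    """Least-squares line y = a * (x - x0), i.e. a line forced through (x0, 0)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    shifted = xs - x0                           # move the forced intercept to x = 0
    a = shifted.dot(ys) / shifted.dot(shifted)  # slope of the zero-intercept fit
    b = -a * x0                                 # y-intercept so that a * x0 + b == 0
    return a, b


xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
               20.0, 40.0, 60.0, 80.0])
ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
               3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
               11.073788414382639, 23.248479770546009, 32.120462301367183,
               44.036117671229206, 54.009003143831116, 102.7077685684846,
               185.72880217806673, 256.12183145545811, 301.97120103079675])

a, b = fit_through_x_intercept(xs, ys, 5.0)
print(a, b)
```

This should give the same slope as the scikit-learn version, without the reshape step.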

5 Comments

Thank you for the response. Just for my understanding: if we start at (5,0), shouldn't the line lean more to the left, because there are more data points to the left? The code above only seems to have shifted the linear regression to the right.
Also the linear regression fails with the following dataset: xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 11.0, 12.0, 13.0, 30.0]).reshape((-1, 1)) and ys = np.array([20., 25., 10., 3., 300., 6., 200., 210., 220., 230., 240., 250., 300., 310., 320.])
It's not that we just shifted, we shifted and forced the regression to go over the origin (which was (forced_intercept,0)), and then "unshifted". Just shifting would be what we've done without forcing it to go through the origin (fit_intercept = True).
I understand and that's not exactly what I meant, but please have a look at the dataset of the previous comment - The line starts at the right place but isn't affected by the weights of the data.
Yes, I see the problem. Sorry I didn't answer; I was thinking about it. I understand what you mean, and you are right. If I find what fails I'll tell you.
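One way to inspect the second dataset from the comments, as a rough sketch: for a line forced through (x0, 0), the least-squares slope is sum((x - x0) * y) / sum((x - x0)**2), so each point enters the denominator with weight (x - x0)**2. Points far from x0 therefore pull on the slope much more than the many points close to it. The names xs2, ys2 and share below are illustrative only.

```python
import numpy as np

# Second dataset quoted in the comments.
xs2 = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
                11.0, 12.0, 13.0, 30.0])
ys2 = np.array([20., 25., 10., 3., 300., 6., 200., 210., 220., 230., 240.,
                250., 300., 310., 320.])
x0 = 5.0  # forced x-intercept

shifted = xs2 - x0
slope = shifted.dot(ys2) / shifted.dot(shifted)

# Fraction of sum((x - x0)**2) contributed by each point.
share = shifted**2 / shifted.dot(shifted)

for x, s in zip(xs2, share):
    print(f"x = {x:5.1f}  share of sum((x - x0)^2) = {s:.3f}")
print("slope:", slope)
```

The printed shares show which points dominate the constrained fit for this data.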
1

Maybe this approach will be useful.

import numpy as np
import matplotlib.pyplot as plt

xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
              20.0, 40.0, 60.0, 80.0])

ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183,
              44.036117671229206, 54.009003143831116, 102.7077685684846,
              185.72880217806673, 256.12183145545811, 301.97120103079675])

# First, add the anchor point to the set of points.
xs = np.append(xs, [5.])
ys = np.append(ys, [0.])

# Then we prepare the coefficient matrix according to the docs
# https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html
A = np.vstack([xs, np.ones(len(xs))]).T

# Then we prepare weights for these points. All weights are equal
# except the last one (for the added anchor point).
# In this example its weight is 1000 times larger than the others.
W = np.diag(np.ones([len(xs)]))
W[-1,-1] = 1000.

# And we find least-squares solution.
m, c = np.linalg.lstsq(np.dot(W, A), np.dot(W, ys), rcond=None)[0]

plt.plot(xs, ys, 'o', label='Original data', markersize=10)
plt.plot(xs, m * xs + c, 'r', label='Fitted line')
plt.show()
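Note that a heavily weighted anchor point pulls the line toward (5, 0) but does not force it exactly through that point. Here is a small sketch that checks how close it gets by printing the fitted x-intercept, -c/m, for a few weights. The helper anchored_fit and the weight values are illustrative assumptions, not part of the code above.

```python
import numpy as np


def anchored_fit(xs, ys, anchor, weight):
    """Weighted least-squares line with one extra, heavily weighted anchor point."""
    ax, ay = anchor
    x = np.append(xs, ax)
    y = np.append(ys, ay)
    A = np.vstack([x, np.ones(len(x))]).T
    w = np.ones(len(x))
    w[-1] = weight  # only the anchor row gets a non-unit weight
    # Scaling each row by its weight is the same as multiplying by diag(w).
    m, c = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)[0]
    return m, c


xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
               20.0, 40.0, 60.0, 80.0])
ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
               3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
               11.073788414382639, 23.248479770546009, 32.120462301367183,
               44.036117671229206, 54.009003143831116, 102.7077685684846,
               185.72880217806673, 256.12183145545811, 301.97120103079675])

x_intercepts = {}
for weight in (1.0, 1000.0, 1e6):
    m, c = anchored_fit(xs, ys, (5.0, 0.0), weight)
    x_intercepts[weight] = -c / m  # where the fitted line crosses y = 0
    print(weight, x_intercepts[weight])
```

The larger the weight, the closer the x-intercept should get to 5; if it has to be exact, fitting y = a * (x - 5) directly avoids the approximation.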

3 Comments

Thank you for the response. Just for my understanding: if we start at (5,0), shouldn't the line lean more to the left, because there are more data points to the left? The code above only seems to have shifted the linear regression to the right.
Also the linear regression fails with the following dataset: xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 11.0, 12.0, 13.0, 30.0]) and ys = np.array([20., 25., 10., 3., 300., 6., 200., 210., 220., 230., 240., 250., 300., 310., 320.])
0

If you use scikit-learn for the linear regression, it's possible to define the intercept(s) through the intercept_ attribute.

0
from matplotlib import pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

X = np.linspace(0,10, 100)
Y = X + np.random.randn(100) + 3.5
lin = lambda x, a: a * x + 3.5
slope = curve_fit(lin, X, Y)[0][0]

# Plot the noisy data together with the fitted line (y-intercept fixed at 3.5).
plt.plot(X, Y, X, slope * X + 3.5)
plt.show()
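The snippet above fixes the y-intercept. The same curve_fit approach might be adapted to a fixed x-intercept by fitting y = a * (x - x0) with only the slope free. This is a sketch using the data from the question; the name line_through_x0 and the value x0 = 5.0 are assumptions taken from the (5, 0) example.

```python
import numpy as np
from scipy.optimize import curve_fit

xs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
               20.0, 40.0, 60.0, 80.0])
ys = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
               3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
               11.073788414382639, 23.248479770546009, 32.120462301367183,
               44.036117671229206, 54.009003143831116, 102.7077685684846,
               185.72880217806673, 256.12183145545811, 301.97120103079675])

x0 = 5.0  # forced x-intercept


def line_through_x0(x, a):
    # The only free parameter is the slope; the line always passes through (x0, 0).
    return a * (x - x0)


popt, pcov = curve_fit(line_through_x0, xs, ys)
slope = popt[0]
print(slope, -slope * x0)  # slope and the implied y-intercept
```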
