I want to calculate multiple linear regression with numpy. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.).
For example, with this data:
print('y x1 x2 x3 x4 x5 x6 x7')
for t in texts:
    print("{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}"
          .format(t.y, t.x1, t.x2, t.x3, t.x4, t.x5, t.x6, t.x7))
(output for above:)
y x1 x2 x3 x4 x5 x6 x7
20.64, 0.0, 296, 54.7, 0, 519, 2, 24.0
25.12, 0.0, 387, 54.7, 1, 678, 2, 24.0
19.22, 0.0, 535, 54.7, 0, 296, 2, 24.0
18.99, 0.0, 519, 18.97, 0, 296, 2, 54.9
18.89, 0.0, 296, 18.97, 0, 535, 2, 54.9
25.51, 0.0, 678, 18.97, 1, 387, 2, 54.9
20.19, 0.0, 296, 25.51, 0, 519, 2, 54.9
20.75, 0.0, 535, 25.51, 0, 296, 2, 54.9
24.13, 0.0, 387, 25.51, 1, 678, 2, 54.9
19.24, 0.0, 519, 0, 0, 296, 2, 55.0
20.90, 0.0, 296, 0, 0, 535, 2, 55.0
25.30, 0.0, 678, 0, 1, 387, 2, 55.0
20.78, 0.0, 296, 0, 0, 519, 2, 55.2
23.01, 0.0, 535, 0, 0, 296, 2, 55.2
25.20, 0.0, 387, 0, 1, 678, 2, 55.2
19.12, 0.0, 519, 0, 0, 296, 2, 55.3
20.03, 0.0, 296, 0, 0, 535, 2, 55.3
25.22, 0.0, 678, 0, 1, 387, 2, 55.3
I have written this function, which I think returns the coefficients a1 ... a7 and c of Y = a1x1 + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7 + c.
def calculate_linear_regression_numpy(xx, yy):
    """ calculate multiple linear regression """
    import numpy as np
    from numpy import linalg
    # append a column of ones so the last coefficient is the constant term c
    A = np.column_stack((xx, np.ones(len(xx))))
    coeffs = linalg.lstsq(A, yy)[0]  # obtaining the parameters
    return coeffs
xx is a list that contains each row of x's, and yy is a list that contains all y.
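To make the layout of the returned array concrete, here is a minimal sketch of the same construction on made-up data. The data, the true coefficients (2, -3, 5) and the names `slopes` and `intercept` are assumptions for illustration only, not values from the question. Because the ones column is stacked last, the constant term comes back as the last element.

```python
import numpy as np

# Hypothetical, noise-free data: y = 2*x1 - 3*x2 + 5
rng = np.random.RandomState(0)
xx = rng.rand(20, 2)
yy = 2.0 * xx[:, 0] - 3.0 * xx[:, 1] + 5.0

# Same construction as in the function above: ones column appended last
A = np.column_stack((xx, np.ones(len(xx))))
coeffs = np.linalg.lstsq(A, yy, rcond=None)[0]

# One coefficient per column of A, in column order
slopes, intercept = coeffs[:-1], coeffs[-1]
print(slopes)     # close to [ 2. -3.]
print(intercept)  # close to 5.0
```

If the ones column were stacked first instead, the constant term would be `coeffs[0]`.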
The matrix A looks like this:
00 = {ndarray} [ 0. 296. 519. 2. 0. 24. 54.7 1. ]
01 = {ndarray} [ 0. 296. 535. 2. 0. 24. 54.7 1. ]
02 = {ndarray} [ 0. 387. 678. 2. 1. 24. 54.7 1. ]
03 = {ndarray} [ 0. 296. 519. 2. 0. 54.9 18.97957206 1. ]
04 = {ndarray} [ 0. 296. 535. 2. 0. 54.9 18.97957206 1. ]
05 = {ndarray} [ 0. 387. 678. 2. 1. 54.9 18.97957206 1. ]
06 = {ndarray} [ 0. 296. 519. 2. 0. 54.9 25.518085 1. ]
07 = {ndarray} [ 0. 296. 535. 2. 0. 54.9 25.518085 1. ]
08 = {ndarray} [ 0. 387. 678. 2. 1. 54.9 25.518085 1. ]
09 = {ndarray} [ 0. 296. 519. 2. 0. 55. 0. 1.]
10 = {ndarray} [ 0. 296. 535. 2. 0. 55. 0. 1.]
11 = {ndarray} [ 0. 387. 678. 2. 1. 55. 0. 1.]
12 = {ndarray} [ 0. 296. 519. 2. 0. 55.2 0. 1. ]
13 = {ndarray} [ 0. 296. 535. 2. 0. 55.2 0. 1. ]
14 = {ndarray} [ 0. 387. 678. 2. 1. 55.2 0. 1. ]
15 = {ndarray} [ 0. 296. 519. 2. 0. 55.3 0. 1. ]
16 = {ndarray} [ 0. 296. 535. 2. 0. 55.3 0. 1. ]
17 = {ndarray} [ 0. 387. 678. 2. 1. 55.3 0. 1. ]
And np.dot(A, coeffs) gives this:
[ 19.69873196 20.33871176 24.95249051 19.59198545
20.23196525 24.845744 19.41602911 20.05600891 24.66978766
20.09928377 20.73926357 25.35304232 20.09237109 20.73235089
25.34612964 20.08891474 20.72889454 25.34267329]
When the function returns, coeffs contains these 8 values:
[0.0, -0.0010535377771944548, 0.039998737474281849, 0.62111016637058492, -1.0101687709958682, -0.034563440146209781, -0.026910757873959575, 0.31055508318529385]
I don't know whether coeffs[0] or coeffs[7] is the c from the equation for Y defined above.
I take these coeffs and calculate the new Ŷ by multiplying them with the new ẍ's, like this:
Ŷ = a1ẍ1 + a2ẍ2 + a3ẍ3 + a4ẍ4 + a5ẍ5 + a6ẍ6 + a7ẍ7 + c
Am I calculating Ŷ correctly? And what should I do when Ŷ comes out negative? Which term is the c (a[0] or a[7])?
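The Ŷ formula can be sketched as one matrix product, provided the new rows get the same trailing ones column that was used when fitting. The helper name `predict` and the numbers below are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import numpy as np

def predict(new_xx, coeffs):
    """Hypothetical helper: evaluate a1*x1 + ... + an*xn + c for each row."""
    new_xx = np.atleast_2d(np.asarray(new_xx, dtype=float))
    # Append the ones column last, matching how A was built for the fit
    A_new = np.column_stack((new_xx, np.ones(len(new_xx))))
    return np.dot(A_new, coeffs)

# Made-up coefficients [a1, a2, c] and two new rows, for illustration only
coeffs = np.array([2.0, -3.0, 5.0])
new_xx = [[1.0, 1.0], [0.0, 2.0]]
y_hat = predict(new_xx, coeffs)
print(y_hat)  # [ 4. -1.]
```

The second row gives 2*0 - 3*2 + 5 = -1. A linear model is unbounded, so a negative Ŷ is simply what the formula yields for those inputs; whether that is acceptable depends on what y represents.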
The c term would be a[7], since you are putting the ones column at the end, but your coefficients don't make sense. You can check by doing print(np.dot(A, coeffs)); it should give you yy, or something very similar. When I tried, I got the coefficients [ -0.49104607 0.83271938 0.0860167 0.1326091 6.85681762 22.98163883 -41.08437805 -19.08085066]
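The check suggested above can be sketched as follows, together with the rank that `np.linalg.lstsq` reports. The small design matrix is made up for illustration. It imitates two features of the A shown in the question, where the first column is all 0 and the fourth is all 2. A constant column is a multiple of the ones column, so the constant term cannot be attributed to one of them uniquely, and `lstsq` then returns the minimum-norm solution. That would be consistent with the question's coeffs[0] being 0.0 and coeffs[3] being twice coeffs[7], though this sketch does not use the original data.

```python
import numpy as np

# Hypothetical design: an all-zero column, a constant column, one varying column
x_zero = np.zeros(6)
x_const = np.full(6, 2.0)
x_var = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
yy = 3.0 * x_var + 4.0  # made-up target with slope 3 and constant 4

A = np.column_stack((x_zero, x_const, x_var, np.ones(6)))
coeffs, residuals, rank, sv = np.linalg.lstsq(A, yy, rcond=None)

# Rank below the number of columns means the coefficients are not unique
print(rank, A.shape[1])                     # 2 4
# The fitted values can still reproduce yy even when coefficients are ambiguous
print(np.allclose(np.dot(A, coeffs), yy))   # True
# The constant term is shared between the constant column and the ones column
print(2.0 * coeffs[1] + coeffs[3])          # close to 4.0
```

Dropping the all-zero and constant columns before fitting would leave a single, unambiguous constant term in the last position.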