
I am trying to write an estimator for a Structural Equation Model. I start with randomly initialized model parameters B, gamma, phi_diag, and psi, and from these I compute the implied covariance matrix sigma. My objective f_ml is then computed from sigma and the sample covariance matrix S of the data.
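(For reference, f_ml is the standard maximum-likelihood fitting function, written out exactly as it appears in my loop below:

f_ml = ln|sigma| + tr(S sigma^-1) - ln|S| - (p + q)

with p = 4 endogenous and q = 1 exogenous variables, so p + q = 5.) Here's my computation code: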

device = torch.device('cpu')
dtype = torch.float

B_s = (4, 4)
gamma_s = (4, 1)
phi_s = (1, 1)
psi_s = (4, 4)

# Covariance matrix of data
S = torch.tensor(data.cov().values, dtype=dtype, device=device, requires_grad=False)

# Defining parameters of the model
B = torch.rand(*B_s, dtype=dtype, device=device, requires_grad=True)
B_lower = B.tril(diagonal=-1)  # strictly lower-triangular part of B

gamma = torch.rand(*gamma_s, dtype=dtype, device=device, requires_grad=True)

phi_diag = torch.rand(phi_s[0], dtype=dtype, device=device, requires_grad=True)
phi = torch.diag(phi_diag)

psi = torch.rand(*psi_s, dtype=dtype, device=device, requires_grad=True)
psi_sym = psi @ psi.t()  # symmetric positive semi-definite version of psi

# (I - B_lower)^-1
B_inv = (torch.eye(*B_s, dtype=dtype, device=device, requires_grad=False) - B_lower).inverse()
sigma_yy = B_inv @ (gamma @ phi @ gamma.t() + psi_sym) @ B_inv.t()
sigma_xy = phi @ gamma.t() @ B_inv.t()
sigma_yx = sigma_xy.t()
sigma_xx = phi

# Computing the covariance matrix from the parameters
sigma = torch.cat((torch.cat((sigma_yy, sigma_yx), 1), torch.cat((sigma_xy, sigma_xx), 1)), 0)

And I am running the optimization as:

optim = torch.optim.Adam([B, gamma, phi_diag, psi], lr=0.01)
for t in range(5000):
    optim.zero_grad()
    f_ml = sigma.logdet() + (S @ sigma.inverse()).trace() - S.logdet() - (4 + 1)
    f_ml.backward(retain_graph=True)
    optim.step()

The problem I am facing is that the values of my parameters aren't updated during the optimization. I tried to debug the problem a bit, and what I noticed is that in the first iteration of the loop the gradients get calculated, but the values of the parameters don't get updated. Here's an example using pdb (breakpoint set on the first line of the loop body):

> <ipython-input-232-c6a6fda6610b>(14)<module>()
-> optim.zero_grad()
(Pdb) B
tensor([[ 6.0198e-01,  8.7188e-01,  5.4234e-01,  6.0800e-01],
        [-4.9971e+03,  9.3324e-01,  8.1482e-01,  8.3517e-01],
        [-1.4002e+04,  2.6706e+04,  2.6412e-01,  4.7804e-01],
        [ 1.1382e+04, -2.1603e+04, -6.0834e+04,  1.2768e-01]],
       requires_grad=True)
(Pdb) c
> <ipython-input-232-c6a6fda6610b>(13)<module>()
-> import pdb; pdb.set_trace()
(Pdb) B.grad
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 1.6332e+04,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 4.6349e+04, -8.8694e+04,  0.0000e+00,  0.0000e+00],
        [-3.7612e+04,  7.1684e+04,  2.0239e+05,  0.0000e+00]])
(Pdb) B
tensor([[ 6.0198e-01,  8.7188e-01,  5.4234e-01,  6.0800e-01],
        [-4.9971e+03,  9.3324e-01,  8.1482e-01,  8.3517e-01],
        [-1.4002e+04,  2.6706e+04,  2.6412e-01,  4.7804e-01],
        [ 1.1382e+04, -2.1603e+04, -6.0834e+04,  1.2768e-01]],
       requires_grad=True)

I can't figure out what I am doing wrong. Any ideas?

1 Answer

The problem is that sigma is computed only once, outside the training loop. optim.step() updates the leaf tensors B, gamma, phi_diag, and psi, but sigma still holds the values from the original forward pass, so every iteration backpropagates through the same stale graph (which is also why retain_graph=True was needed). Move the computation into a function and call it in every iteration, so that sigma is rebuilt from the current parameter values each time.
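A minimal sketch of that fix, reusing the shapes and setup from the question (compute_sigma is just a name I'm introducing for the wrapped computation):

def compute_sigma(B, gamma, phi_diag, psi):
    # Rebuild the implied covariance matrix from the current parameter values
    B_lower = B.tril(diagonal=-1)
    phi = torch.diag(phi_diag)
    psi_sym = psi @ psi.t()
    B_inv = (torch.eye(*B_s, dtype=dtype, device=device) - B_lower).inverse()
    sigma_yy = B_inv @ (gamma @ phi @ gamma.t() + psi_sym) @ B_inv.t()
    sigma_xy = phi @ gamma.t() @ B_inv.t()
    sigma_yx = sigma_xy.t()
    sigma_xx = phi
    return torch.cat((torch.cat((sigma_yy, sigma_yx), 1),
                      torch.cat((sigma_xy, sigma_xx), 1)), 0)

optim = torch.optim.Adam([B, gamma, phi_diag, psi], lr=0.01)
for t in range(5000):
    optim.zero_grad()
    sigma = compute_sigma(B, gamma, phi_diag, psi)  # fresh forward pass
    f_ml = sigma.logdet() + (S @ sigma.inverse()).trace() - S.logdet() - (4 + 1)
    f_ml.backward()  # a new graph is built each iteration, so retain_graph is unnecessary
    optim.step()

With the forward pass inside the loop, each optim.step() is followed by a recomputation of sigma, so the loss actually responds to the updated parameters.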
