
I am trying to write an estimator for a Structural Equation Model. I start with randomly initialized model parameters B, gamma, phi_diag, and psi, and from these I compute the implied covariance matrix sigma. My objective f_ml is then computed from sigma and the sample covariance matrix S of the data.
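(For reference, f_ml is the standard maximum-likelihood fitting function, written out exactly as it appears in my loop below:

f_ml = ln|sigma| + tr(S sigma^-1) - ln|S| - (p + q)

with p = 4 endogenous and q = 1 exogenous variables, so p + q = 5.) Here's my computation code: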

device = torch.device('cpu')
dtype = torch.float

B_s = (4, 4)
gamma_s = (4, 1)
phi_s = (1, 1)
psi_s = (4, 4)

# Covariance matrix of data
S = torch.tensor(data.cov().values, dtype=dtype, device=device, requires_grad=False)

# Defining parameters of the model
B = torch.rand(*B_s, dtype=dtype, device=device, requires_grad=True)
B_lower = B.tril(diagonal=-1)  # strictly lower-triangular part of B

gamma = torch.rand(*gamma_s, dtype=dtype, device=device, requires_grad=True)

phi_diag = torch.rand(phi_s[0], dtype=dtype, device=device, requires_grad=True)
phi = torch.diag(phi_diag)

psi = torch.rand(*psi_s, dtype=dtype, device=device, requires_grad=True)
psi_sym = psi @ psi.t()  # symmetric positive semi-definite version of psi

# (I - B_lower)^-1
B_inv = (torch.eye(*B_s, dtype=dtype, device=device, requires_grad=False) - B_lower).inverse()
sigma_yy = B_inv @ (gamma @ phi @ gamma.t() + psi_sym) @ B_inv.t()
sigma_xy = phi @ gamma.t() @ B_inv.t()
sigma_yx = sigma_xy.t()
sigma_xx = phi

# Computing the covariance matrix from the parameters
sigma = torch.cat((torch.cat((sigma_yy, sigma_yx), 1), torch.cat((sigma_xy, sigma_xx), 1)), 0)

And I am running the optimization as:

optim = torch.optim.Adam([B, gamma, phi_diag, psi], lr=0.01)
for t in range(5000):
    optim.zero_grad()
    f_ml = sigma.logdet() + (S @ sigma.inverse()).trace() - S.logdet() - (4 + 1)
    f_ml.backward(retain_graph=True)
    optim.step()

The problem I am facing is that the values of my parameters aren't updated during the optimization. I tried to debug the problem a bit, and what I noticed is that in the first iteration of the loop the gradients get calculated, but the values of the parameters don't get updated. Here's an example using pdb (breakpoint set on the first line of the loop body):

> <ipython-input-232-c6a6fda6610b>(14)<module>()
-> optim.zero_grad()
(Pdb) B
tensor([[ 6.0198e-01,  8.7188e-01,  5.4234e-01,  6.0800e-01],
        [-4.9971e+03,  9.3324e-01,  8.1482e-01,  8.3517e-01],
        [-1.4002e+04,  2.6706e+04,  2.6412e-01,  4.7804e-01],
        [ 1.1382e+04, -2.1603e+04, -6.0834e+04,  1.2768e-01]],
       requires_grad=True)
(Pdb) c
> <ipython-input-232-c6a6fda6610b>(13)<module>()
-> import pdb; pdb.set_trace()
(Pdb) B.grad
tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 1.6332e+04,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 4.6349e+04, -8.8694e+04,  0.0000e+00,  0.0000e+00],
        [-3.7612e+04,  7.1684e+04,  2.0239e+05,  0.0000e+00]])
(Pdb) B
tensor([[ 6.0198e-01,  8.7188e-01,  5.4234e-01,  6.0800e-01],
        [-4.9971e+03,  9.3324e-01,  8.1482e-01,  8.3517e-01],
        [-1.4002e+04,  2.6706e+04,  2.6412e-01,  4.7804e-01],
        [ 1.1382e+04, -2.1603e+04, -6.0834e+04,  1.2768e-01]],
       requires_grad=True)

I can't figure out what I am doing wrong. Any ideas?

1 Answer

The problem is that sigma is computed only once, outside the training loop. optim.step() updates the leaf tensors B, gamma, phi_diag, and psi, but sigma still holds the values from the original forward pass, so every iteration backpropagates through the same stale graph (which is also why retain_graph=True was needed). Move the computation into a function and call it in every iteration, so that sigma is rebuilt from the current parameter values each time.
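A minimal sketch of that fix, reusing the shapes and setup from the question (compute_sigma is just a name I'm introducing for the wrapped computation):

def compute_sigma(B, gamma, phi_diag, psi):
    # Rebuild the implied covariance matrix from the current parameter values
    B_lower = B.tril(diagonal=-1)
    phi = torch.diag(phi_diag)
    psi_sym = psi @ psi.t()
    B_inv = (torch.eye(*B_s, dtype=dtype, device=device) - B_lower).inverse()
    sigma_yy = B_inv @ (gamma @ phi @ gamma.t() + psi_sym) @ B_inv.t()
    sigma_xy = phi @ gamma.t() @ B_inv.t()
    sigma_yx = sigma_xy.t()
    sigma_xx = phi
    return torch.cat((torch.cat((sigma_yy, sigma_yx), 1),
                      torch.cat((sigma_xy, sigma_xx), 1)), 0)

optim = torch.optim.Adam([B, gamma, phi_diag, psi], lr=0.01)
for t in range(5000):
    optim.zero_grad()
    sigma = compute_sigma(B, gamma, phi_diag, psi)  # fresh forward pass
    f_ml = sigma.logdet() + (S @ sigma.inverse()).trace() - S.logdet() - (4 + 1)
    f_ml.backward()  # a new graph is built each iteration, so retain_graph is unnecessary
    optim.step()

With the forward pass inside the loop, each optim.step() is followed by a recomputation of sigma, so the loss actually responds to the updated parameters.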
