func_model, func_params = make_functional(self.model)

def fm(x, func_params):
    fx = func_model(func_params, x)
    return fx.squeeze(0).squeeze(0)

def floss(func_params, input):
    fx = fm(input, func_params)
    return fx

per_sample_grads = vmap(jacrev(floss), (None, 0))(func_params, input)

# stack the flattened per-parameter gradients into one matrix
J_d = torch.hstack([g.detach().reshape(len(g), -1) for g in per_sample_grads])
result = J_d.detach()

In this code, per_sample_grads contains the gradients with respect to all parameters of the network model for a batch of input data, where the model is

self.model = Network(self.input_size, self.hidden_size, self.output_size, self.depth, act=torch.nn.Tanh())

I use this code to compute the Jacobian matrix (shape: number of data points × number of parameters). However, I only need a small subset of the gradients:

params = torch.cat([p.view(-1) for p in self.model.parameters()], dim=0)
selected_columns = torch.randperm(p_number)[:opt_num]  # random indices without replacement
target_params = params[selected_columns]  # I only need the gradients of target_params

Computing all the gradients is wasteful in my code, and I want to save time in this autograd step. What can I do to achieve that?

How can I compute the gradients for only part of the model to save time?

  • Please provide enough code so others can better understand or reproduce the problem. Commented Sep 17, 2024 at 13:15

1 Answer


With torch, you can control which nn.Parameters you differentiate with respect to, but not which individual indices within those nn.Parameters. For that reason, you're forced to compute the Jacobian with respect to all parameters (or at least with respect to every parameter tensor that contains at least one selected index). Technically, you could compute the Jacobian layer by layer, keep only the columns you want, and drop the rest to free memory, which should substantially help in a memory-limited environment. That is quite complex to implement, though, so I wouldn't necessarily go in that direction.

So, in your specific case, computing the Jacobian with respect to all model parameters and then randomly selecting the columns you're interested in (i.e. what you already do) should be the fastest approach that is also easy to implement.
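For reference, here is a minimal, self-contained sketch of that approach using the modern torch.func API (which replaces the deprecated functorch make_functional). The two-layer Sequential model and the sizes (16 samples, 5 selected columns) are stand-ins for your Network, not your actual code:

```python
import torch
from torch.func import functional_call, jacrev, vmap

# Stand-in for the asker's Network (hypothetical sizes).
model = torch.nn.Sequential(
    torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)
params = dict(model.named_parameters())

def floss(params, x):
    # functional_call runs the model with the given parameter dict
    return functional_call(model, params, (x,)).squeeze(-1)

inputs = torch.randn(16, 3)  # 16 samples

# Per-sample gradients w.r.t. every parameter tensor (dict of tensors)
per_sample_grads = vmap(jacrev(floss), in_dims=(None, 0))(params, inputs)

# Flatten each per-parameter gradient to (n_samples, n_elements) and
# concatenate into the full Jacobian: (data number, parameter number)
J = torch.hstack(
    [g.reshape(len(inputs), -1) for g in per_sample_grads.values()]
)

# Keep only the randomly selected columns
p_number = J.shape[1]
opt_num = 5
selected_columns = torch.randperm(p_number)[:opt_num]
J_sel = J[:, selected_columns]
```

The full Jacobian is still computed here; only the column selection afterwards is cheap.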

If you don't need the indices to be selected randomly, you could instead differentiate with respect to only the parameters of the last few layers of the model. This is quite easy to implement and will make the differentiation faster.
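A minimal sketch of that variant, again with torch.func and a stand-in two-layer model: only the final layer's parameters are passed to jacrev as the differentiable argument, while the earlier layers are held fixed, so the resulting Jacobian has far fewer columns to compute:

```python
import torch
from torch.func import functional_call, jacrev, vmap

model = torch.nn.Sequential(
    torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)

# Split the parameters: the last layer here is module "2" in the Sequential
all_params = dict(model.named_parameters())
last = {k: v for k, v in all_params.items() if k.startswith("2.")}
frozen = {k: v.detach() for k, v in all_params.items() if not k.startswith("2.")}

def floss(last_params, x):
    # Merge the fixed and the differentiable parameters for the call;
    # jacrev only differentiates w.r.t. last_params (its first argument).
    return functional_call(model, {**frozen, **last_params}, (x,)).squeeze(-1)

inputs = torch.randn(16, 3)
grads = vmap(jacrev(floss), in_dims=(None, 0))(last, inputs)

# Jacobian w.r.t. the last layer only: (data number, last-layer parameter number)
J_last = torch.hstack([g.reshape(len(inputs), -1) for g in grads.values()])
```

With this toy model, J_last has 9 columns (8 weights + 1 bias) instead of the 41 of the full Jacobian.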


1 Comment

Thanks a lot for your answer. Yeah, I think selecting the columns is the only way to implement it.
