func_model, func_params = make_functional(self.model)

def fm(x, func_params):
    fx = func_model(func_params, x)
    return fx.squeeze(0).squeeze(0)

def floss(func_params, input):
    fx = fm(input, func_params)
    return fx

per_sample_grads = vmap(jacrev(floss), (None, 0))(func_params, input)

# stack the flattened per-parameter gradients into one matrix
J_d = torch.hstack([g.detach().reshape(len(g), -1) for g in per_sample_grads])
result = J_d.detach()

In this code, per_sample_grads contains the gradients with respect to all parameters of the network model for a batch of input data, where the model is

self.model = Network(self.input_size, self.hidden_size, self.output_size, self.depth, act=torch.nn.Tanh())

I use this code to compute the Jacobian matrix (shape: number of data points × number of parameters). However, I only need a small subset of the gradients:

params = torch.cat([p.view(-1) for p in self.model.parameters()], dim=0)
selected_columns = torch.randperm(p_number)[:opt_num]  # random indices without replacement
target_params = params[selected_columns]  # I only need the gradients of target_params

Computing all the gradients is wasteful in my code, and I want to save time in this autograd step. What can I do to achieve that?

How can I compute the gradients for only part of the model to save time?

  • Please provide enough code so others can better understand or reproduce the problem. Commented Sep 17, 2024 at 13:15

1 Answer


With torch, you can control which nn.Parameters you differentiate with respect to, but not which individual indices within those nn.Parameters. For that reason, you're forced to compute the Jacobian with respect to all parameters (or at least with respect to every parameter tensor that contains at least one selected index). Technically, you could compute the Jacobian layer by layer, keep only the columns you want, and drop the rest to free memory, which should substantially help in a memory-limited environment. That is quite complex to implement, though, so I wouldn't necessarily go in that direction.

So, in your specific case, computing the Jacobian with respect to all model parameters and then randomly selecting the columns you're interested in (i.e. what you already do) should be the fastest approach that is also easy to implement.
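For reference, here is a minimal, self-contained sketch of that approach using the modern torch.func API (which replaces the deprecated functorch make_functional). The two-layer Sequential model and the sizes (16 samples, 5 selected columns) are stand-ins for your Network, not your actual code:

```python
import torch
from torch.func import functional_call, jacrev, vmap

# Stand-in for the asker's Network (hypothetical sizes).
model = torch.nn.Sequential(
    torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)
params = dict(model.named_parameters())

def floss(params, x):
    # functional_call runs the model with the given parameter dict
    return functional_call(model, params, (x,)).squeeze(-1)

inputs = torch.randn(16, 3)  # 16 samples

# Per-sample gradients w.r.t. every parameter tensor (dict of tensors)
per_sample_grads = vmap(jacrev(floss), in_dims=(None, 0))(params, inputs)

# Flatten each per-parameter gradient to (n_samples, n_elements) and
# concatenate into the full Jacobian: (data number, parameter number)
J = torch.hstack(
    [g.reshape(len(inputs), -1) for g in per_sample_grads.values()]
)

# Keep only the randomly selected columns
p_number = J.shape[1]
opt_num = 5
selected_columns = torch.randperm(p_number)[:opt_num]
J_sel = J[:, selected_columns]
```

The full Jacobian is still computed here; only the column selection afterwards is cheap.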

If you don't need the indices to be selected randomly, you could instead differentiate with respect to only the parameters of the last few layers of the model. This is quite easy to implement and will make the differentiation faster.
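A minimal sketch of that variant, again with torch.func and a stand-in two-layer model: only the final layer's parameters are passed to jacrev as the differentiable argument, while the earlier layers are held fixed, so the resulting Jacobian has far fewer columns to compute:

```python
import torch
from torch.func import functional_call, jacrev, vmap

model = torch.nn.Sequential(
    torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)

# Split the parameters: the last layer here is module "2" in the Sequential
all_params = dict(model.named_parameters())
last = {k: v for k, v in all_params.items() if k.startswith("2.")}
frozen = {k: v.detach() for k, v in all_params.items() if not k.startswith("2.")}

def floss(last_params, x):
    # Merge the fixed and the differentiable parameters for the call;
    # jacrev only differentiates w.r.t. last_params (its first argument).
    return functional_call(model, {**frozen, **last_params}, (x,)).squeeze(-1)

inputs = torch.randn(16, 3)
grads = vmap(jacrev(floss), in_dims=(None, 0))(last, inputs)

# Jacobian w.r.t. the last layer only: (data number, last-layer parameter number)
J_last = torch.hstack([g.reshape(len(inputs), -1) for g in grads.values()])
```

With this toy model, J_last has 9 columns (8 weights + 1 bias) instead of the 41 of the full Jacobian.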


1 Comment

Thanks a lot for your answer. Yeah, I think selecting the columns is the only way to implement it.
