1,464 questions
3
votes
1
answer
89
views
Freeze, then unfreeze gradients of a trained parameter in PyTorch does not work
Let's say I have a parameter that is a vector of length p, and I wish to train it in PyTorch such that, for some iterations, only the first k <= p elements of this vector are trained whereas the rest ...
1
vote
0
answers
106
views
Gradient Descent blowing up in linear regression
I am writing linear regression code in Python. I used the formulas I learnt and double-checked them, and also tried normalising the dataset. What happened then is that the values of the weight and bias changed ...
0
votes
1
answer
61
views
How to get gradient with respect to matrix in JAX?
As far as I know, JAX only supports "rank 1" vector-valued functions for the jax.jacrev autograd. How do I support higher-rank tensors?
I don't want to flatten my matrix, then unflatten it ...
0
votes
0
answers
54
views
How do I tell tensorflow to throw an error if I am trying to do a non-differentiable operation on a variable?
I am learning TensorFlow and spent a good amount of time trying to find what was causing this error:
No gradients provided for any variable.
In the end I tracked it down to using argmax at ...
0
votes
0
answers
44
views
Torch gradient estimates disagreeing with analytic and perturbation approximated gradients
I'm faced with a problem where, as the title says, I'm having trouble with the torch package's built-in automatic differentiation algorithms (or my usage?). I think it was meant to be used on mini-...
1
vote
0
answers
28
views
MATLAB Reinforcement Learning: issue with obtaining gradient from Q-value critic using dlfeval, dlgradient, dlarrays
I'm trying to implement a custom agent, and inside my agent I'm running into issues with obtaining the gradient of the Q value with respect to my actor network parameters. I have my code below, main ...
0
votes
0
answers
31
views
Deterministic minimization of a stochastic function with subgradient method
Problem: I have implemented several step-size strategies (classic, Polyak, and Adagrad), but my subgradient algorithm either diverges or fails to converge.
Initially, I focused on the problem:
Initial ...
2
votes
1
answer
102
views
SIR parameter estimation with gradient descent and autograd
I am trying to run a very simple parameter estimation for an SIR model using a gradient descent algorithm. I am using the package autograd since the audience (this is for a sort of workshop for ...
0
votes
1
answer
51
views
theta values for gradient descent not coherent
I wrote some gradient descent code, but it doesn't seem to work well:
import numpy as np
from random import randint,random
import matplotlib.pyplot as plt
def calculh(theta, X):
    h = 0
    h += theta[0]*X ...
0
votes
0
answers
102
views
Gradient descent 3D visualization Python
I've recently implemented a neural network from scratch and am now focusing on visualizing the optimization process. Specifically, I'm interested in creating a 3D visualization of the loss landscape ...
0
votes
0
answers
41
views
How to specify gradient computation path in a neural network in pytorch
I want to implement a neural network in PyTorch where gradients are not computed over all the weights. Let's say, for example, I have an MLP with three layers and I want half of the nodes in the last ...
0
votes
1
answer
19
views
Global minimum as a starting point of Gradient Descent
If I already have the Global Minimum value for the Cost function of any model (including large language models) - would it facilitate Gradient Descent calculation?
(suppose I have a quick way to ...
0
votes
1
answer
125
views
Problem in Backpropagation through a sample in Beta distribution in pytorch
Say I have obtained some alphas and betas as parameters from a neural network, which will be parameters of the Beta distribution. Now, I sample from the Beta distribution and then calculate some loss ...
0
votes
0
answers
35
views
Issues when minimizing cost function in a simple linear regression
I'm quite new to ML and I'm trying to do a linear regression with quite a simple dataset.
I ran two different regressions, one by hand and the other using scikit-learn, where in the latter I ...
0
votes
1
answer
40
views
Linear regression model barely optimizes the intercept b
I've programmed a linear regression model from scratch. I use the "Sum of squared residuals" as the loss function for gradient descent. For testing I use linear data (y=x)
When running the ...
0
votes
0
answers
22
views
Do we plug in the old values or the new values during the gradient descent update?
I have a scenario where I am trying to optimize a vector of D dimensions. Every component of the vector depends on the other components according to a function such as:
summation over (i,j):
(1-e(x_i)(...
-1
votes
1
answer
75
views
What is wrong with my gradient descent implementation (SVM classifier with hinge loss)
I am trying to implement and train a multi-class SVM classifier from scratch using Python and NumPy in Jupyter notebooks.
I have been using the CS231n course as my base of knowledge, especially this ...
1
vote
1
answer
59
views
How can I just calculate a part of grad using make_functional in PyTorch?
func_model, func_params = make_functional(self.model)
def fm(x, func_params):
    fx = func_model(func_params, x)
    return fx.squeeze(0).squeeze(0)
def floss(...
0
votes
1
answer
90
views
Simple Gradient Descent in Python vs Keras
I am practicing neural networks by building my own in notebooks. I am trying to check my model against an equivalent model in Keras. My model seems to work the same as other simple coded neural ...
1
vote
1
answer
64
views
PyTorch function involving softmax and log2 second derivative is always 0
I'm trying to compute the second derivatives (Hessian) of a function t with respect to a tensor a using PyTorch. Below is the code I initially wrote:
import torch
torch.manual_seed(0)
a = torch....
0
votes
1
answer
105
views
Can you affine warp a tensor while preserving gradient flow?
I'm trying to recreate the cv2.warpAffine() function, taking a tensor as input and output rather than a NumPy array. However, gradients calculated from the output tensor produce a non-None gradient ...
-1
votes
1
answer
58
views
Learning rate in Gradient Descent algorithm
In the gradient descent algorithm, I update the B and M values according to their derivatives and then multiply them by the learning rate, but when I use the same value for L, such as 0.0001,...
1
vote
2
answers
115
views
Gradient descent application
I'm having problems with my gradient descent function.
The scatter plot of my data shows a negative correlation, but the line of best fit obtained from my gradient descent function shows a positive ...
0
votes
1
answer
46
views
PyTorch: use a loss that doesn't return a gradient
I'm trying to develop a model that improves the quality of a given audio. For this task I use DAC for the latent space and I run a transformer model to change the value of the latent space to improve ...
0
votes
0
answers
184
views
Gradient Vanishing when training LSTM with pytorch
I was training a simple LSTM neural network with PyTorch to predict stock prices, and I am confused that my network won't fit. The loss is exploding and the r2 is negative. As the training ...
1
vote
0
answers
87
views
Calculating variance of gradient of barren plateau problem in quantum variational circuit
In the paper "Cost function dependent barren plateaus in shallow parametrized quantum circuits", the authors present a warm-up example on page 2 to show the barren plateau phenomenon. In this example, the ...
2
votes
1
answer
126
views
Torch.unique() alternatives that do not break gradient flow?
In a Pytorch gradient descent algorithm, the function
def TShentropy(wf):
    unique_elements, counts = wf.unique(return_counts=True)
    entrsum = 0
    for x in counts:
        p = x/len_a #...
1
vote
2
answers
367
views
Minimizing Euclidean Norm with Gradient Descent
I'm trying to find a solution to a system of linear equations by using gradient descent to minimize ∥Ax-b∥^2 in Python.
The linear equations are:
x - 2y + 3z = - 1
3x + 2y - 5z = 3
2x - 5y + 2z = 0
The ...
2
votes
1
answer
67
views
Cost Function Increases, Then Stops Growing
I understand the zig-zag nature of the cost function when applying gradient descent, but what bothers me is that the cost started out at a low 300 only to increase to 1600 in the end.
The cost ...
0
votes
1
answer
96
views
Is the given code for gradient descent updating the parameters sequentially or simultaneously?
I'm new to machine learning and I have been learning the gradient descent algorithm. I believe this code uses simultaneous update, even though it looks like sequential update. Since the values of partial ...
0
votes
0
answers
25
views
Training RL model with TF over all the output vector
I'm training a deep RL model with TensorFlow, but my model doesn't have a single correct action. The output of the network is a vector [x1, x2], and both are actions that need to be optimized.
def ...
1
vote
0
answers
53
views
MNIST Image Classification Gradient Descent Neural Network not working
I have two files. PreProcess.java:
/*
* 4/28/24
* Final
*/
package Final;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io....
0
votes
1
answer
120
views
How can I update the actor network parameters in PPO using the gradient from Flux.jl? Gradients return nothing
To preface, I am a complete Julia newbie... I am trying to implement PPO for the first time and I've been having issues updating the actor (and by extension critic) network parameters using the ...
0
votes
1
answer
40
views
Multivariable Linear Regression with Gradient Descent Error
I am following a tutorial from this youtube video (https://www.youtube.com/watch?v=lCOHri09YmM), but I am getting an error "invalid value encountered in subtract coeff = coeff - der", and ...
1
vote
0
answers
145
views
Clarification about the decorator of the step() method of the stochastic gradient descent class
In the SGD class of PyTorch, the step() method has the decorator _use_grad_for_differentiable:
@_use_grad_for_differentiable
def step(self, closure=None):
    ...
Usually I would expect the no_grad ...
0
votes
1
answer
45
views
How exactly does tensorflow perform mini-batch gradient descent?
I am unable to achieve good results unless I choose a batch size of 1. By good, I mean error decreases significantly through the epochs. When I do a full batch of 30 the results are poor, error ...
2
votes
2
answers
267
views
How to multiply matrices and exclude elements based on masking?
I have the following input matrix
inp_tensor = torch.tensor(
    [[0.7860, 0.1115, 0.0000, 0.6524, 0.6057, 0.3725, 0.7980, 0.0000],
     [1.0000, 0.1115, 0.0000, 0.6524, 0.6057, 0.3725, 0.0000, ...
1
vote
0
answers
151
views
Pytorch Lightning distributed training: what should I set all_gather sync_grads?
I am using PyTorch Lightning for distributed training. I am using all_gather to gather all the gradients from the GPUs in order to calculate the loss function. I am unsure of what I should set the ...
0
votes
1
answer
241
views
Can a tensor with dtype uint8 be used for a loss function, which will later call '.backward()'?
I attempted to calculate the loss between a tensor with dtype float32 and another with dtype uint8.
Since the loss function performs automatic type promotion, I didn't make a type conversion ...
1
vote
0
answers
26
views
Hidden Layer Descent Function Only Working Sometimes
I'm taking an AI class and we're using hidden layers to write a descent function to predict XOR gates in Python. For this assignment specifically it only needs to have 1 hidden layer of 3 hidden nodes....
1
vote
1
answer
66
views
Gradient Descent: Reduced Feature Set has a longer runtime than the Original Feature set
I've tried to implement a gradient descent algorithm in Python for a machine learning problem. The dataset I'm working with has been preprocessed, and I observed an unexpected behavior in the runtime ...
0
votes
0
answers
87
views
Is there a way of running Projected Gradient Descent (i.e. GD with hard constraints on the objective domain) with Julia's Optim package?
I'm trying to experiment with Projected Gradient Descent on some objective functions constrained by a hypercube. "Projected" here simply means that if the next step falls outside the ...
0
votes
1
answer
688
views
What is the role of loss functions in gradient boosting?
In gradient boosting different loss functions can be used. For example, in sklearn's GradientBoostingRegressor possible loss functions are: ‘squared_error’, ‘absolute_error’, ‘huber’, and ‘quantile’ ...
1
vote
1
answer
115
views
Best way of finding KKT points for a Sympy polynomial
I'm experimenting with running Gradient Descent (GD) on polynomials of some low degree - say 3 - with n variables, on a domain constrained by a hypercube [-1,1]^n. I want to compare the termination ...
0
votes
2
answers
59
views
In Gradient Descent algorithm, how to induce -2*wx
part of Gradient Descent algorithm
this.updateWeights = function() {
  let wx;
  let w_deriv = 0;
  let b_deriv = 0;
  for (let i = 0; i < this.points; i++) {
    wx = this.yArr[i] - (this....
1
vote
0
answers
34
views
How to implement a gradient op for a custom TensorFlow op for which it is hard to derive a closed-form formula for the gradient?
I am interested in implementing a somewhat complex custom tensorflow operation. Let's say (for the purpose of this question) that the operation is similar to performing convolution with stride=2, ...
-1
votes
1
answer
272
views
Gradient descent weights keep getting larger
To get familiar with the gradient descent algorithm, I tried to create my own linear regression model. It works fine for a few data points, but when I try to fit it using more data, w0 and w1 are always ...
-1
votes
1
answer
134
views
Problem with gradient descent least squares code [closed]
I'm trying to use gradient descent on a data set. What I have written is
import numpy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv('C:/Users/Teacher/...
0
votes
1
answer
1k
views
How to Implement Full Batch Gradient Descent with Nesterov Momentum in PyTorch?
I'm working on a machine learning project in PyTorch where I need to optimize a model using the full batch gradient descent method. The key requirement is that the optimizer should use all the data ...
1
vote
0
answers
23
views
Gradients not changing in co-ordinate descent for logistic regression
I am trying to implement a co-ordinate descent algorithm for logistic regression. My gradients are not changing; as a result, I end up updating a single co-ordinate for each epoch. Here is the code:
...