Newest 'gradient-descent' Questions

3 votes

1 answer

89 views

Freeze, then unfreeze gradients of a trained parameter in PyTorch does not work

Let's say I have a parameter which is a p-shaped vector and I wish to train it in PyTorch such that, for some iterations, only the first k <= p elements of this vector are trained, whereas the rest ...

L2m

21

asked Aug 19 at 14:09

1 vote

0 answers

106 views

Gradient Descent blowing up in linear regression

I am coding a linear regression in Python. I used the formulas I learnt and double-checked them, and also tried normalising the dataset. What happened then is that the values of the weight and bias changed ...

ADITYA KUNDU

19

asked Aug 18 at 17:34

0 votes

1 answer

61 views

How to get gradient with respect to matrix in JAX?

As far as I know, JAX only supports "rank 1" vector-valued functions for the jax.jacrev autograd. How do I support higher-rank tensors? I don't want to flatten my matrix, then unflatten it ...

Mingruifu Lin

161

asked Jul 29 at 17:04

0 votes

0 answers

54 views

How do I tell tensorflow to throw an error if I am trying to do a non-differentiable operation on a variable?

I am learning tensorflow and spent a good amount of time trying to find what is causing this error: No gradients provided for any variable. In the end I tracked that it was caused by using argmax at ...

Tomáš Zato

54k

asked May 26 at 10:24

0 votes

0 answers

44 views

Torch gradient estimates disagreeing with analytic and perturbation approximated gradients

I'm faced with a problem where, as the title says, I'm having trouble with the torch package's built-in automatic differentiation algorithms (or my usage?). I think it was meant to be used on mini-...

Nomi Mino

1

asked Apr 19 at 14:12

1 vote

0 answers

28 views

Matlab Reinforcement Learning, Issue with obtaining gradient from Qvalue critic using dlfeval,dlgradient,dlarrays

I'm trying to implement a custom agent, and inside my agent I'm running into issues with obtaining the gradient of the Q value with respect to my actor network parameters. I have my code below, main ...

Sliferslacker

53

asked Apr 7 at 14:12

0 votes

0 answers

31 views

Deterministic minimization of a stochastic function with subgradient method

Problem: I have implemented several step-size strategies (classic, Polyak, and Adagrad), but my subgradient algorithm either diverges or fails to converge. Initially, I focused on the problem: Initial ...

Titouan Brochard

1

asked Mar 26 at 13:24

2 votes

1 answer

102 views

SIR parameter estimation with gradient descent and autograd

I am trying to apply a very simple parameter estimation of a SIR model using a gradient descent algorithm. I am using the package autograd since the audience (this is for a sort of workshop for ...

Alonso Ogueda Oliva

323

asked Mar 5 at 1:02

0 votes

1 answer

51 views

theta values for gradient descent not coherent

I made a gradient descent code but it doesn't seem to work well: import numpy as np from random import randint,random import matplotlib.pyplot as plt def calculh(theta, X): h = 0 h+=theta[0]*X ...

ismail rachid

17

asked Feb 14 at 16:41

0 votes

0 answers

102 views

Gradient descent 3D visualization Python

I've recently implemented a neural network from scratch and am now focusing on visualizing the optimization process. Specifically, I'm interested in creating a 3D visualization of the loss landscape ...

Kris

59

asked Feb 2 at 22:49

0 votes

0 answers

41 views

How to specify gradient computation path in a neural network in pytorch

I want to implement a neural network on pytorch where gradients are not computed over all the weights. Let's say for example I have an MLP with three layers and I want half of the nodes in the last ...

danix

155

asked Jan 31 at 6:19

0 votes

1 answer

19 views

Global minimum as a starting point of Gradient Descent

If I already have the Global Minimum value for the Cost function of any model (including large language models) - would it facilitate Gradient Descent calculation? (suppose I have a quick way to ...

Drout

355

asked Jan 19 at 23:26

0 votes

1 answer

125 views

Problem in Backpropagation through a sample in Beta distribution in pytorch

Say I have obtained some alphas and betas as parameters from a neural network, which will be parameters of the Beta distribution. Now, I sample from the Beta distribution and then calculate some loss ...

Jimut123

536

asked Jan 19 at 16:52

0 votes

0 answers

35 views

Issues when minimizing cost function in a simple linear regression

I'm quite new to ML and I'm trying to do a linear regression with quite a simple dataset: text I did two different regressions, one by hand and the other one using scikit-learn, where in the latter I ...

MIKEL LASS

21

asked Dec 28, 2024 at 12:36

0 votes

1 answer

40 views

Linear regression model barely optimizes the intercept b

I've programmed a linear regression model from scratch. I use the "Sum of squared residuals" as the loss function for gradient descent. For testing I use linear data (y=x). When running the ...

Blacklight

1

asked Dec 27, 2024 at 19:40
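
As a point of comparison for the question above, here is a minimal sketch of gradient descent on the sum of squared residuals for data drawn from y = x. The function name `fit_line`, the learning rate, and the step count are illustrative assumptions, not the asker's code.

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y = w*x + b by gradient descent on the sum of squared residuals."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current line on every data point.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of sum(residual^2) with respect to w and b.
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical test data lying exactly on y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = list(xs)
w, b = fit_line(xs, ys)
```

Because the loss here is a sum rather than a mean, the gradient grows with the number of points, so a learning rate that works on five points may diverge on a larger dataset unless it is reduced or the loss is averaged.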

0 votes

0 answers

22 views

Do we plug in the old values or the new values during the gradient descent update?

I have a scenario when I am trying to optimize a vector of D dimensions. Every component of the vector is dependent on other components according to a function such as: summation over (i,j): (1-e(x_i)(...

Darkmoon Chief

140

asked Nov 5, 2024 at 9:35

-1 votes

1 answer

75 views

What is wrong with my gradient descent implementation (SVM classifier with hinge loss)

I am trying to implement and train an SVM multi-class classifier from scratch using python and numpy in jupyter notebooks. I have been using the CS231n course as my base of knowledge, especially this ...

ho88it

21

asked Oct 4, 2024 at 19:19

1 vote

1 answer

59 views

How can I just calculate a part of grad using make_functional in PyTorch?

func_model, func_params = make_functional(self.model) def fm(x, func_params): fx = func_model(func_params, x) return fx.squeeze(0).squeeze(0) def floss(...

Klae zhou

11

asked Sep 17, 2024 at 12:57

0 votes

1 answer

90 views

Simple Gradient Descent in Python vs Keras

I am practicing neural networks by building my own in notebooks. I am trying to check my model against an equivalent model in Keras. My model seems to work the same as other simple coded neural ...

AdamS

11

asked Sep 14, 2024 at 2:57

1 vote

1 answer

64 views

PyTorch function involving softmax and log2 second derivative is always 0

I'm trying to compute the second derivatives (Hessian) of a function t with respect to a tensor a using PyTorch. Below is the code I initially wrote: import torch torch.manual_seed(0) a = torch....

Ray Bern

135

asked Sep 4, 2024 at 21:30

0 votes

1 answer

105 views

Can you affine warp a tensor while preserving gradient flow?

I'm trying to recreate the cv2.warpAffine() function, taking a tensor input and output rather than a Numpy array. However, gradients calculated from the output tensor produce a Non-None gradient ...

arcanespud

45

asked Aug 22, 2024 at 22:54

-1 votes

1 answer

58 views

Learning rate in Gradient Descent algorithm

In the gradient descent algorithm, I update the B and M values according to their derivatives and then multiply them with the Learning rate value, but when I use the same value for L, such as 0.0001,...

Fhurky

7

asked Aug 7, 2024 at 16:59

1 vote

2 answers

115 views

Gradient descent application

I'm having problems with my gradient descent function. The scatter plot of my diagram shows a negative correlation but the line of best fit gotten from my gradient descent function shows a positive ...

Dubem Nwokike

21

asked Jul 28, 2024 at 15:15

0 votes

1 answer

46 views

PyTorch: using a loss that doesn't return a gradient

I'm trying to develop a model that improves the quality of a given audio. For this task I use DAC for the latent space and I run a transformer model to change the value of the latent space to improve ...

Jourdelune

149

asked Jul 5, 2024 at 17:05

0 votes

0 answers

184 views

Gradient Vanishing when training LSTM with pytorch

I was training a simple LSTM neural network with pytorch to predict stock price. And it is confusing to me that my network wouldn't fit. The loss is exploding and the r2 is negative. As the training ...

王一诺

1

asked Jun 30, 2024 at 7:15

1 vote

0 answers

87 views

Calculating variance of gradient of barren plateau problem in quantum variational circuit

In the paper Cost function dependent barren plateaus in shallow parametrized quantum circuits, the authors present a warm-up example on page 2 to show the barren plateau phenomenon. In this example, the ...

lang xian

11

asked Jun 16, 2024 at 4:26

2 votes

1 answer

126 views

Torch.unique() alternatives that do not break gradient flow?

In a Pytorch gradient descent algorithm, the function def TShentropy(wf): unique_elements, counts = wf.unique(return_counts=True) entrsum = 0 for x in counts: p = x/len_a #...

2 False

21

asked Jun 13, 2024 at 4:48

1 vote

2 answers

367 views

Minimizing Euclidean Norm with Gradient Descent

I'm trying to find a solution for a system of linear equations using Gradient Descent Method ∥Ax-b∥^2 in Python. The linear equations are: x - 2y + 3z = - 1 3x + 2y - 5z = 3 2x - 5y + 2z = 0 The ...

Orhan94

11

asked Jun 10, 2024 at 22:08
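
For the three equations quoted in the question above, minimizing ∥Ax-b∥^2 by plain gradient descent can be sketched as follows. The function name `solve_by_gradient_descent`, the step-size rule, and the step count are assumptions made for illustration; the asker's own code is not shown in the excerpt.

```python
def solve_by_gradient_descent(A, b, steps=5000):
    """Minimize ||Ax - b||^2 with fixed-step gradient descent."""
    rows, cols = len(A), len(A[0])
    # The squared Frobenius norm of A bounds the largest eigenvalue of A^T A,
    # so 1 / (2 * fro2) is a conservative, stable step for the gradient 2 A^T (Ax - b).
    fro2 = sum(a * a for row in A for a in row)
    lr = 1.0 / (2.0 * fro2)
    x = [0.0] * cols
    for _ in range(steps):
        # Residual r = Ax - b.
        r = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
        # Gradient 2 * A^T r.
        grad = [2.0 * sum(A[i][j] * r[i] for i in range(rows)) for j in range(cols)]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x


# Coefficients and right-hand side taken from the equations in the question.
A = [[1.0, -2.0, 3.0],
     [3.0, 2.0, -5.0],
     [2.0, -5.0, 2.0]]
b = [-1.0, 3.0, 0.0]
solution = solve_by_gradient_descent(A, b)
```

Solving the system directly gives x = 6/23, y = -2/23, z = -11/23, which the iterates should approach; the conservative step size trades speed for guaranteed stability.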

2 votes

1 answer

67 views

Cost Function Increases, Then Stops Growing

I understand the zig-zag nature of the cost function when applying gradient descent, but what bothers me is that the cost started out at a low 300 only to increase to 1600 in the end. The cost ...

Topics on Data

23

asked Jun 7, 2024 at 22:10

0 votes

1 answer

96 views

Is the given code for gradient descent updating the parameters sequentially or simultaneously?

I'm new to machine learning and I have been learning gradient descent algorithm. I believe this code uses simultaneous update, even though it looks like sequential update. Since the values of partial ...

Mayank Gupta

11

asked Jun 5, 2024 at 15:28
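
The distinction raised in the question above can be sketched with two update functions. The names `step_simultaneous` and `step_sequential` and the toy objective are hypothetical, chosen only to make the difference visible; they are not the code the asker refers to.

```python
def step_simultaneous(w, b, grad_w, grad_b, lr):
    """Evaluate both partial derivatives at the old (w, b), then update both."""
    gw = grad_w(w, b)
    gb = grad_b(w, b)
    return w - lr * gw, b - lr * gb


def step_sequential(w, b, grad_w, grad_b, lr):
    """Update w first, then evaluate the b-partial at the already-updated w."""
    w_new = w - lr * grad_w(w, b)
    b_new = b - lr * grad_b(w_new, b)
    return w_new, b_new


def partial(w, b):
    # Toy objective f(w, b) = (w + b - 1)^2; both partials equal 2 * (w + b - 1).
    return 2.0 * (w + b - 1.0)


sim = step_simultaneous(0.0, 0.0, partial, partial, 0.1)
seq = step_sequential(0.0, 0.0, partial, partial, 0.1)
```

Code that stores the partial derivatives in temporary variables before touching any parameter behaves like `step_simultaneous`, even if the assignments to `w` and `b` appear one after the other.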

0 votes

0 answers

25 views

Training RL model with TF over all the output vector

I'm training a deep RL model with TensorFlow, but my model doesn't have a single correct action. The output of the network is a vector [x1, x2], and both are actions that need to be optimized. def ...

gustavo lobos astorquiza

1

asked May 27, 2024 at 2:48

1 vote

0 answers

53 views

MNIST Image Classification Gradient Descent Neural Network not working

I have two files. PreProcess.java: /* * 4/28/24 * Final */ package Final; import java.io.DataInputStream; import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io....

Mark Agib

11

asked May 17, 2024 at 7:18

0 votes

1 answer

120 views

How can I update the actor network parameters in PPO using the gradient from Flux.jl? Gradients return nothing

To preface, I am a complete Julia newbie... I am trying to implement PPO for the first time and I've been having issues updating the actor (and by extension critic) network parameters using the ...

Max Kim

1

asked May 13, 2024 at 19:54

0 votes

1 answer

40 views

Multivariable Linear Regression with Gradient Descent Error

I am following a tutorial from this youtube video (https://www.youtube.com/watch?v=lCOHri09YmM), but I am getting an error "invalid value encountered in subtract coeff = coeff - der", and ...

butters149

1

asked May 13, 2024 at 17:15

1 vote

0 answers

145 views

Clarification about the decorator of the step() method of the stochastic gradient descent class

In the SGD class of pytorch, the step() method has the decorator _use_grad_for_differentiable: @_use_grad_for_differentiable def step(self, closure=None): ... Usually I would expect the no_grad ...

soap

771

asked May 6, 2024 at 11:41

0 votes

1 answer

45 views

How exactly does tensorflow perform mini-batch gradient descent?

I am unable to achieve good results unless I choose a batch size of 1. By good, I mean error decreases significantly through the epochs. When I do a full batch of 30 the results are poor, error ...

debo

380

asked May 2, 2024 at 6:28

2 votes

2 answers

267 views

How to multiply matrices and exclude elements based on masking?

I have the following input matrix inp_tensor = torch.tensor( [[0.7860, 0.1115, 0.0000, 0.6524, 0.6057, 0.3725, 0.7980, 0.0000], [1.0000, 0.1115, 0.0000, 0.6524, 0.6057, 0.3725, 0.0000, ...

Penguin

2,651

asked Apr 29, 2024 at 19:07

1 vote

0 answers

151 views

Pytorch Lightning distributed training: what should I set all_gather sync_grads?

I am using pytorch lightning for distributed training. I am using all_gather to gather all the gradients from the gpus in order to calculate the loss function. I am unsure of what I should set the ...

JobHunter69

2,376

asked Apr 26, 2024 at 18:18

0 votes

1 answer

241 views

Can a tensor with dtype uint8 be used for a loss function, which will later call '.backward()'?

I attempted to calculate the loss between a tensor with dtype float32 and another with dtype uint8. Since the loss function performs automatic type promotion, I didn't make a type conversion ...

Aria Lovelace

1

asked Apr 16, 2024 at 12:36

1 vote

0 answers

26 views

Hidden Layer Descent Function Only Working Sometimes

I'm taking an AI class and we're using hidden layers to write a descent function to predict XOR gates in Python. For this assignment specifically it only needs to have 1 hidden layer of 3 hidden nodes....

Oreo

11

asked Apr 9, 2024 at 4:19

1 vote

1 answer

66 views

Gradient Descent: Reduced Feature Set has a longer runtime than the Original Feature set

I've tried to implement a gradient descent algorithm in Python for a machine learning problem. The dataset I'm working with has been preprocessed, and I observed an unexpected behavior in the runtime ...

H.S

23

asked Apr 7, 2024 at 14:13

0 votes

0 answers

87 views

Is there a way of running Projected Gradient Descent (i.e. GD with hard constraints on the objective domain) with Julia's Optim package?

I'm trying to experiment with Projected Gradient Descent on some objective functions constrained by a hypercube. "Projected" here simply means that if the next step falls outside the ...

ufghd34

168

asked Apr 4, 2024 at 17:26
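
The idea described in the question above, projecting each step back onto a hypercube, can be sketched in a few lines. This is a plain Python illustration, not Julia and not the Optim package's API; the function name `projected_gradient_descent`, the objective, and the hyperparameters are all assumptions.

```python
def projected_gradient_descent(grad, x0, lower, upper, lr=0.1, steps=200):
    """Gradient descent where every iterate is clipped back into [lower, upper]^n."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        # Take the gradient step, then project coordinate-wise onto the box.
        x = [min(max(xi - lr * gi, lower), upper) for xi, gi in zip(x, g)]
    return x


def grad_f(x):
    # Hypothetical objective f(x, y) = (x - 2)^2 + (y + 3)^2,
    # whose unconstrained minimum (2, -3) lies outside the box.
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] + 3.0)]


x_star = projected_gradient_descent(grad_f, [0.0, 0.0], -1.0, 1.0)
```

For a box constraint the Euclidean projection is just coordinate-wise clipping, which is why the projection step reduces to `min` and `max`; here the iterates settle at the corner of the box closest to the unconstrained minimum.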

0 votes

1 answer

688 views

What is the role of loss functions in gradient boosting?

In gradient boosting different loss functions can be used. For example, in sklearn's GradientBoostingRegressor possible loss functions are: ‘squared_error’, ‘absolute_error’, ‘huber’, and ‘quantile’ ...

Sanyo Mn

441

asked Apr 1, 2024 at 18:03

1 vote

1 answer

115 views

Best way of finding KKT points for a Sympy polynomial

I'm experimenting with running Gradient Descent (GD) on polynomials of some low degree - say 3 - with n variables, on a domain constrained by a hypercube [-1,1]^n. I want to compare the termination ...

ufghd34

168

asked Mar 30, 2024 at 11:39

0 votes

2 answers

59 views

In Gradient Descent algorithm, how to induce -2*wx

part of Gradient Descent algorithm this.updateWeights = function() { let wx; let w_deriv = 0; let b_deriv = 0; for (let i = 0; i < this.points; i++) { wx = this.yArr[i] - (this....

chang dae Kim

1

asked Mar 16, 2024 at 9:41

1 vote

0 answers

34 views

How to implement gradient op for a custom tensorflow op, for which it is hard to derive a mathematical closed form formula for gradient?

I am interested in implementing a somewhat complex custom tensorflow operation. Let's say (for the purpose of this question) that the operation is similar to performing convolution with stride=2, ...

Aviraj Bevli

21

asked Mar 11, 2024 at 9:50

-1 votes

1 answer

272 views

Gradient descent weights keep getting larger

To get familiar with the Gradient Descent algorithm, I tried to create my own Linear Regression model. It works fine for a few data points. But when I try to fit it using more data, w0 and w1 are always ...

LNTR

84

asked Mar 6, 2024 at 14:22

-1 votes

1 answer

134 views

Problem with gradient descent least squares code [closed]

I'm trying to use gradient descent on a data set. What I have written is import numpy import pandas as pd import numpy as np import matplotlib.pyplot as plt data = pd.read_csv('C:/Users/Teacher/...

user124910

115

asked Mar 6, 2024 at 7:48

0 votes

1 answer

1k views

How to Implement Full Batch Gradient Descent with Nesterov Momentum in PyTorch?

I'm working on a machine learning project in PyTorch where I need to optimize a model using the full batch gradient descent method. The key requirement is that the optimizer should use all the data ...

Maxou

1

asked Mar 4, 2024 at 16:10

1 vote

0 answers

23 views

Gradients not changing in co-ordinate descent for logistic regression

I am trying to implement a co-ordinate descent algorithm for logistic regression. My gradients are not changing, as a result I end up updating a single co-ordinate for each epoch. Here is the code: ...

Necessary_title

11

asked Feb 24, 2024 at 22:00
