Questions tagged [objective-functions]
For questions related to the concept of loss (or cost) function in the context of machine learning.
261 questions
0
votes
2
answers
79
views
What loss function should I choose to assign a higher penalty to false negatives than to false positives in a regression task?
I am using a machine learning model to remove interference from range-Doppler maps to detect targets. I am using a supervised approach, in which I give as input the range-Doppler map of target+...
3
votes
1
answer
107
views
Universally better activation/loss function or specific-case dependency?
With the popularity of AIs in every media source this year, I'm interested in learning more about them and maybe one day building a good one. I have this code in Python:
...
2
votes
1
answer
95
views
Are human content moderators needed anymore with AI?
Are human content moderators for social media websites needed anymore with AI?
In other words, is AI so good now that it can detect if an image is pornographic, obscene, or in any violating social ...
0
votes
0
answers
85
views
How to use a contrastive loss function for multi-label classification?
I have a multi-label classification problem, where I was initially using a binary cross-entropy loss and my labels are one-hot encoded. I found a paper similar to my application and have used ...
1
vote
1
answer
55
views
How can gradient descent optimize a loss surface that's never fully computed?
In gradient descent for neural networks, we optimize over a loss surface defined by our loss function L(W) where W represents the network weights. However, since there are infinitely many possible ...
0
votes
0
answers
45
views
How to write a custom loss for multi-label video classification?
I am trying to train a multi-label video classification model. My dataset consists of just one video, sampled at 1fps. I have a total of 12k frames and 21 classes, and in a single frame multiple ...
1
vote
1
answer
107
views
Loss function that penalizes errors more at low values
I am training Deep Learning models to predict the Remaining Useful Life (RUL) of certain devices. The RUL is an estimate of the time remaining until the device is expected to fail. Accurate ...
0
votes
0
answers
101
views
Sudden NaN in the loss function when training a GAN for inpainting (AOT-GAN), although I am sure there is no NaN in the input
I am trying to train a GAN called AOT-GAN to do inpainting on some anodized aluminium surfaces. At the beginning, I used a Canon camera to take the photos for training the AOT-GAN. ...
1
vote
1
answer
158
views
Are these objective and loss functions from Actor-Critic Methods correct?
I'm doing research on actor-critic methods and I want to make sure that I understand these methods correctly.
First of all, I understand that as it's a combination of value-based and policy-based ...
1
vote
1
answer
102
views
Expected return formula for deterministic policy
I have a question regarding how the expected return of a deterministic policy is written. I have seen that in some cases they use the Q-function, as shown in the part Objective function ...
0
votes
0
answers
93
views
Loss function on intermediate layers of the networks
Typically in supervised learning, a neural network's output is compared to the targets through a loss function, and the gradients are backpropagated. Is it a bad idea to also have a loss function on ...
2
votes
1
answer
65
views
Do we plug in the old values or the new values during the gradient descent update?
I have a scenario where I am trying to optimize a vector of D dimensions. Every component of the vector is dependent on other components according to a function such as: summation over (i,j): (1-e(x_i)(...
2
votes
1
answer
152
views
Custom Loss Function Traps Network in Local Optima
I am working with a feedforward neural network to fit the following simple function:
N(1) = -1
N(2) = -1
N(3) = 1
N(4) = -1
But I don't want to use the Mean-...
0
votes
1
answer
145
views
Using conditional probability as an estimate in a loss function
I have a rather large ML framework that takes multiple conditional probability terms that are computed via classifiers/neural networks. This arbitrary loss function is computed via a function:
...
2
votes
0
answers
94
views
Can local learning rules minimize a global loss?
It is widely believed that synaptic plasticity is the way biological brains learn. Artificial implementations of this mechanism are for instance local weight-update rules in Spiking Neural Networks. ...
0
votes
1
answer
52
views
Sparse Cross Entropy
I've been attempting to mess around with Sparse Categorical Cross Entropy Loss for the MNIST dataset. I can't seem to figure out what might be wrong with my implementation; the loss seems to ...
0
votes
0
answers
64
views
Optimizing a nonlinear objective function in Deep Reinforcement Learning
I'm working on a reinforcement learning problem where the environment returns a reward pair $(r_{t+1}^{(a)}, r_{t+1}^{(b)})$. The goal is to maximize the following nonlinear objective function.
$$
E[\...
1
vote
1
answer
112
views
Multi-task objectives sometimes improve single-task performance, but is this true when fine-tuning?
It is known that multitask objectives in neural networks sometimes have the effect of improving the performance of the neural network for each of the tasks individually (versus training the same ...
2
votes
2
answers
97
views
Why does an action cost function depend on the result state in search problems?
In the famous AI book Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig (4th edition), in chapter 3, the action cost function of a problem solver agent denoted as $c(s, a, ...
2
votes
1
answer
110
views
Can you explain Hinton's comment "Rprop is equivalent to using the gradient, but also dividing by the size of the gradient"?
Been reviewing some old foundational material and ran into this comment by Hinton on Rprop in his old Coursera class:
Rprop is equivalent to using the gradient, but also dividing by the size of the ...
0
votes
0
answers
63
views
Training with a non-differentiable loss function in an actor-critic style
I'm working on a project that involves a non-differentiable loss, and I'm thinking about how I should deal with it.
My model is a very big LSTM model (about 1M parameters), and after 500 steps (not sure ...
1
vote
0
answers
104
views
How do LGBM rankers train?
I'm looking into Learning to Rank models - specifically, the LGBMRanker model - and I want to understand how it's able to train. It takes in features, group sizes and labels, and optimizes for a ...
1
vote
0
answers
77
views
Search recall optimization - what is an appropriate loss function to use?
I am studying machine learning and wanted to work on a project of my own so that I have better chances after graduating college. I'm studying the application of ML to improve searches using a toy ...
1
vote
1
answer
123
views
Why learn an observation model when training a latent space model in model-based RL?
I'm currently studying reinforcement learning through CS 285 provided by UC Berkeley.
At 1:52 of part 5 of lecture 11, I got confused about why one would want to learn an observation model $p(o_t ...
2
votes
2
answers
150
views
In logistic regression, do I try to fit the graph perfectly or minimize the error in the predicted probabilities?
In linear regression, I train the model so the graph runs best through the data points, so the geometric distance between f(x) and $y^i$ is minimized.
Now, is it correct that in logistic regression I ...
1
vote
0
answers
75
views
Can gradient descent cause loss to increase in some situations?
Is a gradient descent step always supposed to decrease loss? I can think of a situation where it would seem that gradient descent would increase loss but maybe I am misunderstanding a part of ...
1
vote
2
answers
91
views
How do I assign a weight to an additional loss?
I am trying to do multi-spectral image fusion. I am using the following paper as a reference.
https://arxiv.org/pdf/1804.08361.pdf
The code available on GitHub works well. But, I am trying to add some ...
1
vote
0
answers
1k
views
What are the MLM and NSP loss functions?
Two objective functions are used during the BERT language model pretraining step. The first one is masked language model (MLM) that randomly masks 15% of the input tokens and the objective is to ...
4
votes
1
answer
2k
views
What is the best way to combine or weight multiple losses with gradient descent?
I am optimizing a neural network with Adam using 3 different losses. Their scale is very different, and the current method is to either sum the losses and clip the gradient or to manually weight them ...
0
votes
1
answer
60
views
Which loss/activation function for 2 classes that do not occur often and do not sum to one?
I have a neural network that predicts 2 classes of a time series (bottom and top). Currently my Y labels are size 2: [1 0] for bottom and [0 1] for top. The NN has 2 output nodes.
Of course not every ...
0
votes
1
answer
494
views
What is the correct loss function for binary classification: Cross entropy or Binary cross entropy?
Let's say I have a binary classification problem and I want to solve it by means of FC neural net. So which approach will be correct: 1) define the last layer of NN like this ...
0
votes
1
answer
5k
views
What's the difference between classification and segmentation in deep learning?
What's the difference between classification and segmentation in deep learning?
In particular, can the classification loss function be used for segmentation problems?
2
votes
1
answer
180
views
Image classification problem with multiple right classes
I have a use case where the model needs to detect fabric defects. There are 15+ different kinds of defects. In one image there can be multiple defects present. The straightforward solution for this ...
1
vote
1
answer
827
views
Why do MSE and MAE yield poor results when used with gradient-based optimization for classification?
Deep Learning book, chapter 6, section 6.2.1.2, last paragraph:
Unfortunately, mean squared error and mean absolute error often lead to poor results when used with gradient-based optimization. Some output ...
0
votes
1
answer
129
views
Why is `SigmoidBinaryCrossEntropyLoss` in `DJL` implemented this way?
SigmoidBinaryCrossEntropyLoss implementation in DJL accepts two kinds of outputs from NNs:
where sigmoid activation has already been applied.
where raw NN output ...
1
vote
0
answers
69
views
Loss Function for Binary Classification with Multiple Correct Choices
I have a binary classification problem where there are multiple correct predictions; however, I would consider the prediction to be correct if the highest confidence prediction of a 1 is correct.
I ...
0
votes
1
answer
90
views
Learning curve converges with huge errors
I am training an auto-encoder over $10^4$ epochs. I get a converging learning curve. However, the error at the last stages stays huge, $\sim 10^{15}$. What does this mean? Does it mean that my auto-...
1
vote
0
answers
145
views
Training a neural network simultaneously with two different loss functions rather than considering the weighted sum
This is a follow-up on the already asked question: Is the neural network 100% accurate on training data if epoch loss is minimized to 0?
I want to train a neural network that works as an approximator ...
1
vote
0
answers
448
views
Left-to-Right vs Encoder-decoder Models
Xu et al. (2022) distinguish between popular pre-training methods for language modeling: (see Section 2.1 PRETRAINING METHODS)
Left-to-Right:
Auto-regressive, left-to-right models predict the ...
1
vote
1
answer
109
views
Do we need to know or verify properties of loss functions / metrics' implementations?
I will start with an example, in order to get to the general question.
I was reading the following paper (https://www.cns.nyu.edu/pub/lcv/wang03-preprint.pdf) about Structural Similarity Index (SSIM), ...
1
vote
1
answer
271
views
Is the discriminator of a GAN embedded in a VAE?
From what I understand, a Generative Adversarial Network (GAN) is composed of an encoder (generator), some synthetic data (fake data) and a discriminator that will penalize any distinguishable real ...
3
votes
1
answer
236
views
What loss function should I use if I only care about the accuracy of one class?
CrossEntropyLoss optimizes the overall classification accuracy as
$$ {n_{\text{correct}} \over N} $$
What loss function should I use if I only care about increasing the true positive rate of one class?...
0
votes
2
answers
123
views
How to define a loss function for multi-label problem?
I have voice recordings which are labelled by not only a single label but multiple labels. Each voice recording corresponds to one of the class labels within a set. In other words, the training instance ...
10
votes
1
answer
10k
views
What is the difference between the triplet loss and the contrastive loss?
What is the difference between the triplet loss and the contrastive loss?
They look the same to me. I don't understand the nuances between the two. I have the following queries:
When to use what?
What ...
1
vote
2
answers
874
views
What should I think about when designing a custom loss function?
I'm trying to get my toy network to learn a sine wave.
I output (via tanh) a number between -1 and 1, and I want the network to minimise the following loss, where ...
1
vote
2
answers
583
views
What is the domain of the discriminator of a GAN?
I've read that the discriminator $D$ validates an image $D(x)$, where $x$ is either a real image or a fake one created by the generator, i.e. $D(G(x))$.
What does the function of the discriminator ...
2
votes
0
answers
51
views
How to create a loss function that penalizes duplicate indices in the output tensor?
We're working on a sequence-to-sequence problem using PyTorch, and are using cross-entropy to calculate the loss when comparing the output sequence to the target sequence. This works fine and ...
3
votes
1
answer
334
views
Why do we use "true labels" that are based on the output of our network in Deep Q-Learning?
In the original DQN paper, the $\ell_2$ loss is taken over the distance between our network output, $\hat{q}(s_j,a_j,w)$ and the labels $y_j=r_j+\gamma \cdot \max\limits_{a'} \hat{q}(s_{j+1},a',w^-)$, ...
1
vote
0
answers
66
views
Learning values in open ball: which final layers to employ?
I'm fairly new to deep learning and looking for some reference literature... Specifically, I want to train a neural network to predict vectors $v \in \mathbb{R}^3$ under the constraint $||v||\leq 1$.
...
0
votes
1
answer
137
views
How is catastrophic cancellation dealt with in loss functions?
It just occurred to me that this seems like it should be a very common problem that must have some kind of solution... Yet I'm not sure what it is...
If there is no solution, does this mean once a ...