
I'm trying to write a neural network for binary classification in PyTorch, and I'm confused about the loss function.

I see that BCELoss is a common loss function specifically geared toward binary classification. I also see that an output layer of N outputs for N possible classes is standard for general classification. However, for binary classification it seems like the network could have either 1 or 2 outputs.

So, should I have 2 outputs (1 for each label) and then convert my 0/1 training labels into [1,0] and [0,1] arrays, or use something like a sigmoid for a single-variable output?

Here are the relevant snippets of code so you can see what I mean:

import torch.nn as nn
import torch.nn.functional as F

self.outputs = nn.Linear(NETWORK_WIDTH, 2)  # 1 or 2 dimensions?


def forward(self, x):
    # other layers omitted
    x = self.outputs(x)
    return F.log_softmax(x, dim=1)  # <<< softmax over multiple vars, sigmoid over one, or other?

criterion = nn.BCELoss()  # <<< Is this the right function?

net_out = net(data)
loss = criterion(net_out, target)  # <<< Should target be an integer label or 1-hot vector?

Thanks in advance.

2 Answers


For binary outputs you can use 1 output unit:

self.outputs = nn.Linear(NETWORK_WIDTH, 1)

Then you use a sigmoid activation to map the values of your output unit to a range between 0 and 1 (of course, your training labels need to be 0/1 floats to match):

def forward(self, x):
    # other layers omitted
    x = self.outputs(x)
    return torch.sigmoid(x)  # map the raw output to a probability in (0, 1)

Finally, you can use torch.nn.BCELoss:

criterion = nn.BCELoss()

net_out = net(data)
loss = criterion(net_out, target)  # target: float tensor of 0s and 1s, same shape as net_out

This should work fine for you.

You can also use torch.nn.BCEWithLogitsLoss. This loss function already includes the sigmoid, so you can leave it out of your forward; it is also more numerically stable than applying sigmoid and BCELoss separately.
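
For example, a minimal sketch of that variant, reusing the names net, data, and target from above (an assumed context, not a complete program):

def forward(self, x):
    # other layers omitted
    return self.outputs(x)  # return raw logits; no sigmoid here

criterion = nn.BCEWithLogitsLoss()  # applies sigmoid internally

net_out = net(data)                # raw logits
loss = criterion(net_out, target)  # target: float tensor of 0s and 1s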

If you want to use 2 output units, this is also possible. But then you need to use torch.nn.CrossEntropyLoss instead of BCELoss. The softmax activation is already included in this loss function.
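
A minimal sketch of that variant, again reusing the names from above; note that the targets are now integer class indices, not one-hot vectors:

self.outputs = nn.Linear(NETWORK_WIDTH, 2)

def forward(self, x):
    # other layers omitted
    return self.outputs(x)  # raw logits for both classes; no softmax here

criterion = nn.CrossEntropyLoss()  # applies log-softmax internally

net_out = net(data)                # shape: (batch_size, 2)
loss = criterion(net_out, target)  # target: long tensor of class indices (0 or 1)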


Edit: I just want to emphasize that there is a real difference between these two approaches. Using 2 output units gives you twice as many weights in the final layer compared to using 1 output unit, so the two alternatives are not equivalent.
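
You can check the parameter counts directly; NETWORK_WIDTH = 128 below is just an assumed value for illustration:

import torch.nn as nn

NETWORK_WIDTH = 128  # assumed value for illustration

one_unit = nn.Linear(NETWORK_WIDTH, 1)   # 128 weights + 1 bias
two_units = nn.Linear(NETWORK_WIDTH, 2)  # 256 weights + 2 biases

print(sum(p.numel() for p in one_unit.parameters()))   # 129
print(sum(p.numel() for p in two_units.parameters()))  # 258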


2 Comments

Thanks for great answer! The last question about 1 and 2 output units. When would I want to use one over another?
@никта Good question; actually I'm not sure there is a preferred strategy here. I think it is mostly a matter of taste. The main difference is not the number of units but the loss function and its activation, softmax vs. sigmoid, so you could check which activation suits your problem better. In most cases you probably won't notice a major difference in the capability of the resulting network. Since it is a minor change, I suggest testing both and checking which works best for your problem :)

Some theoretical background:

For binary classification (say class 0 and class 1), the network needs only 1 output unit. Its output should be 1 when class 1 is present (class 0 absent) and 0 when class 1 is absent (class 0 present).

For the loss calculation, you should first pass the output through a sigmoid and then through binary cross-entropy (BCE). The sigmoid transforms the output of the network into a probability (between 0 and 1), and minimizing BCE then maximizes the likelihood of the desired output.
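
A small self-contained sketch of that pipeline (random tensors stand in for real network outputs and labels), which also shows that it matches BCEWithLogitsLoss applied directly to the raw outputs:

import torch
import torch.nn as nn

logits = torch.randn(4, 1)                    # raw network outputs
target = torch.randint(0, 2, (4, 1)).float()  # 0/1 labels as floats

probs = torch.sigmoid(logits)                 # map outputs to probabilities in (0, 1)
loss = nn.BCELoss()(probs, target)            # binary cross-entropy on probabilities

# equivalent, and more numerically stable:
loss_logits = nn.BCEWithLogitsLoss()(logits, target)
print(torch.allclose(loss, loss_logits))      # True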

