I'm trying to write a neural Network for binary classification in PyTorch and I'm confused about the loss function.
I see that BCELoss is a common function specifically geared for binary classification. I also see that an output layer of N outputs for N possible classes is standard for general classification. However, for binary classification it seems like it could be either 1 or 2 outputs.
So, should I have 2 outputs (1 for each label) and then convert my 0/1 training labels into [1,0] and [0,1] arrays, or use something like a sigmoid for a single-variable output?
Here are the relevant snippets of code so you can see:
self.outputs = nn.Linear(NETWORK_WIDTH, 2) # 1 or 2 dimensions?
def forward(self, x):
# other layers omitted
x = self.outputs(x)
return F.log_softmax(x) # <<< softmax over multiple vars, sigmoid over one, or other?
criterion = nn.BCELoss() # <<< Is this the right function?
net_out = net(data)
loss = criterion(net_out, target) # <<< Should target be an integer label or 1-hot vector?
Thanks in advance.