22

I am trying to adapt this MNIST example to binary classification.

But when changing my NLABELS from NLABELS=2 to NLABELS=1, the loss function always returns 0 (and accuracy 1).

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Import data
mnist = input_data.read_data_sets('data', one_hot=True)
NLABELS = 2

sess = tf.InteractiveSession()

# Create the model
x = tf.placeholder(tf.float32, [None, 784], name='x-input')
W = tf.Variable(tf.zeros([784, NLABELS]), name='weights')
b = tf.Variable(tf.zeros([NLABELS], name='bias'))

y = tf.nn.softmax(tf.matmul(x, W) + b)

# Add summary ops to collect data
_ = tf.histogram_summary('weights', W)
_ = tf.histogram_summary('biases', b)
_ = tf.histogram_summary('y', y)

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, NLABELS], name='y-input')

# More name scopes will clean up the graph representation
with tf.name_scope('cross_entropy'):
    cross_entropy = -tf.reduce_mean(y_ * tf.log(y))
    _ = tf.scalar_summary('cross entropy', cross_entropy)
with tf.name_scope('train'):
    train_step = tf.train.GradientDescentOptimizer(10.).minimize(cross_entropy)

with tf.name_scope('test'):
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    _ = tf.scalar_summary('accuracy', accuracy)

# Merge all the summaries and write them out to /tmp/mnist_logs
merged = tf.merge_all_summaries()
writer = tf.train.SummaryWriter('logs', sess.graph_def)
tf.initialize_all_variables().run()

# Train the model, and feed in test data and record summaries every 10 steps

for i in range(1000):
    if i % 10 == 0:  # Record summary data and the accuracy
        labels = mnist.test.labels[:, 0:NLABELS]
        feed = {x: mnist.test.images, y_: labels}

        result = sess.run([merged, accuracy, cross_entropy], feed_dict=feed)
        summary_str = result[0]
        acc = result[1]
        loss = result[2]
        writer.add_summary(summary_str, i)
        print('Accuracy at step %s: %s - loss: %f' % (i, acc, loss)) 
   else:
        batch_xs, batch_ys = mnist.train.next_batch(100)
        batch_ys = batch_ys[:, 0:NLABELS]
        feed = {x: batch_xs, y_: batch_ys}
    sess.run(train_step, feed_dict=feed)

I have checked the dimensions of both batch_ys (fed into y) and _y and they are both 1xN matrices when NLABELS=1 so the problem seems to be prior to that. Maybe something to do with the matrix multiplication?

I actually have got this same problem in a real project, so any help would be appreciated... Thanks!

2
  • I am trying your netwrok and it dosen't seem to be working, have you found a possible solution ? Commented Oct 24, 2017 at 2:38
  • I see that you're initializing all of your weights to 0. I think the model's not going to work because everything gets multiplied by zero. I would change that to W=tf.get_variable('weights'[in_dim,out_dim],initializer=tf.truncated_normal_initializer()) Commented Aug 4, 2019 at 2:42

2 Answers 2

42

The original MNIST example uses a one-hot encoding to represent the labels in the data: this means that if there are NLABELS = 10 classes (as in MNIST), the target output is [1 0 0 0 0 0 0 0 0 0] for class 0, [0 1 0 0 0 0 0 0 0 0] for class 1, etc. The tf.nn.softmax() operator converts the logits computed by tf.matmul(x, W) + b into a probability distribution across the different output classes, which is then compared to the fed-in value for y_.

If NLABELS = 1, this acts as if there were only a single class, and the tf.nn.softmax() op would compute a probability of 1.0 for that class, leading to a cross-entropy of 0.0, since tf.log(1.0) is 0.0 for all of the examples.

There are (at least) two approaches you could try for binary classification:

  1. The simplest would be to set NLABELS = 2 for the two possible classes, and encode your training data as [1 0] for label 0 and [0 1] for label 1. This answer has a suggestion for how to do that.

  2. You could keep the labels as integers 0 and 1 and use tf.nn.sparse_softmax_cross_entropy_with_logits(), as suggested in this answer.

Sign up to request clarification or add additional context in comments.

5 Comments

Oh, I see the problem... But now I wonder, should I use cross entropy at all for a binary classification problem then? This guy who has implemented a neural network by hand has not used cross entropy for his binary classification problem: iamtrask.github.io/2015/07/12/basic-python-network
That's certainly possible. You could have a single output unit, feed it through tf.nn.sigmoid() to get a value between 0 and 1 and then use y - y_ as the loss function.
doesn't TensorFlow try to minimize the loss function? It has to be a scalar, right? Should I do something like sum(abs(y-y_))?
You could try the tf.nn.l2_loss() op instead of sum(abs(…)).
Usually the logarithmic loss would be a good choice used in combination with a single output unit. Logarithmic loss is also called binary cross entropy because it is a special case of cross entropy working on only two classes (check exegetic.biz/blog/2015/12/making-sense-logarithmic-loss for a more detailed explanation). In Keras you can use binary_crossentropy. In TensorFlow you can use log_loss.
0

I've been looking for good examples of how to implement binary classification in TensorFlow in a similar manner to the way it would be done in Keras. I didn't find any, but after digging through the code a bit, I think I have it figured out. I modified the problem here to implement a solution that uses sigmoid_cross_entropy_with_logits the way Keras does under the hood.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Import data
mnist = input_data.read_data_sets('data', one_hot=True)
NLABELS = 1

sess = tf.InteractiveSession()

# Create the model
x = tf.placeholder(tf.float32, [None, 784], name='x-input')
W = tf.get_variable('weights', [784, NLABELS],
                    initializer=tf.truncated_normal_initializer()) * 0.1
b = tf.Variable(tf.zeros([NLABELS], name='bias'))
logits = tf.matmul(x, W) + b

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, NLABELS], name='y-input')

# More name scopes will clean up the graph representation
with tf.name_scope('cross_entropy'):

    #manual calculation : under the hood math, don't use this it will have gradient problems
    # entropy = tf.multiply(tf.log(tf.sigmoid(logits)), y_) + tf.multiply((1 - y_), tf.log(1 - tf.sigmoid(logits)))
    # loss = -tf.reduce_mean(entropy, name='loss')

    entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=logits)
    loss = tf.reduce_mean(entropy, name='loss')

with tf.name_scope('train'):
    # Using Adam instead
    # train_step = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)
    train_step = tf.train.AdamOptimizer(learning_rate=0.002).minimize(loss)

with tf.name_scope('test'):
    preds = tf.cast((logits > 0.5), tf.float32)
    correct_prediction = tf.equal(preds, y_)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

tf.initialize_all_variables().run()

# Train the model, and feed in test data and record summaries every 10 steps

for i in range(2000):
    if i % 100 == 0:  # Record summary data and the accuracy
        labels = mnist.test.labels[:, 0:NLABELS]
        feed = {x: mnist.test.images, y_: labels}
        result = sess.run([loss, accuracy], feed_dict=feed)
        print('Accuracy at step %s: %s - loss: %f' % (i, result[1], result[0]))
    else:
        batch_xs, batch_ys = mnist.train.next_batch(100)
        batch_ys = batch_ys[:, 0:NLABELS]
        feed = {x: batch_xs, y_: batch_ys}
    sess.run(train_step, feed_dict=feed)

Training:

Accuracy at step 0: 0.7373 - loss: 0.758670
Accuracy at step 100: 0.9017 - loss: 0.423321
Accuracy at step 200: 0.9031 - loss: 0.322541
Accuracy at step 300: 0.9085 - loss: 0.255705
Accuracy at step 400: 0.9188 - loss: 0.209892
Accuracy at step 500: 0.9308 - loss: 0.178372
Accuracy at step 600: 0.9453 - loss: 0.155927
Accuracy at step 700: 0.9507 - loss: 0.139031
Accuracy at step 800: 0.9556 - loss: 0.125855
Accuracy at step 900: 0.9607 - loss: 0.115340
Accuracy at step 1000: 0.9633 - loss: 0.106709
Accuracy at step 1100: 0.9667 - loss: 0.099286
Accuracy at step 1200: 0.971 - loss: 0.093048
Accuracy at step 1300: 0.9714 - loss: 0.087915
Accuracy at step 1400: 0.9745 - loss: 0.083300
Accuracy at step 1500: 0.9745 - loss: 0.079019
Accuracy at step 1600: 0.9761 - loss: 0.075164
Accuracy at step 1700: 0.9768 - loss: 0.071803
Accuracy at step 1800: 0.9777 - loss: 0.068825
Accuracy at step 1900: 0.9788 - loss: 0.066270

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.