
I'm trying to use NumPy arrays within a graph, feeding in the data using a Dataset.

I've read through this, but can't quite make sense of how I should feed placeholder arrays within a Dataset.

If we take a simple example, I start with:

import numpy as np
import tensorflow as tf

A = np.arange(4)
B = np.arange(10, 14)

a = tf.placeholder(tf.float32, [None])
b = tf.placeholder(tf.float32, [None])
c = tf.add(a, b)

with tf.Session() as sess:
    for i in range(10):
        x = sess.run(c, feed_dict={a: A, b: B})
        print(i, x)

Then I attempt to modify it to use a Dataset as follows:

A = np.arange(4)
B = np.arange(10, 14)

a = tf.placeholder(tf.int32, A.shape)
b = tf.placeholder(tf.int32, B.shape)
c = tf.add(a, b)

dataset = tf.data.Dataset.from_tensors((a, b))

iterator = dataset.make_initializable_iterator()

with tf.Session() as sess3:
    sess3.run(tf.global_variables_initializer())
    sess3.run(iterator.initializer, feed_dict={a: A, b: B})

    for i in range(10):
        x = sess3.run(c)
        print(i, x)

If I run this I get 'InvalidArgumentError: You must feed a value for placeholder tensor ...'

The code up to the for loop mimics the example here, but I don't see how I can then use the placeholders a and b without supplying a feed_dict to every call to sess3.run(c) (which would be expensive). I suspect I have to use the iterator somehow, but I don't understand how.

Update

It appears I oversimplified when picking the example. What I am really trying to do is use Datasets when training a neural network, or similar.

For a more sensible question, how would I go about using Datasets to feed the placeholders in the code below (though imagine X and Y_true are much longer...)? The documentation takes me to the point where the loop starts, and then I'm not sure.

X = np.arange(8.).reshape(4, 2)
Y_true = np.array([0, 0, 1, 1])

x = tf.placeholder(tf.float32, [None, 2], name='x')
y_true = tf.placeholder(tf.float32, [None], name='y_true')

w = tf.Variable(np.random.randn(2, 1), name='w', dtype=tf.float32)

y = tf.squeeze(tf.matmul(x, w), name='y')

loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
                                labels=y_true, logits=y),
                                name='x_entropy')

# set optimiser
optimiser = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for i in range(100):
        _, loss_out = sess.run([optimiser, loss], feed_dict={x: X, y_true: Y_true})
        print(i, loss_out)

Trying the following only gets me an InvalidArgumentError:

X = np.arange(8.).reshape(4, 2)
Y_true = np.array([0, 0, 1, 1])

x = tf.placeholder(tf.float32, [None, 2], name='x')
y_true = tf.placeholder(tf.float32, [None], name='y_true')

dataset = tf.data.Dataset.from_tensor_slices((x, y_true))
iterator = dataset.make_initializable_iterator()

w = tf.Variable(np.random.randn(2, 1), name='w', dtype=tf.float32)

y = tf.squeeze(tf.matmul(x, w), name='y')

loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
                                labels=y_true, logits=y),
                                name='x_entropy')

# set optimiser
optimiser = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    sess.run(iterator.initializer, feed_dict={x: X,
                                              y_true: Y_true})

    for i in range(100):
        _, loss_out = sess.run([optimiser, loss])
        print(i, loss_out)
  • What do you expect the result of sess3.run(c) to be? The dataset only contains a single element, so even if you used iterator.get_next(), the loop would only perform one iteration before signaling that there are no more elements.
  • This is probably not the clearest example. I was trying to show the simplest example I could, but apparently lost the meaning along the way; I'll edit the question.

2 Answers


Use iterator.get_next() to get elements from the Dataset:

next_element = iterator.get_next()

then initialize the iterator:

sess.run(iterator.initializer, feed_dict={a: A, b: B})

and finally get the values from the Dataset:

value = sess.run(next_element)

EDIT:

The code above just returns the elements from the Dataset. The Dataset API is intended to serve features and labels to an input_fn, so all additional preprocessing computations should be performed within the Dataset API. If you want to add the elements together, you should define a function that is applied to them, like:

def add_fn(exp1, exp2):
  return tf.add(exp1, exp2)

and then you can map this function over your Dataset:

dataset = dataset.map(add_fn)

Complete code example:

import numpy as np
import tensorflow as tf

A = np.arange(4)
B = np.arange(10, 14)
a = tf.placeholder(tf.int32, A.shape)
b = tf.placeholder(tf.int32, B.shape)
#c = tf.add(a, b)
def add_fn(exp1, exp2):
  return tf.add(exp1, exp2)
dataset = tf.data.Dataset.from_tensors((a, b))
dataset = dataset.map(add_fn)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
with tf.Session() as sess:
  sess.run(iterator.initializer, feed_dict={a: A, b: B})
  # the dataset contains just one element
  x = sess.run(next_element)
  print(x)
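
For reference, this prints the single summed element, [10 12 14 16]; a second sess.run(next_element) would raise tf.errors.OutOfRangeError, because the dataset holds only one element.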



The problem in your more complicated example is that you use the same tf.placeholder() nodes both as the input to Dataset.from_tensor_slices() (which is correct) and as the input to the network itself (which causes the InvalidArgumentError). Instead, as J.E.K points out in their answer, you should use iterator.get_next() as the input to your network, as follows (note that I added a couple of other fixes to make the code run as-is):

import numpy as np
import tensorflow as tf

X = np.arange(8.).reshape(4, 2)
Y_true = np.array([0, 0, 1, 1])

x = tf.placeholder(tf.float32, [None, 2], name='x')
y_true = tf.placeholder(tf.float32, [None], name='y_true')

dataset = tf.data.Dataset.from_tensor_slices((x, y_true))

# You will need to repeat the input (which has 4 elements) to be able to take
# 100 steps.
dataset = dataset.repeat()

iterator = dataset.make_initializable_iterator()

# Use `iterator.get_next()` to create tensors that will consume values from the
# dataset.
x_next, y_true_next = iterator.get_next()

w = tf.Variable(np.random.randn(2, 1), name='w', dtype=tf.float32)

# The `x_next` tensor is a vector (i.e. a row of `X`), so you will need to
# convert it to a matrix, or apply batching in the dataset, to make it work
# with `tf.matmul()`. (A batching sketch follows this example.)
x_next = tf.expand_dims(x_next, 0)

y = tf.squeeze(tf.matmul(x_next, w), name='y')  # Use `x_next` here.

loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        labels=y_true_next, logits=y),  # Use `y_true_next` here.
    name='x_entropy')

# set optimiser
optimiser = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    sess.run(iterator.initializer, feed_dict={x: X,
                                              y_true: Y_true})

    for i in range(100):
        _, loss_out = sess.run([optimiser, loss])
        print(i, loss_out)
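
As a footnote, here is a minimal sketch of the batching alternative mentioned in the comment above: batching the dataset yields matrix-shaped elements directly, so the tf.expand_dims() call is no longer needed. The batch size of 4 is an arbitrary choice for this toy data; the rest uses the same TF 1.x API as above.

import numpy as np
import tensorflow as tf

X = np.arange(8.).reshape(4, 2)
Y_true = np.array([0, 0, 1, 1])

x = tf.placeholder(tf.float32, [None, 2], name='x')
y_true = tf.placeholder(tf.float32, [None], name='y_true')

dataset = tf.data.Dataset.from_tensor_slices((x, y_true))
dataset = dataset.repeat()
# Batch 4 rows at a time, so each element is already a [4, 2] matrix.
dataset = dataset.batch(4)

iterator = dataset.make_initializable_iterator()
x_next, y_true_next = iterator.get_next()  # x_next: [4, 2], y_true_next: [4]

w = tf.Variable(np.random.randn(2, 1), name='w', dtype=tf.float32)

# No tf.expand_dims() needed: `x_next` is already a matrix.
y = tf.squeeze(tf.matmul(x_next, w), name='y')

loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true_next, logits=y),
    name='x_entropy')

optimiser = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(iterator.initializer, feed_dict={x: X, y_true: Y_true})

    for i in range(100):
        _, loss_out = sess.run([optimiser, loss])
        print(i, loss_out)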

1 Comment

Great, I think that has got me over the conceptual 'hump' so that I now have some idea of how Dataset works, thanks!
