I am trying to understand the behavior of Dataset.batch. Here is the code I have used to try to set up iterators on batched data through a Dataset based on numpy arrays.

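    ## experiment with a numpy dataset
    import numpy as np
    import tensorflow as tf

    sample_size = 100000
    ncols = 15
    batch_size = 1000
    # column i is centred on the value i, with Gaussian noise added
    xarr = np.ones([sample_size, ncols]) * [i for i in range(ncols)]
    xarr = xarr + np.random.normal(scale = 0.5, size = xarr.shape)
    # keepdims gives shape (sample_size, 1), matching y_placeholder below
    yarr = np.sum(xarr, axis = 1, keepdims = True)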
    self.x_placeholder = tf.placeholder(xarr.dtype, [None, ncols])
    self.y_placeholder = tf.placeholder(yarr.dtype, [None, 1])

    dataset = tf.data.Dataset.from_tensor_slices((self.x_placeholder, self.y_placeholder))
    dataset.batch(batch_size)
    self.iterator  = dataset.make_initializable_iterator()

    X, y  = self.iterator.get_next()

However, when I check the shapes of X and y they are

    (Pdb) X.shape
    TensorShape([Dimension(15)])
    (Pdb) y.shape
    TensorShape([Dimension(1)])

This is confusing to me because it does not appear that my batch size has been taken into account. It also causes problems downstream when building a model because I expect X and y to have two dimensions, the first dimension being the number of examples in the batch.

Question: Why are the outputs of the iterator one-dimensional? How should I batch properly?

Here is what I have tried:

  • The shapes of X and y are the same regardless of whether I apply the batch function to the dataset.
  • Changing the shape I feed into the placeholders (say by replacing None with batch_size) does not change the behavior either.

Thanks for suggestions/corrections, etc.

1 Answer

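Dataset.batch does not modify the dataset in place. It returns a new, batched dataset, so the result has to be assigned back to the variable. To take the batch size into account, change

    dataset.batch(batch_size)

to

    dataset = dataset.batch(batch_size)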
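
As a rough illustration of why the discarded return value matters, here is a minimal pure-Python sketch. ToyDataset and its first method are hypothetical stand-ins invented for this example, not part of TensorFlow; they only mimic the pattern of a transformation that returns a new object instead of changing the existing one.

```python
# Toy stand-in for a dataset with non-mutating transformations.
# ToyDataset is hypothetical and is NOT the tf.data implementation.
class ToyDataset:
    def __init__(self, elements):
        self._elements = list(elements)

    def batch(self, batch_size):
        # Build and return a NEW dataset; self is left untouched.
        batches = [self._elements[i:i + batch_size]
                   for i in range(0, len(self._elements), batch_size)]
        return ToyDataset(batches)

    def first(self):
        # Return the first element, similar in spirit to one get_next() call.
        return self._elements[0]


dataset = ToyDataset(range(10))

dataset.batch(4)                 # return value discarded: `dataset` is unchanged
unbatched_first = dataset.first()

dataset = dataset.batch(4)       # rebind the name to the batched dataset
batched_first = dataset.first()

print(unbatched_first)  # 0
print(batched_first)    # [0, 1, 2, 3]
```

Applied to the code in the question, the reassignment should give X and y a leading batch dimension. That dimension will likely be reported as unknown (?) in the static shape, since in general the last batch can be smaller than batch_size.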