I am trying to understand the behavior of `Dataset.batch`. Here is the code I have used to try to set up an iterator over batched data from a `Dataset` built on NumPy arrays.
## experiment with a numpy dataset
import numpy as np
import tensorflow as tf  # TF 1.x API
sample_size = 100000
ncols = 15
batch_size = 1000
xarr = np.ones([sample_size, ncols]) * [i for i in range(ncols)]
xarr = xarr + np.random.normal(scale = 0.5, size = xarr.shape)
yarr = np.sum(xarr, axis = 1)
self.x_placeholder = tf.placeholder(xarr.dtype, [None, ncols])
self.y_placeholder = tf.placeholder(yarr.dtype, [None, 1])
dataset = tf.data.Dataset.from_tensor_slices((self.x_placeholder, self.y_placeholder))
dataset.batch(batch_size)
self.iterator = dataset.make_initializable_iterator()
X, y = self.iterator.get_next()
However, when I check the shapes of `X` and `y`, I get:
(Pdb) X.shape
TensorShape([Dimension(15)])
(Pdb) y.shape
TensorShape([Dimension(1)])
This is confusing to me, because the batch size does not appear to have been taken into account. It also causes problems downstream when building a model, because I expect `X` and `y` to be two-dimensional, with the first dimension being the number of examples in the batch.
Question: Why are the outputs of the iterator one-dimensional? How should I batch properly?
Here is what I have tried:
- The shapes of `X` and `y` are the same regardless of whether I apply the `batch` function to the dataset.
- Changing the shape I feed into the placeholders (say, by replacing `None` with `batch_size`) does not change the behavior either.
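For reference, here is a minimal, TensorFlow-free sketch that rebuilds the arrays from the snippet above and prints their shapes. One thing it highlights (which may or may not be related to the main problem) is that `yarr` comes out one-dimensional, while the `y` placeholder is declared as `[None, 1]`:

```python
import numpy as np

# Rebuild the arrays exactly as in the snippet above
sample_size = 100000
ncols = 15
xarr = np.ones([sample_size, ncols]) * [i for i in range(ncols)]
xarr = xarr + np.random.normal(scale=0.5, size=xarr.shape)
yarr = np.sum(xarr, axis=1)  # reduces along the column axis

print(xarr.shape)  # (100000, 15)
print(yarr.shape)  # (100000,) -- 1-D, not (100000, 1)
```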
Thanks for suggestions/corrections, etc.