Imagine I have some data:
import numpy as np
some_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
It looks like this:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
Each row represents a different observation, so windows must not span more than one row. I want to create a windowed dataset with windows of size 3 and a shift of 1. When I pass a single observation, I get what I want:
import tensorflow as tf
dataset = tf.data.Dataset.from_tensor_slices(some_data[0])
dataset = dataset.window(size=3, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(3))
The result:
for x in dataset:
    print(x.numpy())
[1 2 3]
[2 3 4]
But when I pass the whole 2-D NumPy array, I don't get anything back:
dataset = tf.data.Dataset.from_tensor_slices(some_data)
dataset = dataset.window(size=3, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(3))
This is what I would expect:
for x in dataset:
    print(x.numpy())
[1 2 3]
[2 3 4]
[5 6 7]
[6 7 8]
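To pin down the target independently of tf.data, here is a rough NumPy-only sketch that produces the same per-row windows. It assumes NumPy 1.20 or newer (for `sliding_window_view`), and the name `expected` is just something I made up for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

some_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Slide a length-3 window along axis 1 only, so no window crosses a row boundary.
# The result has shape (rows, windows_per_row, 3).
windows = sliding_window_view(some_data, window_shape=3, axis=1)

# Flatten the first two axes to get one window per line, row by row.
expected = windows.reshape(-1, 3)
print(expected)
```

This is only meant to describe the output I'm after; I still need it as a tf.data.Dataset.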
I guess I could loop over some_data and pass one array at a time, and then concatenate the datasets, but this seems like a bad solution. What's the right way to do it?
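For reference, the loop-and-concatenate workaround I have in mind would look roughly like this. It is only a sketch: `window_row` is a hypothetical helper I wrote for this post, not something from TensorFlow.

```python
import numpy as np
import tensorflow as tf

some_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

def window_row(row, size=3, shift=1):
    # Hypothetical helper: window a single 1-D observation, as in the working example.
    ds = tf.data.Dataset.from_tensor_slices(row)
    ds = ds.window(size=size, shift=shift, drop_remainder=True)
    return ds.flat_map(lambda window: window.batch(size))

# Build one windowed dataset per row, then chain them together in order.
dataset = window_row(some_data[0])
for row in some_data[1:]:
    dataset = dataset.concatenate(window_row(row))

result = [x.numpy().tolist() for x in dataset]
print(result)  # [[1, 2, 3], [2, 3, 4], [5, 6, 7], [6, 7, 8]]
```

It gives the output I want, but it builds a separate dataset for every row in Python, which is why I suspect it won't scale to many observations.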
I'm using TensorFlow 2.0. Thanks!