How to get a windowed dataset in tensorflow 2 from an array of numpy arrays?

Question

Imagine I have some data:

some_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

It looks like this:

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Each row represents a different observation, so a window must never mix values from different rows. I want to create a windowed dataset with windows of size 3 and a shift of 1. When I pass a single observation, I get what I want:

dataset = tf.data.Dataset.from_tensor_slices(some_data[0])
dataset = dataset.window(size=3, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(3))

The result:

for x in dataset:
    print(x.numpy())

[1 2 3]
[2 3 4]

But when I pass the whole NumPy array of arrays, I get nothing back:

dataset = tf.data.Dataset.from_tensor_slices(some_data)
dataset = dataset.window(size=3, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(3))

This is what I would expect:

for x in dataset:
    print(x.numpy())

[1 2 3]
[2 3 4]
[5 6 7]
[6 7 8]

I guess I could loop over some_data, build a dataset from one row at a time, and then concatenate the datasets, but that seems like a bad solution. What's the right way to do it?
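
For comparison, the loop-and-concatenate approach mentioned above might look like the sketch below. The helper name window_row is hypothetical, and the chaining uses Dataset.concatenate via functools.reduce. It produces the expected windows, but it builds one dataset per row in Python, which is why a single flat_map pipeline is preferable.

```python
import functools

import numpy as np
import tensorflow as tf

some_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

def window_row(row, size=3, shift=1):
    # Hypothetical helper: windows a single observation on its own.
    ds = tf.data.Dataset.from_tensor_slices(row)
    ds = ds.window(size=size, shift=shift, drop_remainder=True)
    return ds.flat_map(lambda window: window.batch(size))

# One dataset per row, chained end to end with concatenate().
per_row = [window_row(row) for row in some_data]
dataset = functools.reduce(lambda a, b: a.concatenate(b), per_row)

windows = [x.numpy().tolist() for x in dataset]
print(windows)
```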

I'm using TensorFlow 2.0. Thanks!

What output do you expect?

– giser_yugang, Commented Sep 27, 2019 at 1:23
I updated the question with the expected output. Thanks!

– rv123, Commented Sep 27, 2019 at 16:43

giser_yugang · Accepted Answer · 2019-09-29 04:11:47Z

When you use dataset = tf.data.Dataset.from_tensor_slices(some_data[0]), each element of the dataset is a single value:

dataset = tf.data.Dataset.from_tensor_slices(some_data[0])
for x in dataset:
    print(x.numpy())

1
2
3
4

But when you use dataset = tf.data.Dataset.from_tensor_slices(some_data), each element is a whole row of four values:

dataset = tf.data.Dataset.from_tensor_slices(some_data)
for x in dataset:
    print(x.numpy())

[1 2 3 4]
[5 6 7 8]
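
A quick check, assuming the same some_data as in the question, confirms this: the dataset holds only two elements of shape (4,), and keeping the incomplete windows (drop_remainder=False) shows that each window groups whole rows rather than the values inside a row.

```python
import numpy as np
import tensorflow as tf

some_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
rows = tf.data.Dataset.from_tensor_slices(some_data)

# Each element is a whole row of shape (4,).
print(rows.element_spec.shape)

# Only 2 elements exist, so a size-3 window is never complete
# and drop_remainder=True discards it.
dropped = rows.window(size=3, shift=1, drop_remainder=True)
dropped = dropped.flat_map(lambda window: window.batch(3))
dropped_windows = [x.numpy() for x in dropped]
print(len(dropped_windows))  # 0

# Keeping the partial windows shows that each window contains rows.
kept = rows.window(size=3, shift=1, drop_remainder=False)
kept = kept.flat_map(lambda window: window.batch(3))
kept_windows = [x.numpy() for x in kept]
for w in kept_windows:
    print(w)
```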

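So window() groups whole rows, not the values inside them. With only two rows, a window of size 3 is never complete, and drop_remainder=True discards it, which is why you get nothing back. What you need to do is window each row separately and merge the results. Dataset.flat_map does both: it maps each row to its own windowed dataset and flattens them into one dataset.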

import numpy as np
import tensorflow as tf

some_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
# One element per row (observation), each of shape (4,).
dataset = tf.data.Dataset.from_tensor_slices(some_data)

def parse_samples(x):
    # Window a single row, so windows never cross observations.
    return (tf.data.Dataset.from_tensor_slices(x)
            .window(size=3, shift=1, drop_remainder=True)
            .flat_map(lambda window: window.batch(3)))

# Map each row to its own windowed dataset and flatten the results.
dataset = dataset.flat_map(parse_samples)

for x in dataset:
    print(x.numpy())

[1 2 3]
[2 3 4]
[5 6 7]
[6 7 8]
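
If every observation has the same length and the whole array fits in memory, another option (not part of the accepted answer) is to compute all windows up front with tf.signal.frame, which slides a window along the last axis of a tensor, and then feed the flattened result to from_tensor_slices. A minimal sketch under those assumptions:

```python
import numpy as np
import tensorflow as tf

some_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Frame along the last axis: shape (2, 4) -> (2, 2, 3).
# Framing happens inside each row, so windows never cross observations.
frames = tf.signal.frame(some_data, frame_length=3, frame_step=1)

# Merge the observation and window axes: (2, 2, 3) -> (4, 3).
frames = tf.reshape(frames, [-1, 3])
dataset = tf.data.Dataset.from_tensor_slices(frames)

windows = [x.numpy().tolist() for x in dataset]
for w in windows:
    print(w)
```

The flat_map pipeline above does not require all rows to be materialized as one rectangular tensor; the framing version trades that flexibility for a single vectorized op.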
