Imagine I have some data:

some_data = np.array([[1,2,3,4], [5, 6, 7,8]])

It looks like this:

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Each row represents a different observation, so they should not be combined. I want to create a windowed dataset, each window of size 3, shifted by 1. When I pass a single observation, I get what I want, like this:

dataset = tf.data.Dataset.from_tensor_slices(some_data[0])
dataset = dataset.window(size=3, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(3))

The result:

for x in dataset:
    print(x.numpy())

[1 2 3]
[2 3 4]

But when I pass the whole 2-D NumPy array, the dataset yields nothing:

dataset = tf.data.Dataset.from_tensor_slices(some_data)
dataset = dataset.window(size=3, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(3))

This is what I would expect:

for x in dataset:
    print(x.numpy())

[1 2 3]
[2 3 4]
[5 6 7]
[6 7 8]

I guess I could loop over some_data and pass one array at a time, and then concatenate the datasets, but this seems like a bad solution. What's the right way to do it?

I'm using TensorFlow 2.0. Thanks!

  • What output do you expect? Commented Sep 27, 2019 at 1:23
  • I updated the question with the expected output. Thanks! Commented Sep 27, 2019 at 16:43

1 Answer

With dataset = tf.data.Dataset.from_tensor_slices(some_data[0]), each element of the dataset is a single scalar, so window() slides over the four values of that row.

dataset = tf.data.Dataset.from_tensor_slices(some_data[0])
for x in dataset:
    print(x.numpy())
1
2
3
4

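With dataset = tf.data.Dataset.from_tensor_slices(some_data), each element is an entire row of four values. The dataset then has only two elements, so a window of size 3 with drop_remainder=True is never filled and nothing is produced.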

dataset = tf.data.Dataset.from_tensor_slices(some_data)
for x in dataset:
    print(x.numpy())
[1 2 3 4]
[5 6 7 8]
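
To make the difference concrete, here is a small pure-Python sketch of the windowing semantics. It does not use TensorFlow; windows() is a hypothetical helper I wrote to mimic window(size, shift, drop_remainder=True) on a plain list, so treat it as a model of the behaviour rather than the library's implementation.

```python
def windows(elements, size, shift=1):
    """Hypothetical stand-in for Dataset.window(..., drop_remainder=True).

    Slides over the *elements* it is given and keeps only full windows.
    """
    elements = list(elements)
    last_start = len(elements) - size
    return [elements[i:i + size] for i in range(0, last_start + 1, shift)]


some_data = [[1, 2, 3, 4], [5, 6, 7, 8]]

# Windowing the outer sequence: there are only 2 elements (the rows),
# so no window of size 3 can ever be filled.
over_rows = windows(some_data, size=3)

# Windowing each row separately and concatenating the results.
per_row = [w for row in some_data for w in windows(row, size=3)]

print(over_rows)
print(per_row)
```

The first result is empty because the window slides over rows, not over the values inside a row. The second is the per-row behaviour the question asks for.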

So you need to build a windowed dataset from each row separately and let flat_map concatenate the results:

import numpy as np
import tensorflow as tf

some_data = np.array([[1,2,3,4], [5, 6, 7,8]])
dataset = tf.data.Dataset.from_tensor_slices(some_data)

def parse_samples(x):
    return tf.data.Dataset.from_tensor_slices(x)\
        .window(size=3, shift=1, drop_remainder=True)\
        .flat_map(lambda window: window.batch(3))

dataset = dataset.flat_map(parse_samples)

for x in dataset:
    print(x.numpy())

[1 2 3]
[2 3 4]
[5 6 7]
[6 7 8]
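
If you need this for other window sizes, the same idea can be wrapped in a function. This is only a sketch under my own assumptions: make_windowed_dataset and its parameter names are hypothetical, not part of tf.data, and it assumes every row has the same length, as in the question.

```python
import numpy as np
import tensorflow as tf


def make_windowed_dataset(data, size, shift=1):
    """Hypothetical helper: window each row of a 2-D array independently."""

    def window_one_row(row):
        # Each row becomes its own dataset of scalars, which is then windowed.
        return (
            tf.data.Dataset.from_tensor_slices(row)
            .window(size=size, shift=shift, drop_remainder=True)
            .flat_map(lambda w: w.batch(size, drop_remainder=True))
        )

    # The outer flat_map concatenates the per-row windows in row order.
    return tf.data.Dataset.from_tensor_slices(data).flat_map(window_one_row)


some_data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
result = [w.numpy().tolist() for w in make_windowed_dataset(some_data, size=3)]
print(result)
```

Windows never cross a row boundary here, because each row is windowed before the results are flattened together.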