How to convert Tensorflow dataset to 2D numpy array

Question

I have a TensorFlow dataset which contains nearly 15000 color images with 168x84 resolution and a label for each image. Its type and shape are:

    <ConcatenateDataset shapes: ((168, 84, 3), ()), types: (tf.float32, tf.int32)>

I need to use it to train my network, so I need to pass it as a parameter to this function, in which I build my layers:

    def cnn_model_fn(features, labels, mode):
        input_layer = tf.reshape(features["x"], [-1, 168, 84, 3])
        # Convolutional Layer #1
        conv1 = tf.layers.conv2d(
            inputs=input_layer,
            filters=32,
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
        # ... (remaining layers omitted)

I tried to convert each tensor into an np.array (which I guess is the proper type for the function above) using tf.eval() and np.ravel(), but it didn't work.

So, how can I convert this dataset into the proper type to pass it to the function?

Also:

I am new to Python and TensorFlow, and I don't think I understand why datasets exist if we cannot use them directly to build layers (I am following the tutorial on TensorFlow's website, by the way).

Thanks.

Master M · Accepted Answer · 2019-08-22 01:26:09Z

You could try eager execution; my previous answer, which used a session run, is shown below.
In eager execution, calling .numpy() on a tensor converts that tensor to a NumPy array.
Example code (from my use case):


    #enable eager execution
    from __future__ import absolute_import, division, print_function, unicode_literals
    import tensorflow as tf
    tf.enable_eager_execution()
    print('Is executing eagerly?',tf.executing_eagerly())      

    #load datasets
    import tensorflow_datasets as tfds
    dataset, metadata = tfds.load('cycle_gan/horse2zebra',
                                  with_info=True, as_supervised=True)
    train_horses, train_zebras = dataset['trainA'], dataset['trainB']

    #load dataset into numpy array (get_next() returns only the first batch of 1000 images)
    train_A=train_horses.batch(1000).make_one_shot_iterator().get_next()[0].numpy()
    print(train_A.shape)

    #preview one of the images
    import matplotlib.pyplot as plt
    %matplotlib inline
    plt.imshow(train_A[1])
    plt.show()
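With TensorFlow 2.x, eager execution is on by default, so the same .numpy() idea works without tf.enable_eager_execution() or make_one_shot_iterator(). A minimal sketch, assuming TF 2.x and using random stand-in data shaped like the question's dataset (the array sizes are hypothetical):

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data shaped like the question's images:
# 10 random images instead of ~15000, integer labels 0..4.
images = np.random.rand(10, 168, 84, 3).astype(np.float32)
labels = np.random.randint(0, 5, size=10).astype(np.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

# TF 2.x runs eagerly by default, so no enable_eager_execution() call is needed.
assert tf.executing_eagerly()

# Batch the whole dataset into a single element and pull it out with next(iter(...)).
# len() works here because from_tensor_slices has a known, finite cardinality.
n = len(dataset)
x_batch, y_batch = next(iter(dataset.batch(n)))

# EagerTensor -> np.ndarray
train_x = x_batch.numpy()
train_y = y_batch.numpy()

print(train_x.shape, train_y.shape)  # (10, 168, 84, 3) (10,)
```

Batching with the full length loads everything into memory at once; for ~15000 float32 images of 168x84x3 that is roughly 2.5 GB.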

Old answer (using a session run):

I recently had this problem, and I solved it like this:


    #load datasets (requires graph mode: the TF 1.x default, eager execution off)
    import numpy as np
    import tensorflow as tf
    import tensorflow_datasets as tfds
    dataset, metadata = tfds.load('cycle_gan/horse2zebra',
                                  with_info=True, as_supervised=True)
    train_horses, train_zebras = dataset['trainA'], dataset['trainB']

    #load dataset into numpy array (only the first batch of 1000 images)
    sess = tf.compat.v1.Session()
    tra=train_horses.batch(1000).make_one_shot_iterator().get_next()
    train_A=np.array(sess.run(tra)[0])
    print(train_A.shape)
    sess.close()

    #preview one of the images
    import matplotlib.pyplot as plt
    %matplotlib inline
    plt.imshow(train_A[1])
    plt.show()

David Parks · Accepted Answer · 2018-05-19 00:39:45Z


It doesn't sound like you set things up using the TensorFlow Dataset pipeline. Here is the guide for doing so:

https://www.tensorflow.org/programmers_guide/datasets
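For reference, a minimal sketch of the pipeline approach, assuming TensorFlow 2.x with Keras (the question itself follows the TF 1.x tf.layers/Estimator tutorial). The random data and the tiny model are hypothetical stand-ins; the point is that the Dataset is handed straight to training instead of being converted to NumPy:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 8 random images with the question's shape.
images = np.random.rand(8, 168, 84, 3).astype(np.float32)
labels = np.random.randint(0, 10, size=8).astype(np.int32)

# Input pipeline: shuffle, batch, and prefetch the (image, label) pairs.
dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .shuffle(buffer_size=8)
           .batch(4)
           .prefetch(tf.data.AUTOTUNE))

# A deliberately tiny, hypothetical model (not the question's network).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(168, 84, 3)),
    tf.keras.layers.Conv2D(4, kernel_size=5, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# fit() consumes the Dataset directly; no NumPy conversion is needed.
history = model.fit(dataset, epochs=1, verbose=0)
print(history.history["loss"])
```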

You can either follow that (it's the right approach, but there's a small learning curve to get used to it), or you can just pass the NumPy array to sess.run() through the feed_dict parameter. If you go this way, create a tf.placeholder, which will be populated with the value you pass in feed_dict. Many of the basic tutorial examples here follow this approach:

https://github.com/aymericdamien/TensorFlow-Examples
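A minimal sketch of the placeholder/feed_dict route, assuming TensorFlow 2.x with the tf.compat.v1 API inside an explicit graph (in TF 1.x the plain tf.placeholder and tf.Session names play the same role). The per-image mean is a hypothetical stand-in for a real network:

```python
import numpy as np
import tensorflow as tf

# Hypothetical NumPy data: 6 images shaped like the question's.
images = np.random.rand(6, 168, 84, 3).astype(np.float32)

graph = tf.Graph()
with graph.as_default():
    # The placeholder's value is supplied at run time through feed_dict.
    x = tf.compat.v1.placeholder(tf.float32, shape=[None, 168, 84, 3], name="x")
    # Stand-in computation: mean pixel value of each image.
    per_image_mean = tf.reduce_mean(x, axis=[1, 2, 3])

with tf.compat.v1.Session(graph=graph) as sess:
    result = sess.run(per_image_mean, feed_dict={x: images})

print(result.shape)  # (6,)
```

Building inside tf.Graph().as_default() keeps graph mode local to that block, so eager execution stays on for the rest of a TF 2.x program.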


phemmer · Accepted Answer · 2020-01-03 20:47:18Z

I also needed to do this (Dataset to array), but without turning on eager mode. I came up with the following:

    import tensorflow as tf  # TF 1.x graph mode: eager execution must be off

    dataset = tf.data.Dataset.from_tensor_slices([[1,2],[3,4]])

    tensor_array = tf.TensorArray(dtype=dataset.element_spec.dtype,
                                  size=0,
                                  dynamic_size=True,
                                  element_shape=dataset.element_spec.shape)
    tensor_array = dataset.reduce(tensor_array, lambda a, t: a.write(a.size(), t))
    tensor = tf.reshape(tensor_array.concat(), (-1,)+tuple(dataset.element_spec.shape))
    array = tf.Session().run(tensor)

    print(type(array))
    # <class 'numpy.ndarray'>

    print(array)
    # [[1 2]
    #  [3 4]]

What this does:
We start with a dataset containing 2 tensors of shape (2,).

Since eager execution is off, we need to run the dataset through a TensorFlow session, and since Session.run() needs a tensor, we have to convert the dataset into a tensor.

To accomplish this, we use Dataset.reduce() to put all the elements into a TensorArray (symbolically).

We now use TensorArray.concat() to convert the whole array into a single tensor. However, concat() joins the elements along their first axis, so with elements of shape (2,) the result is a flat 1-D tensor of shape (4,). So we need tf.reshape() to get back the original element shape, plus an extra leading dimension to stack them all.

Finally we take the tensor and run it through a session. This gives us our numpy ndarray.
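TensorArray.stack() (a swapped-in alternative to concat(), not part of the answer above) stacks the elements along a new leading axis, so the reshape step is unnecessary and elements with more than one dimension, such as (168, 84, 3) images, keep their shape. A minimal sketch, assuming TensorFlow 2.x and running inside an explicit tf.Graph so that eager execution stays off:

```python
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    dataset = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])

    tensor_array = tf.TensorArray(dtype=dataset.element_spec.dtype,
                                  size=0,
                                  dynamic_size=True,
                                  element_shape=dataset.element_spec.shape)
    # Same reduce step as above: append every element to the TensorArray.
    tensor_array = dataset.reduce(tensor_array, lambda a, t: a.write(a.size(), t))
    # stack() adds the leading dimension itself, so no tf.reshape() is needed.
    tensor = tensor_array.stack()

with tf.compat.v1.Session(graph=graph) as sess:
    array = sess.run(tensor)

print(array)
# [[1 2]
#  [3 4]]
```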

aicoder · Accepted Answer · 2021-08-16 18:01:11Z


This was the simplest method for me for a supervised problem with (X, y) pairs.

    import tensorflow_datasets as tfds

    def dataset_to_numpy(ds):
        """
        Convert a TensorFlow dataset to lists of NumPy arrays
        """
        images = []
        labels = []

        # Iterate over the dataset as NumPy values
        for image, label in tfds.as_numpy(ds):
            images.append(image)
            labels.append(label)

        # Print the shape and label of the first three images as a sanity check
        for i, img in enumerate(images):
            if i < 3:
                print(img.shape, labels[i])

        return images, labels

Usage:

    ds = tfds.load('mnist', split='train', as_supervised=True)
    images, labels = dataset_to_numpy(ds)
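If you'd rather not depend on tensorflow_datasets and want the two arrays the question asks for (rather than lists), a minimal sketch assuming TensorFlow 2.1+ (for Dataset.as_numpy_iterator()) collects the pairs and stacks them. The helper name and the random stand-in data are hypothetical:

```python
import numpy as np
import tensorflow as tf

def dataset_to_arrays(ds):
    """Hypothetical helper: collect an (image, label) dataset into two NumPy arrays.

    Assumes TF 2.x eager mode and that every image has the same shape;
    otherwise np.stack() raises a ValueError.
    """
    images, labels = [], []
    for image, label in ds.as_numpy_iterator():  # yields NumPy values, not tensors
        images.append(image)
        labels.append(label)
    return np.stack(images), np.array(labels)

# Hypothetical stand-in data shaped like the question's dataset.
x = np.random.rand(5, 168, 84, 3).astype(np.float32)
y = np.arange(5, dtype=np.int32)
ds = tf.data.Dataset.from_tensor_slices((x, y))

images, labels = dataset_to_arrays(ds)
print(images.shape, labels.shape)  # (5, 168, 84, 3) (5,)
```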


Thibaut Temkeng · Accepted Answer · 2021-10-04 07:50:52Z


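You can use the following method to get the images and the corresponding labels:

    import tensorflow as tf

    def separate_dataset(dataset):
        # Batch the whole dataset into a single element and fetch it.
        # len(dataset) needs a known, finite cardinality (TF 2.x eager); otherwise it
        # raises a TypeError. In eager mode, call .numpy() on the results to get arrays.
        images, labels = tf.compat.v1.data.make_one_shot_iterator(dataset.batch(len(dataset))).get_next()
        return images, labels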
