6

I have a TensorFlow dataset that contains nearly 15,000 color images at 168×84 resolution, with a label for each image. Its type and shape are:

<ConcatenateDataset shapes: ((168, 84, 3), ()), types: (tf.float32, tf.int32)>

I need to use it to train my network. To do that, I need to pass it as a parameter to the function in which I build my layers:

def cnn_model_fn(features, labels, mode):

  input_layer = tf.reshape(features["x"], [-1, 168, 84, 3])
  # Convolutional Layer #1
  conv1 = tf.layers.conv2d(
     inputs=input_layer,
     filters=32,
     kernel_size=[5, 5],
     padding="same",
     activation=tf.nn.relu)
.
.
.

I tried to convert each tensor into an np.array (which is the proper type for the function above, I guess) using tf.eval() and np.ravel(), but I failed.

So, how can I convert this dataset into the proper type to pass it to the function?

Plus

I am new to Python and TensorFlow, and I don't understand why datasets exist if we cannot use them directly to build layers (I am following the tutorial on TensorFlow's website, by the way).

Thanks.

5 Answers

5

You could try eager execution. I previously gave an answer based on a session run (shown below).
During eager execution, calling .numpy() on a tensor converts that tensor to a NumPy array.
Example code (from my use case):


    #enable eager execution
    from __future__ import absolute_import, division, print_function, unicode_literals
    import tensorflow as tf
    tf.enable_eager_execution()
    print('Is executing eagerly?',tf.executing_eagerly())      

    #load datasets
    import tensorflow_datasets as tfds
    dataset, metadata = tfds.load('cycle_gan/horse2zebra',
                                  with_info=True, as_supervised=True)
    train_horses, train_zebras = dataset['trainA'], dataset['trainB']

    #load the first batch (up to 1000 images) into a numpy array
    train_A=train_horses.batch(1000).make_one_shot_iterator().get_next()[0].numpy()
    print(train_A.shape)

    #preview one of the images (%matplotlib is a Jupyter/IPython magic)
    import matplotlib.pyplot as plt
    %matplotlib inline
    plt.imshow(train_A[1])
    plt.show()

Old answer, using a session run:

I recently had this problem, and I did it like this:


    #load datasets
    import numpy as np
    import tensorflow as tf
    import tensorflow_datasets as tfds
    dataset, metadata = tfds.load('cycle_gan/horse2zebra',
                                  with_info=True, as_supervised=True)
    train_horses, train_zebras = dataset['trainA'], dataset['trainB']

    #load the first batch (up to 1000 images) into a numpy array
    sess = tf.compat.v1.Session()
    tra=train_horses.batch(1000).make_one_shot_iterator().get_next()
    train_A=np.array(sess.run(tra)[0])
    print(train_A.shape)
    sess.close()

    #preview one of the images (%matplotlib is a Jupyter/IPython magic)
    import matplotlib.pyplot as plt
    %matplotlib inline
    plt.imshow(train_A[1])
    plt.show()

3

It doesn't sound like you set things up using the TensorFlow Dataset pipeline. Here is the guide for doing so:

https://www.tensorflow.org/programmers_guide/datasets

You can either follow that (it's the right approach, but there's a small learning curve to get used to it), or you can pass the NumPy array to sess.run as part of the feed_dict parameter. If you go this way, create a tf.placeholder that will be populated by the value in feed_dict. Many of the basic tutorial examples here follow this approach:

https://github.com/aymericdamien/TensorFlow-Examples
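
As a rough illustration of the feed_dict route, here is a minimal sketch, assuming the tf.compat.v1 API is available in your TensorFlow install. The function name mean_per_image and the dummy batch are made up for the example, and the mean reduction only stands in for the real layers.

```python
import numpy as np
import tensorflow as tf


def mean_per_image(images):
    """Feed a NumPy batch into a graph through a placeholder (illustrative only)."""
    graph = tf.Graph()
    with graph.as_default():
        # Placeholder with the image shape from the question; batch size is left open.
        x = tf.compat.v1.placeholder(tf.float32, shape=[None, 168, 84, 3], name="x")
        # Stand-in for the network: average every pixel of each image.
        output = tf.reduce_mean(x, axis=[1, 2, 3])
        with tf.compat.v1.Session(graph=graph) as sess:
            # feed_dict maps the placeholder to the NumPy array.
            return sess.run(output, feed_dict={x: images})


# Dummy batch of four all-ones images (hypothetical data).
batch = np.ones((4, 168, 84, 3), dtype=np.float32)
result = mean_per_image(batch)
print(result.shape)
```

In a real model the placeholder would feed the first layer instead of the reduction, and the labels would get a second placeholder.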

3

I also needed to accomplish this task (Dataset to array), but without turning on eager mode. I came up with the following:

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([[1,2],[3,4]])

tensor_array = tf.TensorArray(dtype=dataset.element_spec.dtype,
                              size=0,
                              dynamic_size=True,
                              element_shape=dataset.element_spec.shape)
tensor_array = dataset.reduce(tensor_array, lambda a, t: a.write(a.size(), t))
tensor = tf.reshape(tensor_array.concat(), (-1,)+tuple(dataset.element_spec.shape))
array = tf.Session().run(tensor)

print(type(array))
# <class 'numpy.ndarray'>

print(array)
# [[1 2]
#  [3 4]]

What this does:
We start with a dataset containing 2 tensors of shape (2,).

Since eager is off, we need to run the dataset through a Tensorflow session. And since a session requires a tensor, we have to convert the dataset into a tensor.

To accomplish this, we use Dataset.reduce() to put all the elements into a TensorArray (symbolically).

We now use TensorArray.concat() to convert the whole array into a single tensor. However, concat() joins the elements along their first dimension, so the boundaries between elements are lost; with our (2,)-shaped elements the result is a flat 1-D tensor. So we need tf.reshape() to get it back into our original tensor's shape, plus an extra leading dimension that stacks the elements.

Finally we take the tensor and run it through a session. This gives us our numpy ndarray.
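
The concat-then-reshape step can be mimicked in plain NumPy. This is only an analogy (np.concatenate standing in for TensorArray.concat(), ndarray.reshape for tf.reshape()), not the TensorFlow code path itself:

```python
import numpy as np

# Two dataset elements of shape (2,), as in the example above.
elements = [np.array([1, 2]), np.array([3, 4])]
element_shape = elements[0].shape

# Joining along the first axis loses the boundaries between elements ...
flat = np.concatenate(elements, axis=0)

# ... so reshape to (-1,) + element_shape to get one row per element back.
restored = flat.reshape((-1,) + tuple(element_shape))

print(flat.shape)      # (4,)
print(restored.shape)  # (2, 2)
```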

3

For me, this was the simplest method for a supervised problem with (X, y) pairs.

import tensorflow_datasets as tfds

def dataset_to_numpy(ds):
    """
    Convert tensorflow dataset to numpy arrays
    """
    images = []
    labels = []

    # Iterate over the dataset; tfds.as_numpy yields NumPy values
    for image, label in tfds.as_numpy(ds):
        images.append(image)
        labels.append(label)

    # Print the shape and label of the first three examples as a sanity check
    for img, label in zip(images[:3], labels[:3]):
        print(img.shape, label)

    return images, labels

Usage:

    ds = tfds.load('mnist', split='train', as_supervised=True)
    images, labels = dataset_to_numpy(ds)
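
dataset_to_numpy returns Python lists, while a model input usually needs one array for the images and one for the labels. A minimal sketch of that last step, assuming every image has the same shape; the synthetic lists below are made up and only stand in for real dataset output:

```python
import numpy as np

# Synthetic stand-ins for the lists returned by dataset_to_numpy (hypothetical data).
images = [np.zeros((28, 28, 1), dtype=np.uint8) for _ in range(5)]
labels = [np.int64(i) for i in range(5)]

# Stack along a new leading axis: one array for all images, one for all labels.
x = np.stack(images)
y = np.asarray(labels)

print(x.shape, y.shape)  # (5, 28, 28, 1) (5,)
```

np.stack raises an error if the image shapes differ, which is a useful early check before training.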

1

You can use the following function to get the images and the corresponding labels:

def separate_dataset(dataset):
    images, labels = tf.compat.v1.data.make_one_shot_iterator(dataset.batch(len(dataset))).get_next()
    return images, labels
