How to use an autoencoder to visualize dimensionality reduction? (Python | TensorFlow)

Question

I'm trying to adapt Aymeric Damien's code to visualize the dimensionality reduction performed by an autoencoder implemented in TensorFlow. All of the examples I have seen work on the MNIST digits dataset, but I want to use this method to visualize the iris dataset in two dimensions as a toy example, so that I can figure out how to adapt it to my real-world datasets.

My question is: how can I get the sample-specific two-dimensional embeddings to visualize?

For example, the iris dataset has 150 samples with 4 attributes. I added 4 noise attributes to get a total of 8 attributes. The layer sizes of the encoder/decoder are [8, 4, 2, 4, 8], but I'm not sure how to extract an array of shape (150, 2) to visualize the embeddings. I haven't found any tutorials on how to visualize the dimensionality reduction using TensorFlow.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline

# Set random seeds
np.random.seed(0)
tf.set_random_seed(0)

# Load data
iris = load_iris()
# Original Iris : (150,4)
X_iris = iris.data 
# Iris with noise : (150,8)
X_iris_with_noise = np.concatenate([X_iris, np.random.random(size=X_iris.shape)], axis=1).astype(np.float32)
y_iris = iris.target

# PCA
pca_xy = PCA(n_components=2).fit_transform(X_iris_with_noise)
with plt.style.context("seaborn-white"):
    fig, ax = plt.subplots()
    ax.scatter(pca_xy[:,0], pca_xy[:,1], c=y_iris, cmap=plt.cm.Set2)
    ax.set_title("PCA | Iris with noise")

# Training Parameters
learning_rate = 0.01
num_steps = 1000
batch_size = 10

display_step = 250
examples_to_show = 10

# Network Parameters
num_hidden_1 = 4 # 1st layer num features
num_hidden_2 = 2 # 2nd layer num features (the latent dim)
num_input = 8 # Iris data input 

# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input], name="input")

weights = {
    'encoder_h1': tf.Variable(tf.random_normal([num_input, num_hidden_1]), dtype=tf.float32, name="encoder_h1"),
    'encoder_h2': tf.Variable(tf.random_normal([num_hidden_1, num_hidden_2]), dtype=tf.float32, name="encoder_h2"),
    'decoder_h1': tf.Variable(tf.random_normal([num_hidden_2, num_hidden_1]), dtype=tf.float32, name="decoder_h1"),
    'decoder_h2': tf.Variable(tf.random_normal([num_hidden_1, num_input]), dtype=tf.float32, name="decoder_h2"),
}
biases = {
    'encoder_b1': tf.Variable(tf.random_normal([num_hidden_1]), dtype=tf.float32, name="encoder_b1"),
    'encoder_b2': tf.Variable(tf.random_normal([num_hidden_2]), dtype=tf.float32, name="encoder_b2"),
    'decoder_b1': tf.Variable(tf.random_normal([num_hidden_1]), dtype=tf.float32, name="decoder_b1"),
    'decoder_b2': tf.Variable(tf.random_normal([num_input]), dtype=tf.float32, name="decoder_b2"),
}

# Building the encoder
def encoder(x):
    # Encoder Hidden layer with sigmoid activation #1
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']),
                                   biases['encoder_b1']))
    # Encoder Hidden layer with sigmoid activation #2
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']),
                                   biases['encoder_b2']))
    return layer_2


# Building the decoder
def decoder(x):
    # Decoder Hidden layer with sigmoid activation #1
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']),
                                   biases['decoder_b1']))
    # Decoder Hidden layer with sigmoid activation #2
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']),
                                   biases['decoder_b2']))
    return layer_2

# Construct model
encoder_op = encoder(X)
decoder_op = decoder(encoder_op)

# Prediction
y_pred = decoder_op
# Targets (Labels) are the input data.
y_true = X

# Define loss and optimizer, minimize the squared error
loss = tf.reduce_mean(tf.pow(y_true - y_pred, 2))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start Training
# Start a new TF session
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    # Training
    for i in range(1, num_steps+1):
        # Prepare Data
        # Get the next batch of Iris data 
        idx_train = np.random.RandomState(i).choice(np.arange(X_iris_with_noise.shape[0]), size=batch_size)
        batch_x = X_iris_with_noise[idx_train,:]
        # Run optimization op (backprop) and cost op (to get loss value)
        _, l = sess.run([optimizer, loss], feed_dict={X: batch_x})
        # Display logs per step
        if i % display_step == 0 or i == 1:
            print('Step %i: Minibatch Loss: %f' % (i, l))

I use t-SNE with scikit-learn, but I don't know how to code one in TensorFlow, and I wanted to see how it works for generating the 2D embeddings. I'm going to try the code below when I get to my computer in a few hours. Do you know of any tutorials on t-SNE in TensorFlow?
– O.rka, Commented Oct 23, 2017 at 7:11
Thank you so much for linking that. I've been looking for a notebook that has t-SNE in TensorFlow for a while.
– O.rka, Commented Oct 23, 2017 at 18:19
It's a little difficult to follow because there is little documentation. Do you know why the author chose 4 hidden layers, ReLU activations, and such large numbers of neurons for the hidden layers?
– O.rka, Commented Oct 23, 2017 at 23:33

Anthony D'Amato · Accepted Answer · 2017-10-23 02:11:07Z

Your embedding is accessible with h = encoder(X) (in your code, this tensor already exists as encoder_op). Then, for each batch, you can get its value as follows:

_, l, embedding = sess.run([optimizer, loss, h], feed_dict={X: batch_x})
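
Note that fetching h during a training step only returns the embeddings for that minibatch. To get one row per sample, the encoder can be evaluated once on the full data matrix after training, for example with sess.run(encoder_op, feed_dict={X: X_iris_with_noise}) inside the session. The NumPy sketch below only illustrates the shape logic of that forward pass. The random data and the random weights are stand-ins assumed for illustration, not the real iris data or trained values.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for X_iris_with_noise: 150 samples, 8 features (assumed random data).
X_data = rng.random_sample((150, 8)).astype(np.float32)

# Hypothetical stand-ins for the trained encoder weights and biases.
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def encode(x):
    """Same layer structure as encoder() in the question: 8 -> 4 -> 2."""
    layer_1 = sigmoid(x @ W1 + b1)
    return sigmoid(layer_1 @ W2 + b2)


# Feeding all 150 rows at once yields one 2-D point per sample.
embeddings = encode(X_data)
print(embeddings.shape)  # (150, 2)
```

The resulting array can then be passed to ax.scatter(embeddings[:, 0], embeddings[:, 1], c=y_iris, cmap=plt.cm.Set2), just like pca_xy in the question.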

There is an even nicer solution with TensorBoard using Embeddings Visualization (https://www.tensorflow.org/programmers_guide/embedding):

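from tensorflow.contrib.tensorboard.plugins import projector

config = projector.ProjectorConfig()

# Register one embedding tensor with the projector.
# Note: the projector reads values from the saved checkpoint, so this name
# likely needs to refer to a tf.Variable that holds the embeddings.
embedding = config.embeddings.add()
embedding.tensor_name = h.name

# Use the same LOG_DIR where you stored your checkpoint.
summary_writer = tf.summary.FileWriter(LOG_DIR)

# Writes projector_config.pbtxt into LOG_DIR for TensorBoard to pick up.
projector.visualize_embeddings(summary_writer, config)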
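
If saving a checkpoint is inconvenient, the Embedding Projector can also load embeddings from tab-separated files. Here is a minimal sketch, assuming an embeddings array of shape (150, 2) and integer class labels. Both are replaced by stand-in values here, and the file names vectors.tsv and metadata.tsv are arbitrary choices, not something the projector requires.

```python
import os
import tempfile

import numpy as np

rng = np.random.RandomState(0)

# Stand-ins (assumptions): replace with the real encoder output and y_iris.
embeddings = rng.random_sample((150, 2))
labels = np.repeat([0, 1, 2], 50)  # iris has 50 samples per class, in order

out_dir = tempfile.mkdtemp()
vectors_path = os.path.join(out_dir, "vectors.tsv")
metadata_path = os.path.join(out_dir, "metadata.tsv")

# One row per sample, coordinates separated by tabs, no header.
np.savetxt(vectors_path, embeddings, delimiter="\t", fmt="%.6f")

# Single-column metadata: one label per row, no header line.
np.savetxt(metadata_path, labels, fmt="%d")

print(vectors_path)
print(metadata_path)
```

Both files can then be loaded through the "Load" option of the standalone projector (projector.tensorflow.org, which is not mentioned in the original answer), and the metadata column can be used to color the points by class.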


1 Comment

O.rka Over a year ago

Thanks! It makes sense to pull out the embeddings right there. Haha, I'm definitely going to need to tweak some things in the autoencoder plot, but this will help get me started.
