
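Preface

This article is based on the How-Tos on the official TensorFlow website.

TensorBoard is the visualization tool that ships with TensorFlow. Embeddings is one of its features, used to explore high-dimensional data in two- or three-dimensional space.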

An embedding is a map from input data to points in Euclidean space.

This article uses the MNIST data to explain how to use Embeddings.
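To make "exploring high-dimensional data in two or three dimensions" concrete, here is a minimal NumPy sketch of a PCA projection, one of the projection methods the Embedding Projector offers. The function name `pca_project` and the random stand-in data are assumptions for illustration; this is not TensorBoard's own implementation.

```python
import numpy as np


def pca_project(points, n_components=3):
    """Project (n_samples, n_features) points onto their top principal components."""
    points = np.asarray(points, dtype=np.float64)
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered.dot(vt[:n_components].T)


# Random stand-in for MNIST: 200 "images" with 784 features each.
rng = np.random.RandomState(0)
fake_images = rng.rand(200, 784)
projected = pca_project(fake_images, n_components=3)
print(projected.shape)  # (200, 3)
```

Each 784-dimensional row ends up as a point with three coordinates, which is the kind of point cloud the Embeddings tab draws.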

Code

# -*- coding: utf-8 -*-
# @author: 陈水平
# @date: 2017-02-08
# @description: hello world program to set up embedding projector in TensorBoard based on MNIST
# @ref: http://yann.lecun.com/exdb/mnist/, https://www.tensorflow.org/images/mnist_10k_sprite.png
# 

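# Requires TensorFlow 1.x: tf.contrib and tensorflow.examples.tutorials
# were removed in TensorFlow 2.0.
import numpy as np
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector
from tensorflow.examples.tutorials.mnist import input_data
import os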

PATH_TO_MNIST_DATA = "MNIST_data"
LOG_DIR = "log"
IMAGE_NUM = 10000

# Read in MNIST data by utility functions provided by TensorFlow
mnist = input_data.read_data_sets(PATH_TO_MNIST_DATA, one_hot=False)

# Extract target MNIST image data
plot_array = mnist.test.images[:IMAGE_NUM]  # shape: (n_observations, n_features)

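# Generate metadata: one label per line, in the same order as the images
if not os.path.isdir(LOG_DIR):
    os.makedirs(LOG_DIR)  # np.savetxt fails if the directory is missing
np.savetxt(os.path.join(LOG_DIR, 'metadata.tsv'), mnist.test.labels[:IMAGE_NUM], fmt='%d')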

# Path to the sprite image. The script does not download it; fetch it manually from
# https://www.tensorflow.org/images/mnist_10k_sprite.png (a 100x100 grid of thumbnails)
PATH_TO_SPRITE_IMAGE = os.path.join(LOG_DIR, 'mnist_10k_sprite.png')

# To visualise your embeddings, there are 3 things you need to do:
# 1) Set up a 2D tensor variable(s) that holds your embedding(s)
session = tf.InteractiveSession()
embedding_var = tf.Variable(plot_array, name='embedding')
tf.global_variables_initializer().run()

# 2) Periodically save your embeddings in a LOG_DIR
# Here we just save the Tensor once, so we set global_step to a fixed number
saver = tf.train.Saver()
saver.save(session, os.path.join(LOG_DIR, "model.ckpt"), global_step=0)

# 3) Associate metadata and sprite image with your embedding
# Use the same LOG_DIR where you stored your checkpoint.
summary_writer = tf.summary.FileWriter(LOG_DIR)

config = projector.ProjectorConfig()
# You can add multiple embeddings. Here we add only one.
embedding = config.embeddings.add()
embedding.tensor_name = embedding_var.name
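# Link this tensor to its metadata file (e.g. labels).
# Note: these paths are relative to the directory TensorBoard is started from.
# Newer TensorBoard releases may resolve relative paths against LOG_DIR instead,
# in which case the bare file names should be used.
embedding.metadata_path = os.path.join(LOG_DIR, 'metadata.tsv')
# Link this tensor to its sprite image.
embedding.sprite.image_path = PATH_TO_SPRITE_IMAGE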
embedding.sprite.single_image_dim.extend([28, 28])
# Saves a configuration file that TensorBoard will read during startup.
projector.visualize_embeddings(summary_writer, config)
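The script stores only the digit label, so metadata.tsv has a single column and no header row. As far as I know, the projector expects a header row as soon as the file has more than one column. Below is a minimal sketch of writing such a file; the helper name `write_metadata` and the column names are assumptions for illustration.

```python
import os
import tempfile


def write_metadata(path, labels):
    """Write a two-column TSV (Index, Label) with a header row."""
    with open(path, "w") as f:
        f.write("Index\tLabel\n")
        for index, label in enumerate(labels):
            f.write("%d\t%d\n" % (index, label))


# Demo with made-up labels, written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "metadata.tsv")
write_metadata(demo_path, [7, 2, 1, 0])
with open(demo_path) as f:
    lines = f.read().splitlines()
print(lines[:2])
```

In the main script, a call such as `write_metadata(os.path.join(LOG_DIR, 'metadata.tsv'), mnist.test.labels[:IMAGE_NUM])` could take the place of the `np.savetxt` line.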

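First, download the sprite image (https://www.tensorflow.org/images/mnist_10k_sprite.png) into the log directory; then run the script from the Code section; finally, start TensorBoard with the following command: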

tensorboard --logdir=log

Once it starts, the command line shows a message like the following:

Starting TensorBoard 39 on port 6006
(You can navigate to http://xx.xxx.xx.xxx:6006)

Open a browser, enter the address printed by TensorBoard, and click EMBEDDINGS in the navigation bar to see the result.
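If downloading the sprite is not an option, it can be assembled from the image array itself. This is a minimal sketch, assuming the projector reads thumbnails from a square grid in row-major order; `make_sprite` is a hypothetical helper, and saving the resulting array as a PNG (for instance with an imaging library) is left out.

```python
import numpy as np


def make_sprite(images, thumb_h=28, thumb_w=28):
    """Tile flattened images into one square grid, filled row by row."""
    images = np.asarray(images).reshape(-1, thumb_h, thumb_w)
    n = images.shape[0]
    grid = int(np.ceil(np.sqrt(n)))  # thumbnails per row and per column
    # Pad with blank thumbnails so the grid is completely filled.
    padded = np.zeros((grid * grid, thumb_h, thumb_w), dtype=images.dtype)
    padded[:n] = images
    # (row, col, h, w) -> (row, h, col, w) -> (grid * h, grid * w)
    sprite = padded.reshape(grid, grid, thumb_h, thumb_w).transpose(0, 2, 1, 3)
    return sprite.reshape(grid * thumb_h, grid * thumb_w)


# Ten fake images; image i is filled with the constant value i.
fake = np.stack([np.full(28 * 28, i, dtype=np.float32) for i in range(10)])
sprite = make_sprite(fake)
print(sprite.shape)  # (112, 112)
```

Applied to `mnist.test.images[:IMAGE_NUM]`, the same function would give a 100x100 grid of thumbnails, the layout the script's comment describes for mnist_10k_sprite.png.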

Resources

The linked article studies the visualization of MNIST in depth and is well worth a close read.

