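Image Style Transfer

Introduction

[Figure: an example of image style transfer, showing the "output image"]

What is image style transfer? A picture explains it better than words: the "output image" in the figure above is the result of a style transfer. How is it done? Let's first look at what each layer of a CNN learns.

[Figure: what each CNN layer learns]

As the figure shows, the first layers of a CNN learn low-level information such as "textures" and "edges", and deeper layers learn progressively richer information. In style transfer, the outputs of early convolutional layers serve as the "style" of an image (the code in this article takes the first convolution of each VGG19 block). The "content image" is handled the same way, except that a deeper layer is used as the output.
Clearly we need a strong CNN to extract features, so we use transfer learning with the VGG19 model. For background on transfer learning and an introduction to the VGG16 model, see the article "通过迁移学习实现OCT图像识别" (OCT image recognition via transfer learning).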

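The "style image", the "content image", and the "init image" (the target image to be generated) are fed into the VGG19 network to extract features. The "content loss" and the "style loss" are computed, each is multiplied by its coefficient, and the two are added to give the total loss. The gradient of the total loss with respect to the "init image" is then computed, and the image is updated by gradient descent.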
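The procedure above can be illustrated with a toy sketch in NumPy. It is not the VGG19 pipeline: raw pixels stand in for the extracted features, both losses are plain mean squared differences, and the function name `toy_transfer` and its parameters are assumptions made for this illustration. What it does show is the key idea that gradient descent updates the image itself rather than any network weights.

```python
import numpy as np

def toy_transfer(content, style, content_weight=1.0, style_weight=1.0,
                 learning_rate=1.0, steps=100):
    """Gradient descent on the image itself; raw pixels stand in for features."""
    content = content.astype(np.float64)
    style = style.astype(np.float64)
    init = content.copy()          # start from the content image
    history = []
    for _ in range(steps):
        content_loss = np.mean((init - content) ** 2)
        style_loss = np.mean((init - style) ** 2)
        total = content_weight * content_loss + style_weight * style_loss
        history.append(total)
        # Analytic gradient of the total loss with respect to the pixels
        grad = (2.0 / init.size) * (content_weight * (init - content)
                                    + style_weight * (init - style))
        init -= learning_rate * grad
    return init, history
```

With equal weights the result drifts toward the pixel-wise average of the two images, which is the toy analogue of balancing content against style through the two coefficients.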

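Details of the project are explained alongside the corresponding code. I strongly recommend running it on "Google Colab", provided you are able to access it.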

Data Processing

Loading images:

import os

img_dir='/tmp/nst'

if not os.path.exists(img_dir):
    os.makedirs(img_dir)

import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.figsize']=(10,10)
mpl.rcParams['axes.grid']=False
import numpy as np
from PIL import Image
import time
import functools
import tensorflow as tf
import tensorflow.contrib.eager as tfe
from tensorflow.python.keras.preprocessing import image as kp_image
from tensorflow.python.keras import models
from tensorflow.python.keras import losses
from tensorflow.python.keras import layers
from tensorflow.python.keras import backend as K

# Enable eager mode (TensorFlow 1.x API); once enabled it cannot be turned off
tf.enable_eager_execution()

# Path to the content image
content_path='/tmp/nst/Green_Sea_Turtle_grazing_seagrass.jpg'
# Path to the style image
style_path='/tmp/nst/The_Great_Wave_off_Kanagawa.jpg'

def load_img(path_to_img):
    max_dim=512
    img=Image.open(path_to_img)

    # img.size returns (width, height)
    long_side=max(img.size)

    # Scale factor that makes the longer side equal to max_dim
    scale=max_dim/long_side

    # img.size[0]: width, img.size[1]: height
    # round: rounds to the nearest integer
    # Image.LANCZOS: the anti-aliasing filter (Image.ANTIALIAS was an alias for it)
    img=img.resize((round(img.size[0]*scale),round(img.size[1]*scale)),Image.LANCZOS)

    img=kp_image.img_to_array(img)

    # Add a batch dimension at the front: (H, W, C) -> (1, H, W, C)
    img=np.expand_dims(img,axis=0)

    return img
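To make the resizing arithmetic in `load_img` concrete, here is a minimal sketch that isolates only the size computation. The helper name `scaled_size` is hypothetical and does not appear in the article's code.

```python
def scaled_size(width, height, max_dim=512):
    """Return (new_width, new_height) with the longer side scaled to max_dim."""
    scale = max_dim / max(width, height)
    return round(width * scale), round(height * scale)

# A 1024x768 image is halved, so its longer side becomes 512
print(scaled_size(1024, 768))
```

The aspect ratio is preserved because both sides are multiplied by the same factor.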

Displaying images:

def imgshow(img,title=None):
    # load_img adds a batch dimension,
    # which is not needed for display
    out=np.squeeze(img,axis=0)

    out=out.astype('uint8')

    if title is not None:
        plt.title(title)
    plt.imshow(out)

# Display the content image and the style image

plt.figure(figsize=(10,10))
content_img=load_img(content_path).astype('uint8')
style_img=load_img(style_path).astype('uint8')

plt.subplot(1,2,1)
imgshow(content_img,'content_img')

plt.subplot(1,2,2)
imgshow(style_img,'style_img')

plt.show()

Converting an image into the input format VGG19 expects:

def load_and_process_img(img_path):
    img=load_img(img_path)
    # Preprocessing provided with VGG19: (1) converts RGB to BGR and (2) subtracts the ImageNet channel means
    img=tf.keras.applications.vgg19.preprocess_input(img)

    return img

Converting an image from BGR back to RGB and clipping pixel values to [0, 255]:

def deprocess_img(processed_img):
    x=processed_img.copy()

    if len(x.shape) == 4:
        x=np.squeeze(x,0)
    assert len(x.shape) == 3
    # Add the channel means back (the forward preprocessing subtracts them, i.e. uses "-=")
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR'->'RGB'
    x = x[:, :, ::-1]

    x=np.clip(x,a_min=0,a_max=255).astype('uint8')

    return x
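The relationship between preprocessing and deprocessing can be checked with a small NumPy round trip. The sketch below is a stand-in written for illustration, not the Keras implementation: `caffe_style_preprocess` and `undo_preprocess` are hypothetical names, and the inverse rounds before casting (the article's `deprocess_img` casts directly) so that the round trip is exact.

```python
import numpy as np

# ImageNet channel means in BGR order, the same constants used in deprocess_img
BGR_MEANS = np.array([103.939, 116.779, 123.68])

def caffe_style_preprocess(rgb):
    """Stand-in for the VGG19 preprocessing: RGB->BGR, then subtract the means."""
    bgr = rgb[..., ::-1].astype(np.float64)
    return bgr - BGR_MEANS

def undo_preprocess(x):
    """Inverse of the function above: add the means, BGR->RGB, clip to [0, 255]."""
    rgb = (x + BGR_MEANS)[..., ::-1]
    return np.clip(np.round(rgb), 0, 255).astype('uint8')
```

Applying `undo_preprocess` to the output of `caffe_style_preprocess` should reproduce the original `uint8` image.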

Creating the Model

We specify which layers of VGG19 serve as the feature layers for the "content image" and for the "style image", and build a new model from them.

# Content layer
content_layers=['block5_conv2']

# Style layers
style_layers=[
    'block1_conv1',
    'block2_conv1',
    'block3_conv1',
    'block4_conv1',
    'block5_conv1'
]

num_content_layers=len(content_layers)
num_style_layers=len(style_layers)

# Create the model

# Use intermediate layers of VGG19 as the model outputs
def get_model():
    vgg=tf.keras.applications.vgg19.VGG19(
        # Drop the fully connected layers at the top
        include_top=False,
        # Use weights pre-trained on ImageNet
        weights='imagenet'
    )
    # VGG19 is only used to extract features, so it is not trained
    vgg.trainable=False

    # Collect the outputs of the selected layers
    style_outputs=[ vgg.get_layer(name).output for name in style_layers]
    content_outputs=[ vgg.get_layer(name).output for name in content_layers]

    # Style outputs come first, followed by content outputs
    model_outputs=style_outputs+content_outputs

    return models.Model(vgg.input,model_outputs)

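Loss Functions

content loss:

The "content loss" is the (squared) Euclidean distance between the input image X and the original image P, measured on the content-layer features.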
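The formula image is missing from this copy of the article. As a hedged reconstruction that follows the `get_content_loss` function in the code (which averages rather than sums), the content loss at layer l can be written as follows. The symbols F and N_l are notation assumed here: F^l(·) denotes the layer-l feature map of an image and N_l the number of entries in it.

```latex
L_{content}^{l}(P, X) = \frac{1}{N_l} \sum_{i,j} \left( F^{l}_{ij}(X) - F^{l}_{ij}(P) \right)^{2}
```

Formulations that sum instead of averaging differ from this one only by a constant factor.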

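style loss:

The inner product between the i-th and the j-th feature map of layer l represents the "style features" extracted by the model (together these inner products form the Gram matrix). The loss is again a Euclidean distance, this time between Gram matrices, computed layer by layer.
The "style loss" usually spans several layers, so the total "style loss" is accumulated over those layers.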
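The formula images are missing here as well. The following is a hedged reconstruction that mirrors `gram_matrix`, `get_style_loss`, and the equal per-layer weighting in `compute_loss`. The notation is assumed: F^l_{ki} is the activation of channel i at spatial position k of layer l, n_l is the number of spatial positions, C_l the number of channels, S the style image, and X the image being generated.

```latex
G^{l}_{ij} = \frac{1}{n_l} \sum_{k=1}^{n_l} F^{l}_{ki} F^{l}_{kj}

E_{l} = \frac{1}{C_l^{2}} \sum_{i,j} \left( G^{l}_{ij}(X) - G^{l}_{ij}(S) \right)^{2}

L_{style} = \sum_{l} w_l E_l, \qquad w_l = \frac{1}{\text{number of style layers}}
```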

Total loss:
The total loss of the model is the sum of the "content loss" and the "style loss", each multiplied by its coefficient.
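Written out, with α and β as assumed symbols for the two coefficients (they correspond to `content_weight` and `style_weight` in the training code, whose defaults are 1e3 and 1e-2):

```latex
L_{total} = \alpha \, L_{content} + \beta \, L_{style}
```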

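# Content loss
def get_content_loss(base_content,target):

    # Mean squared difference (squared Euclidean distance up to a constant factor)
    return tf.reduce_mean(tf.square(base_content - target))

# Style loss
# The Gram matrix represents the style features
def gram_matrix(input_tensor):
    # Input shape: (batch_size,height,width,channel) or (height,width,channel)
    channels=int(input_tensor.shape[-1])
    # Flatten all positions: one row per position, one column per channel
    a=tf.reshape(input_tensor,shape=[-1,channels])
    n=tf.shape(a)[0]
    # Channel-by-channel inner products: a^T a
    gram=tf.matmul(a=a,b=a,transpose_a=True)

    # Average over the number of positions
    return gram/tf.cast(n,tf.float32)

def get_style_loss(base_style,gram_target):
    gram_style=gram_matrix(base_style)

    # Mean squared difference between the two Gram matrices
    return tf.reduce_mean(tf.square(gram_style - gram_target))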
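For readers who want to experiment without TensorFlow, the same Gram-matrix computation can be sketched in NumPy. The names `gram_matrix_np` and `style_loss_np` are hypothetical NumPy counterparts of the functions above, written for illustration.

```python
import numpy as np

def gram_matrix_np(features):
    """Gram matrix of an (H, W, C) feature array, averaged over the H*W positions."""
    channels = features.shape[-1]
    a = features.reshape(-1, channels)   # (H*W, C): one row per spatial position
    return a.T @ a / a.shape[0]          # (C, C): channel-by-channel inner products

def style_loss_np(features, gram_target):
    """Mean squared difference between the Gram matrix of features and a target."""
    return np.mean((gram_matrix_np(features) - gram_target) ** 2)
```

Because the Gram matrix sums over all positions, rearranging the spatial positions of the feature map leaves it unchanged. This is why it captures texture-like "style" rather than layout.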

Computing the losses requires the model outputs, so next we obtain the "content output" and the "style output":

def get_feature_representations(model,content_path,style_path):

    # Convert the content and style images into VGG19 input format
    content_img=load_and_process_img(content_path)
    style_img=load_and_process_img(style_path)

    # Run each image through the model
    content_outputs=model(content_img)
    style_outputs=model(style_img)

    # The model outputs are ordered as style outputs followed by content outputs,
    # so mind the slice ranges here
    content_features=[ content_layer[0] for content_layer in content_outputs[num_style_layers:]]
    style_features=[ style_layer[0] for style_layer in style_outputs[:num_style_layers]]

    # The return order matches the unpacking in run_style_transfer
    return style_features,content_features
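The slicing relies on the order in which the model outputs were concatenated (style layers first, then content layers). A minimal sketch of that split, using layer names in place of tensors; the helper `split_outputs` is hypothetical and not part of the article's code:

```python
def split_outputs(outputs, num_style_layers):
    """Split a combined output list into (style_part, content_part)."""
    return outputs[:num_style_layers], outputs[num_style_layers:]

# Layer names stand in for the feature tensors the model would return
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']
content_layers = ['block5_conv2']
combined = style_layers + content_layers

style_part, content_part = split_outputs(combined, len(style_layers))
print(style_part)
print(content_part)
```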

Computing Gradients

def compute_loss(model,loss_weights,init_image,gram_style_features,content_features):
    # Coefficients of the "style image" loss and the "content image" loss.
    # They control which image the "output image" resembles more: for example,
    # a larger content coefficient makes the "output image" closer to the "content image"
    style_weight,content_weight=loss_weights

    # Feed the "init image" into the VGG19 model to get its output features
    model_outputs=model(init_image)

    # Slice the outputs into style and content features, as set up above
    style_output_features=model_outputs[:num_style_layers]
    content_output_features=model_outputs[num_style_layers:]

    # "style image" loss
    style_score=0

    # "content image" loss
    content_score=0

    # Compute the loss of each layer, giving every layer the same weight
    # (different per-layer weights are also possible)
    weight_per_style_layer=1.0/float(num_style_layers)

    # Accumulate the loss of each style layer
    for target_style,comb_style in zip(gram_style_features,style_output_features):
        style_score+=weight_per_style_layer*get_style_loss(comb_style[0],target_style)

    # Same approach for the content layers
    weight_per_content_layer=1.0/float(num_content_layers)
    for target_content,comb_content in zip(content_features,content_output_features):
        content_score+=weight_per_content_layer*get_content_loss(comb_content[0],target_content)

    # Multiply each loss by its coefficient
    style_score *= style_weight
    content_score *= content_weight

    # Add them up to get the total loss
    loss = style_score + content_score
    return loss, style_score, content_score

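def compute_grads(cfg):
    # In eager mode, operations are first recorded on a tape
    with tf.GradientTape() as tape:
        # The arguments are passed in as a dictionary
        all_loss=compute_loss(**cfg)
    total_loss=all_loss[0]
    # Gradient of the total loss with respect to the image being generated
    return tape.gradient(total_loss,cfg['init_image']),all_loss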
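What `tape.gradient` returns here is an array with the same shape as the image: one partial derivative of the loss per pixel value. The NumPy sketch below approximates such a gradient by central finite differences for an arbitrary loss function. It is only an illustration of the concept (automatic differentiation computes the gradient far more efficiently), and `numerical_gradient` is a hypothetical helper.

```python
import numpy as np

def numerical_gradient(loss_fn, image, eps=1e-6):
    """Central-difference estimate of d loss / d image, one entry per pixel value."""
    image = image.astype(np.float64)
    grad = np.zeros_like(image)
    for idx in np.ndindex(image.shape):
        plus = image.copy()
        plus[idx] += eps
        minus = image.copy()
        minus[idx] -= eps
        grad[idx] = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)
    return grad
```

For a mean-squared-difference loss against a fixed target, the estimate should agree with the analytic gradient `2 * (image - target) / image.size`.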

Training the Model

import IPython.display

def run_style_transfer(content_path, 
                       style_path,
                       num_iterations=1000,
                       content_weight=1e3, 
                       style_weight=1e-2): 
  
  # The model is only used to extract features for the loss functions
  model = get_model() 
  for layer in model.layers:
    layer.trainable = False
  
  # Get the "style features" and "content features" (note the return order of this function)
  style_features, content_features = get_feature_representations(model, content_path, style_path)
  
  # Convert the "style features" into Gram matrices for the loss computation
  gram_style_features = [gram_matrix(style_feature) for style_feature in style_features]
  
  # The target image is initialized from the "content image"
  # The choice of initialization has little effect on the result
  init_image = load_and_process_img(content_path)
  
  # In eager mode, variables are created with "tfe.Variable"
  init_image = tfe.Variable(init_image, dtype=tf.float32)
  
  # Optimizer settings
  # beta1: exponential decay rate of the first-moment estimate
  opt = tf.train.AdamOptimizer(learning_rate=5, beta1=0.99, epsilon=1e-1)
  
  # Initialize the best result so far
  # float('inf') is positive infinity, float('-inf') is negative infinity
  best_loss, best_img = float('inf'), None
  
  # Arguments for the loss function
  loss_weights = (style_weight, content_weight)
  cfg = {
      'model': model,
      'loss_weights': loss_weights,
      'init_image': init_image,
      'gram_style_features': gram_style_features,
      'content_features': content_features
  }
    
  # Layout for displaying intermediate results
  num_rows = 2
  num_cols = 5
  display_interval = num_iterations/(num_rows*num_cols)
  start_time = time.time()
  global_start = time.time()
  
  # Valid pixel range in the preprocessed (mean-subtracted) space
  norm_means = np.array([103.939, 116.779, 123.68])
  min_vals = -norm_means
  max_vals = 255 - norm_means   
  
  imgs = []
  for i in range(num_iterations):
    
    # Compute the gradients and update the image
    grads, all_loss = compute_grads(cfg)
    loss, style_score, content_score = all_loss
    
    opt.apply_gradients([(grads, init_image)])
    # Keep the image inside the valid pixel range
    clipped = tf.clip_by_value(init_image, min_vals, max_vals)
    init_image.assign(clipped)
    
    if loss < best_loss:
      # Record the best loss so far
      best_loss = loss
      # Convert to RGB for display
      best_img = deprocess_img(init_image.numpy())

    if i % display_interval== 0:
      # Show the training progress
      plot_img = init_image.numpy()
      
      # Convert to RGB for display
      plot_img = deprocess_img(plot_img)
      
      imgs.append(plot_img)
      IPython.display.clear_output(wait=True)
      IPython.display.display_png(Image.fromarray(plot_img))
      print('Iteration: {}'.format(i))        
      print('Total loss: {:.4e}, ' 
            'style loss: {:.4e}, '
            'content loss: {:.4e}, '
            'time: {:.4f}s'.format(loss, style_score, content_score, time.time() - start_time))
      # Restart the timer so the next report covers one display interval
      start_time = time.time()
  print('Total time: {:.4f}s'.format(time.time() - global_start))
  IPython.display.clear_output(wait=True)
  plt.figure(figsize=(14,4))
  for i,img in enumerate(imgs):
    plt.subplot(num_rows,num_cols,i+1)
    plt.imshow(img)
    plt.xticks([])
    plt.yticks([])
      
  return best_img, best_loss
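The clipping step inside the training loop may look odd at first: the bounds are not 0 and 255 because the image lives in the preprocessed, mean-subtracted space. A small NumPy sketch of the same bounds, using `np.clip` in place of `tf.clip_by_value` (the function name `clip_preprocessed` is hypothetical):

```python
import numpy as np

norm_means = np.array([103.939, 116.779, 123.68])  # BGR means, as in the training code
min_vals = -norm_means        # preprocessed value of a pixel that was 0
max_vals = 255 - norm_means   # preprocessed value of a pixel that was 255

def clip_preprocessed(image):
    """Clip a preprocessed (..., 3) image so that it maps back into [0, 255]."""
    return np.clip(image, min_vals, max_vals)
```

After clipping, adding the means back yields values inside [0, 255], so `deprocess_img` does not have to discard out-of-range values introduced by the optimizer.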

Training results:

best, best_loss = run_style_transfer(content_path, 
                                     style_path, num_iterations=1000)
Image.fromarray(best)

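[Figure: the generated output image]

Summary

We used transfer learning with the VGG19 model to extract the "style features" and the "content features", and both losses are computed as Euclidean distances. For the "style loss", the distance is taken between Gram matrices.
I recently started training models on "Google Colab" and have found it works well, so I recommend it.

Part of the code in this article comes from Raymond Yuan, to whom I am grateful.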

醇岩
