利用phpy实现 PHP 编写 Vision Transformer (ViT) 模型

背景

在深度学习的世界中，Vision Transformer (ViT) 模型因其在图像分类任务中的卓越表现而受到广泛关注。然而，ViT 模型通常使用 Python 编写，尤其是基于 PyTorch 框架的实现。对于 PHP 开发者来说，利用 PHP 来实现 ViT 模型可能看似不切实际，但借助 phpy 扩展，我们可以轻松地在 PHP 中调用 Python 的生态系统，从而实现这一目标。

什么是 `phpy`？

phpy 是一个 PHP 扩展，允许 PHP 调用 Python 模块。这意味着我们可以在 PHP 中使用 Python 的强大功能和库，而不需要切换编程语言。这对于需要在 PHP 项目中使用深度学习模型的开发者来说，具有重要意义。

安装方法: phpy入门：让PHP平滑使用Python的生态

PHP 实现 Vision Transformer 的代码

让我们首先来看一下如何在 PHP 中实现 ViT 模型。以下是一个完整的 PHP 实现，利用了 phpy 扩展来调用 PyTorch：

<?php

class ViT
{
    private $emb_size;
    private $patch_size;
    private $patch_count;
    private $conv;
    private $patch_emb;
    private $cls_token;
    private $pos_emb;
    private $tranformer_enc;
    private $cls_linear;
    private $torch; // 存储导入的torch模块
    private $nn;    // 存储导入的nn模块

    public function __construct($emb_size = 16)
    {
        $this->torch = PyCore::import('torch');
        $this->nn = PyCore::import('torch.nn');
        $this->emb_size = $emb_size;
        $this->patch_size = 4;
        $this->patch_count = intdiv(28, $this->patch_size);  // 7
        $this->conv = $this->nn->Conv2d(
            in_channels: 1,
            out_channels: pow($this->patch_size, 2),
            kernel_size: $this->patch_size,
            padding: 0,
            stride: $this->patch_size,
        );
        $this->patch_emb = $this->nn->Linear(pow($this->patch_size, 2), $this->emb_size);
        $this->cls_token = $this->torch->randn([1, 1, $this->emb_size]);
        $this->pos_emb = $this->torch->randn([1, pow($this->patch_count, 2) + 1, $this->emb_size]);
        $encoder_layer = $this->nn->TransformerEncoderLayer($this->emb_size, 2,
            dim_feedforward: 2 * $this->emb_size,
            dropout: 0.1,
            activation: 'relu',
            layer_norm_eps: 1e-5,
            batch_first: true
        );
        $this->tranformer_enc = $this->nn->TransformerEncoder($encoder_layer, 3);
        $this->cls_linear = $this->nn->Linear($this->emb_size, 10);
    }

    public function forward($x)
    {
        $operator = PyCore::import('operator');
        $x = $this->conv->forward($x);
        $batch_size = $x->size(0);
        $out_channels = $x->size(1);
        $height = $x->size(2);
        $width = $x->size(3);
        $x = $x->view($batch_size, $out_channels, $height * $width);
        $x = $x->permute([0, 2, 1]);
        $x = $this->patch_emb->forward($x);
        $cls_token = $this->cls_token->expand([$x->size(0), 1, $x->size(2)]);
        $x = $this->torch->cat([$cls_token, $x], 1);
        $x = $operator->__add__($x, $this->pos_emb);
        $x = $this->tranformer_enc->forward($x);
        return $this->cls_linear->forward($x->select(1, 0));
    }
}

$torch = PyCore::import('torch');
$vit = new ViT();
$x = $torch->rand(5, 1, 28, 28);
$y = $vit->forward($x);
PyCore::print($y);

PHP 代码分析与 Python 对比

在上面的代码中，phpy 的 PyCore::import() 方法使我们能够在 PHP 中导入和使用 PyTorch 模块，例如 torch 和 torch.nn。这一点使得 PHP 可以直接调用 Python 代码，从而实现复杂的深度学习模型。

例如，在构建 ViT 模型的过程中，我们在 PHP 中使用了 Python 的 torch.nn.Conv2d 和 torch.nn.Linear 等模块，这些模块在深度学习模型的构建中起到了至关重要的作用。

对比 vit.py 中的 Python 代码，PHP 版本的代码结构和逻辑几乎完全相同。这归功于 phpy 让我们能够在 PHP 中直接使用 Python 的语法和模块，实现了代码的高度一致性和可移植性。

from torch import nn
import torch

class ViT(nn.Module):
    def __init__(self, emb_size=16):
        super().__init__()
        self.patch_size = 4
        self.patch_count = 28 // self.patch_size
        self.conv = nn.Conv2d(in_channels=1, out_channels=self.patch_size ** 2, kernel_size=self.patch_size, padding=0,
                              stride=self.patch_size)
        self.patch_emb = nn.Linear(in_features=self.patch_size ** 2, out_features=emb_size)
        self.cls_token = nn.Parameter(torch.rand(1, 1, emb_size))
        self.pos_emb = nn.Parameter(torch.rand(1, self.patch_count ** 2 + 1, emb_size))
        self.tranformer_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=emb_size, nhead=2, batch_first=True), num_layers=3)
        self.cls_linear = nn.Linear(in_features=emb_size, out_features=10)

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), x.size(1), self.patch_count ** 2)
        x = x.permute(0, 2, 1)
        x = self.patch_emb(x)
        cls_token = self.cls_token.expand(x.size(0), 1, x.size(2))
        x = torch.cat((cls_token, x), dim=1)
        x = self.pos_emb + x
        y = self.tranformer_enc(x)
        return self.cls_linear(y[:, 0, :])

if __name__ == '__main__':
    vit = ViT()
    x = torch.rand(5, 1, 28, 28)
    y = vit(x)
    print(y.shape)

`phpy` 的应用场景与意义

在实际开发中，PHP 常用于 Web 开发，但缺乏原生的深度学习支持。这使得在 PHP 项目中实现复杂的机器学习模型成为一项挑战。然而，借助 phpy，我们可以直接调用 Python 的深度学习框架（如 PyTorch 和 TensorFlow），从而将复杂的 AI 算法无缝集成到 PHP 应用中。

这不仅扩展了 PHP 的应用范围，也为 PHP 开发者提供了更多的可能性。例如，在构建需要实时预测或复杂计算的 Web 应用时，PHP 开发者可以直接利用 Python 的丰富生态系统，而不必重新实现这些算法。

结语

通过 phpy，我们可以轻松地在 PHP 中实现 Python 的深度学习模型，如 ViT。这不仅展示了 PHP 的灵活性，也为 PHP 开发者打开了通往深度学习领域的大门。在未来，随着 AI 技术的不断发展，PHP 与 Python 的结合将为开发者带来更多创新的机会。

作者: 汤青松
日期:2024年9月3日
微信: songboy8888

利用phpy实现 PHP 编写 Vision Transformer (ViT) 模型

背景

什么是 `phpy`？

PHP 实现 Vision Transformer 的代码

PHP 代码分析与 Python 对比

`phpy` 的应用场景与意义

结语

汤青松

引用和评论

Ubuntu 22.04 服务器 MySQL 3306远程连接失败排查与解决

MiTS与PoTS：面向连续值时间序列的极简Transformer架构

SigLIP 2：多语言语义理解、定位和密集特征的视觉语言编码器

基于Transformer架构的时间序列数据去噪技术研究

书籍-《机器学习：从经典方法到深度网络、Transformer和扩散模型（第三版）》

FANformer：融合傅里叶分析网络的大语言模型基础架构

Openbayes 教程上新丨多主体驱动生成能力达SOTA，字节UNO模型可处理多种图像生成任务

利用phpy实现 PHP 编写 Vision Transformer (ViT) 模型

背景

什么是 phpy？

PHP 实现 Vision Transformer 的代码

PHP 代码分析与 Python 对比

phpy 的应用场景与意义

结语

汤青松

引用和评论

Ubuntu 22.04 服务器 MySQL 3306远程连接失败排查与解决

MiTS与PoTS：面向连续值时间序列的极简Transformer架构

SigLIP 2：多语言语义理解、定位和密集特征的视觉语言编码器

基于Transformer架构的时间序列数据去噪技术研究

书籍-《机器学习：从经典方法到深度网络、Transformer和扩散模型（第三版）》

FANformer：融合傅里叶分析网络的大语言模型基础架构

Openbayes 教程上新丨多主体驱动生成能力达SOTA，字节UNO模型可处理多种图像生成任务

什么是 `phpy`？

`phpy` 的应用场景与意义