This article shows how to use GAN models to generate your own anime-style videos: a unique, hand-drawn-looking clip for yourself, for the person you love, or for your mischievous kids.
The steps are not difficult and are suitable for anyone who wants to try out and understand GAN models. Everything can be done on either CPU or GPU (including ARM M1 machines).
Foreword
I had been reading up on model-related topics for a while when I saw Makoto Shinkai, director of classics such as "Your Name", "Five Centimeters Per Second", and "Weathering With You", post the exclamation below in response to a set of comparison images, and my curiosity was piqued.
Roughly translated, Makoto Shinkai's comment in the picture reads: "This is very interesting; I can sense all kinds of possibilities. If the art staff had proposed this, the film would have been remade. Lol"
In my view, if a model can get a reaction out of someone like Makoto Shinkai, it is probably worth playing with. So I tried using it to stylize my wedding-anniversary video and found the result quite acceptable.
Then I tried this "filter" on some old photos and found that, for some of them, the effect was really good.
Fun is better when shared. If you would also like to make some cartoon-styled videos or photos for your loved ones, your kids, or even yourself, follow along with the rest of this article.
This article introduces how to use the two models in turn. A brief introduction to the models themselves is given at the end of the article; if you are interested, feel free to jump ahead and read it first.
Alright, let's do some prep work first.
Preparation
Before we start "adding filters" to our videos or photos, we need to prepare the environment first.
To keep things simple, this article will not go into how to package the model as a Docker image. Interested readers can refer to the earlier article "Using Docker to Run HuggingFace Massive Models" for that.
Simplify Python program environment preparation with Conda
As in the previous article, I recommend using Conda to set up the basic program runtime.
You can download the appropriate installer from the official Conda website (the installer is fairly large, about 500 MB, so a little patience is required).
- If you are a Mac x86 user, download Anaconda3-2022.05-MacOSX-x86_64.sh
- If you are a Mac M1 user, download Anaconda3-2022.05-MacOSX-arm64.sh
- If you are a Linux user, download Anaconda3-2022.05-Linux-x86_64.sh
- If you are a Windows user, download Anaconda3-2022.05-Windows-x86_64.exe
Next, let's take Mac and Ubuntu as examples to show how to prepare the environment.
After downloading the Conda installer from one of the links above, the installation can be completed with a single command:
# Install conda first
bash Anaconda3-2022.05.(name of your installer file).sh
If, like me, you use Ubuntu and need to test different models and projects frequently, you can run the following command after installation to make the conda shell environment permanent.
# After the installation completes, hook conda into your shell
eval "$(~/anaconda3/bin/conda shell.bash hook)"
On macOS, I prefer to activate the conda shell manually only when I need it. For example, after installing conda and initializing a dedicated environment for a program, we can activate that environment's shell by running conda activate [environment name]. (How to initialize such an environment is explained a little further down.)
For users in China, it is recommended to configure a package mirror before using Conda; this avoids wasting time while downloading packages.
Edit the ~/.condarc file (for example with vi ~/.condarc) and add the following content (using the Tsinghua mirror as an example):
channels:
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
- defaults
show_channel_urls: true
Once ~/.condarc has been modified, restart the shell and then run conda info to check whether the mirror has been configured successfully:
(base) soulteary@ubuntu:~# conda info
active environment : base
...
channel URLs : https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/linux-64
https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/noarch
https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/linux-64
https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/noarch
https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/linux-64
https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
...
As you can see, the log output contains the Tsinghua mirrors we just configured.
Next, let's use the conda command to quickly create the runtime environment our model application needs:
conda create -n my-anime-video python=3.10
After the command above finishes, a basic Python 3.10 runtime environment named my-anime-video will have been created; if your model needs a different Python version, adjust the version number in the command.
Once the environment is created, we need to activate it (anyone who has used GVM or NVM will find this familiar):
conda activate my-anime-video
When the environment is activated, its name appears at the beginning of the shell prompt, telling us that we are now inside this environment:
(my-anime-video) # your shell command here...
With the environment activated, let's also switch the pip package index first (the previous article covers this in detail if you are interested):
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
Next, install PyTorch and its commonly used companion packages:
pip install torch torchvision torchaudio
Finally, install Towhee, a tool for the "lazy":
pip install towhee
As for how much trouble this tool can save, I will not spoil it here; you will see soon enough.
Install ffmpeg to process multimedia material
If you only want to process pictures, you can skip this part. If you want to process video as well, we will need the help of ffmpeg.
On macOS, we can install it with brew:
brew install ffmpeg
On a Linux system such as Ubuntu, we can install it with apt:
apt install ffmpeg -y
After ffmpeg is installed, let's process the video by splitting it into individual frames.
For example, to split wedding-video.mp4 in the current directory into pictures at 25 frames per second and save them into the images directory, we can run the following command:
ffmpeg -i ./wedding-video.mp4 -vf fps=25 images/video%d.png
When the command finishes, we get a folder full of pictures, each named with the video prefix from the command followed by a frame number. The video I chose was close to 15 minutes long, and the conversion produced more than 20,000 pictures (21,628 frames ÷ 25 fps ≈ 14.4 minutes of footage).
ls images/| wc -l
21628
When the above preparations are in place, let's take a look at how to use the two models to generate our own cartoon/anime style videos.
Let's start with the first model: CartoonGAN.
CartoonGAN
Regarding this model, I found an online demo address on HuggingFace: https://huggingface.co/spaces/akiyamasho/AnimeBackgroundGAN .
This online tool comes from a Japanese developer. In his project AnimeBackgroundGAN-Shinkai (Makoto Shinkai style) we can find a pre-trained model (related projects also provide the styles of Hayao Miyazaki, Mamoru Hosoda, and Satoshi Kon), and on GitHub we can find the corresponding code repository, venture-anime/anime-background-gan-hf-space.
Before processing the video material (a large pile of pictures) mentioned above, let's first run a web tool of the same kind locally to verify that the model code runs correctly.
Verify project model effect
This project can run on CPU or GPU; if you are on macOS or Ubuntu you can run it directly.
However, the original project has some problems with the latest version of PyTorch: it throws an error when running on GPU, does not support selecting a specific graphics card, and has some dependency issues, so I made a fork: https://github.com/soulteary/anime-background-gan-hf-space .
We use Git to download the project, then switch to the project directory:
git clone https://github.com/soulteary/anime-background-gan-hf-space.git
cd anime-background-gan-hf-space
Before proceeding further, we need to confirm that the previously prepared Python environment has been activated with conda. If you are not sure or have not switched yet, you can execute the following command again (repeated execution has no side effects):
conda activate my-anime-video
After switching to the my-anime-video environment, we start the project with Python:
python app.py
After the command is executed (it may take a while), we will get a log similar to the following:
/Users/soulteary/anaconda3/envs/my-anime-video/lib/python3.10/site-packages/gradio/deprecation.py:40: UserWarning: `optional` parameter is deprecated, and it has no effect
warnings.warn(value)
Running on local URL: http://127.0.0.1:7860/
To create a public link, set `share=True` in `launch()`.
Then open your browser and visit the address shown in the log above, http://127.0.0.1:7860/, and you will see the web tool's interface.
Verifying the model and application code is also very simple: click the image-upload area on the left, upload an image you want to test, and then click "Submit".
As you can see, a cartoon-style picture appears on the right. On CPU it generally takes 3 to 10 seconds to process one picture; on GPU it takes about 1 second.
After completing the verification of the program and model, let's write a program to process images in batches.
Write a model calling program for batch image processing
I have uploaded the complete program to GitHub (https://github.com/soulteary/have-fun-with-AnimeGAN/blob/main/CartoonGAN/app.py) for anyone who wants it.
The original project does not support reading images in batches, and it loads all four models by default, which wastes resources, so I made some functional improvements here.
import argparse
import glob, os
import time
from pathlib import Path

from PIL import Image
import torch
import numpy as np
import torchvision.transforms as transforms
from torch.autograd import Variable

from network.Transformer import Transformer
from huggingface_hub import hf_hub_download

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def parse_args():
    desc = "CartoonGAN CLI by soulteary"
    parser = argparse.ArgumentParser(description=desc)
    parser.add_argument('--model', type=str, default='Shinkai', help='Shinkai / Hosoda / Miyazaki / Kon')
    parser.add_argument('--input', type=str, default='./images', help='images directory')
    parser.add_argument('--output', type=str, default='./result/', help='output path')
    parser.add_argument('--resize', type=int, default=0,
                        help='Do you need a program to adjust the image size?')
    parser.add_argument('--maxsize', type=int, default=0,
                        help='your desired image output size')
    """
    If you want to resize, you need to specify both --resize and --maxsize
    """
    return parser.parse_args()


def prepare_dirs(path):
    Path(path).mkdir(parents=True, exist_ok=True)


arg = parse_args()

enable_gpu = torch.cuda.is_available()

if enable_gpu:
    # If you have multiple cards,
    # you can assign to a specific card, eg: "cuda:0"("cuda") or "cuda:1"
    # Use the first card by default: "cuda"
    device = torch.device("cuda")
else:
    device = "cpu"


def get_model(style):
    # Makoto Shinkai
    if style == "Shinkai":
        MODEL_REPO_SHINKAI = "akiyamasho/AnimeBackgroundGAN-Shinkai"
        MODEL_FILE_SHINKAI = "shinkai_makoto.pth"
        model_hfhub = hf_hub_download(repo_id=MODEL_REPO_SHINKAI, filename=MODEL_FILE_SHINKAI)
    # Mamoru Hosoda
    elif style == "Hosoda":
        MODEL_REPO_HOSODA = "akiyamasho/AnimeBackgroundGAN-Hosoda"
        MODEL_FILE_HOSODA = "hosoda_mamoru.pth"
        model_hfhub = hf_hub_download(repo_id=MODEL_REPO_HOSODA, filename=MODEL_FILE_HOSODA)
    # Hayao Miyazaki
    elif style == "Miyazaki":
        MODEL_REPO_MIYAZAKI = "akiyamasho/AnimeBackgroundGAN-Miyazaki"
        MODEL_FILE_MIYAZAKI = "miyazaki_hayao.pth"
        model_hfhub = hf_hub_download(repo_id=MODEL_REPO_MIYAZAKI, filename=MODEL_FILE_MIYAZAKI)
    # Satoshi Kon
    elif style == "Kon":
        MODEL_REPO_KON = "akiyamasho/AnimeBackgroundGAN-Kon"
        MODEL_FILE_KON = "kon_satoshi.pth"
        model_hfhub = hf_hub_download(repo_id=MODEL_REPO_KON, filename=MODEL_FILE_KON)

    model = Transformer()
    model.load_state_dict(torch.load(model_hfhub, device))
    if enable_gpu:
        model = model.to(device)
    model.eval()
    return model


def inference(img, model):
    # load image
    input_image = img.convert("RGB")
    input_image = np.asarray(input_image)
    # RGB -> BGR
    input_image = input_image[:, :, [2, 1, 0]]
    input_image = transforms.ToTensor()(input_image).unsqueeze(0)
    # preprocess, (-1, 1)
    input_image = -1 + 2 * input_image

    if enable_gpu:
        logger.info(f"CUDA found. Using GPU.")
        # Allows to specify a card for calculation
        input_image = Variable(input_image).to(device)
    else:
        logger.info(f"CUDA not found. Using CPU.")
        input_image = Variable(input_image).float()

    # forward
    output_image = model(input_image)
    output_image = output_image[0]
    # BGR -> RGB
    output_image = output_image[[2, 1, 0], :, :]
    output_image = output_image.data.cpu().float() * 0.5 + 0.5

    return transforms.ToPILImage()(output_image)


prepare_dirs(arg.output)

model = get_model(arg.model)

enable_resize = False
max_dimensions = -1

if arg.maxsize > 0:
    max_dimensions = arg.maxsize
    if arg.resize:
        enable_resize = True

globPattern = arg.input + "/*.png"

for filePath in glob.glob(globPattern):
    basename = os.path.basename(filePath)
    with Image.open(filePath) as img:
        if enable_resize:
            img.thumbnail((max_dimensions, max_dimensions), Image.Resampling.LANCZOS)
        start_time = time.time()
        inference(img, model).save(arg.output + "/" + basename, "PNG")
        print("--- %s seconds ---" % (time.time() - start_time))
The roughly 100-line program above does a few things: it dynamically loads only the model requested on the command line rather than all of them, reads every picture in the images subdirectory of the current directory by default, processes them one by one on CPU or GPU, and saves the results into the result subdirectory. If you want to resize the output images, you can pass the resize-related parameters to limit their size.
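For reference, the flags defined in parse_args above can be combined like this (the style and directory names here are just illustrative placeholders):
python app.py --model=Hosoda --input=./images --output=./result/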
Now let's actually run the program, sticking with the default directories:
python app.py --model=Shinkai
Once the command starts executing, the video frames are processed in batch. If nothing goes wrong, you will see a log similar to the following:
...
--- 1.5597078800201416 seconds ---
INFO:__main__:Image Height: 1080, Image Width: 1920
--- 0.44031572341918945 seconds ---
INFO:__main__:Image Height: 1080, Image Width: 1920
--- 1.5004260540008545 seconds ---
INFO:__main__:Image Height: 1080, Image Width: 1920
--- 1.510758876800537 seconds ---
INFO:__main__:Image Height: 1080, Image Width: 1920
--- 1.362170696258545 seconds ---
INFO:__main__:Image Height: 1080, Image Width: 1920
...
Processing can take quite a long time here. With a GPU the speed improves greatly; alternatively, the size of the output images can be reduced appropriately so that CPU-only rendering also gains some speed.
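For example, based on the --resize and --maxsize flags defined in the program above, each frame can be capped at 720 pixels on its longest side (720 is just an example value; pick whatever suits your preview needs):
python app.py --model=Shinkai --resize=1 --maxsize=720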
After the program finishes, the frames processed by the AI model are in the result directory, and we just need to use ffmpeg once more to turn the pictures back into a video:
ffmpeg -f image2 -r 25 -i result/video%d.png -vcodec libx264 -crf 18 -pix_fmt yuv420p result.mp4
Next, let's talk about how to speed things up with a GPU, and then look at a more general, lower-cost way to accelerate.
Speed up model execution with GPU
If you don't have a GPU and want to play with minimal cost, you can go straight to the next subsection.
Back to the topic: if we want to process tens of thousands or even a hundred thousand images, the best approach is to use a GPU. Considering that a graphics card with enough video memory to run AI models efficiently currently costs at least around 10,000 yuan, I recommend a pay-as-you-go cloud host instead: it costs roughly 10 to 20 yuan per hour, and a few hours of runtime will get you the result you want. I tested this myself; several cloud vendors offer similar GPU hosts, so choose whichever suits your situation. If you are a student, some platforms may even offer education discounts.
To get results relatively quickly, I chose a dual-GPU cloud host, which halves the processing time without any deep optimization of the program.
To have each of the two cards process half of the data, we can either modify the program above or simply use the Linux shell to split the data into two piles. Being lazy, I chose the latter. (If you only have one card, you can skip this step.)
Let's first confirm the total amount of data to be processed; there are more than 20,000 pictures.
ls images/| wc -l
21628
Then use a combination of commands to move half of the images into a newly created folder:
mkdir images2
ls images/ | sort | head -10814 | xargs -I {} mv images/{} images2/
After preparing the data, we also need a small adjustment so that the program uses two different graphics cards. We can handle it the same way the program handles its input and output folders, by letting the user specify the device through a parameter; or we can take the even simpler route of copying the program and changing the device name in each copy, for example turning device = torch.device("cuda") into device = torch.device("cuda:0") in one copy and device = torch.device("cuda:1") in the other.
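If you prefer the parameter route, here is a minimal sketch of what the change could look like; the --device flag name and the fallback logic are my own additions for illustration, not part of the original app.py:
import argparse

import torch

parser = argparse.ArgumentParser(description="device selection sketch")
# Hypothetical flag: lets one copy of the script target "cuda:0" and another "cuda:1"
parser.add_argument('--device', type=str, default='cuda',
                    help='"cpu", "cuda", "cuda:0", "cuda:1", ...')
arg = parser.parse_args()

# Fall back to CPU when CUDA is not available
if arg.device.startswith("cuda") and torch.cuda.is_available():
    device = torch.device(arg.device)
else:
    device = torch.device("cpu")

enable_gpu = device.type == "cuda"
With something like this in place, the two instances could be started as python app.py --device=cuda:0 --input=./images and python app.py --device=cuda:1 --input=./images2.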
After completing the adjustments above, launch the two programs with Python: python app1.py and python app2.py. If you cannot guarantee that your SSH session to the server will stay stable, consider tools such as screen or tmux.
Once the programs are running, what follows is a long wait. During processing we can check the state of the graphics cards with nvidia-smi; looking at the numbers, keeping a "computer" like this at home would be quite expensive, and most likely noisy as well.
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:08.0 Off | 0 |
| N/A 54C P0 263W / 300W | 25365MiB / 32510MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:00:09.0 Off | 0 |
| N/A 55C P0 265W / 300W | 25365MiB / 32510MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 61933 C python 25363MiB |
| 1 N/A N/A 62081 C python 25363MiB |
+-----------------------------------------------------------------------------+
When the program has finished processing all the pictures, it exits automatically. At this point we can use ffmpeg again to turn the pictures back into a video:
time ffmpeg -f image2 -r 25 -i result/video%d.png -vcodec libx264 -crf 18 -pix_fmt yuv420p result.mp4
When the program finishes running, we will see a log similar to the following:
Output #0, mp4, to 'result.mp4':
Metadata:
encoder : Lavf58.29.100
Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuv420p, 1920x1080, q=-1--1, 25 fps, 12800 tbn, 25 tbc
Metadata:
encoder : Lavc58.54.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
frame=21628 fps= 75 q=-1.0 Lsize= 1135449kB time=00:14:25.00 bitrate=10753.3kbits/s speed=2.99x
video:1135185kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.023257%
[libx264 @ 0x55779059b380] frame I:253 Avg QP:15.48 size:194386
[libx264 @ 0x55779059b380] frame P:6147 Avg QP:17.74 size: 95962
[libx264 @ 0x55779059b380] frame B:15228 Avg QP:20.43 size: 34369
[libx264 @ 0x55779059b380] consecutive B-frames: 4.7% 2.9% 4.0% 88.4%
[libx264 @ 0x55779059b380] mb I I16..4: 16.0% 47.1% 36.9%
[libx264 @ 0x55779059b380] mb P I16..4: 7.9% 13.9% 6.1% P16..4: 31.2% 18.5% 9.3% 0.0% 0.0% skip:13.0%
[libx264 @ 0x55779059b380] mb B I16..4: 1.6% 1.6% 0.5% B16..8: 34.3% 9.6% 2.7% direct:11.8% skip:37.9% L0:42.2% L1:42.3% BI:15.5%
[libx264 @ 0x55779059b380] 8x8 transform intra:48.1% inter:59.5%
[libx264 @ 0x55779059b380] coded y,uvDC,uvAC intra: 47.2% 66.3% 35.9% inter: 30.7% 35.6% 1.6%
[libx264 @ 0x55779059b380] i16 v,h,dc,p: 39% 32% 6% 23%
[libx264 @ 0x55779059b380] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 20% 26% 4% 5% 4% 5% 4% 5%
[libx264 @ 0x55779059b380] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 33% 20% 16% 5% 6% 6% 5% 5% 4%
[libx264 @ 0x55779059b380] i8c dc,h,v,p: 50% 21% 21% 8%
[libx264 @ 0x55779059b380] Weighted P-Frames: Y:24.2% UV:17.1%
[libx264 @ 0x55779059b380] ref P L0: 49.1% 14.5% 22.1% 12.4% 1.9%
[libx264 @ 0x55779059b380] ref B L0: 81.3% 14.0% 4.7%
[libx264 @ 0x55779059b380] ref B L1: 93.6% 6.4%
[libx264 @ 0x55779059b380] kb/s:10749.30
real 4m49.667s
user 78m36.497s
sys 0m44.806s
Opening the processed video, my personal impression is that the model performs well in well-lit scenes. Because of the compression applied to uploaded media, the clip here is not as striking as what I see locally; interested readers should try it themselves.
Perhaps you are wondering whether there is any way to get some acceleration without a GPU. The answer is yes.
Use parallel computing and streaming to speed up image processing
Python 3 provides concurrency APIs for parallel computing. However, writing concurrent code usually means fiddling with queues, multi-threaded scheduling, streaming IO, and so on, and debugging concurrent code is painful, especially in Python... (personal opinion)
Remember the pip install towhee from earlier? This package contains some handy tools and methods that simplify the whole series of chores above, letting us achieve the same result with less effort. After all, many of these basic but fiddly problems should be solved by foundational tools rather than by every developer individually.
Take the model-calling code of more than 100 lines from the example above: a "replacement" program written with Towhee can look like this:
import towhee

towhee.glob('./*.png') \
    .image_decode() \
    .img2img_translation.cartoongan(model_name='Shinkai') \
    .save_image(dir='./result') \
    .to_list()
A few lines of code cover the core logic: batch-read the pictures, decode them, pass the image data to the model for processing, and finally save the results into a folder.
Next, let's adjust the code slightly to deal with the trouble mentioned above: parallel computing. Very little needs to change; we only need to add one line, .set_parallel(5), before the image-decoding step to tell the program to use multiple threads.
When the program runs, it will use 5 concurrent workers for the computation by default, so even without a GPU and with only a CPU we get noticeably better throughput.
import towhee

towhee.glob('./*.png') \
    .set_parallel(5) \
    .image_decode() \
    .img2img_translation.cartoongan(model_name='Shinkai') \
    .save_image(dir='./result') \
    .to_list()
While playing with Towhee I also ran into some small issues. For example, Towhee is designed for scientific computing, so saving pictures and juggling video files are not its main business; by default it saves processed pictures under UUID file names.
To make the results usable, the output images need to keep their original file names so that the frames processed by the model can be "reassembled into a video". I submitted a PR ( #1293 ) to its open-source repository to address this.
Of course, we also need to make a few extra adjustments to the code above and add some parameters:
import towhee

towhee.glob['path']("./*.png") \
    .set_parallel(5) \
    .image_decode['path', 'img']() \
    .img2img_translation.cartoongan['img', 'img_new'](model_name="Shinkai") \
    .save_image[('img_new', 'path'), 'new_path'](dir="./result") \
    .to_list()
However, as of the publication of this article, the trunk version containing my PR has not been officially released, so for the time being you need to install the daily-build development package from PyPI to use this feature.
pip install -i https://test.pypi.org/simple/ towhee==0.6.2.dev48
If you think this lazy approach is nice, you are welcome to open an issue in the official repository on my behalf and urge the maintainers to cut a new release.
So why don't I, as a Towhee user, do the urging myself? Honestly, I am a bit embarrassed to. Why embarrassed? You will find out as you read on.
AnimeGAN
Having covered CartoonGAN, let's try another model: AnimeGAN. There are currently three versions of the model; except for the third version, they are all open-source projects that are free to use.
For convenience of demonstration, and for some particular reasons (detailed at the end of the article), I chose the second version, which is open source and relatively stable.
Before you start tossing around, you can also try its online demo: https://huggingface.co/spaces/akhaliq/AnimeGANv2 .
Or, as before, first verify locally that the model code runs correctly.
Verify project model effect
Given the length of this article and an engineer's "virtue" (laziness), we will not write another lengthy model-calling program; I will keep using the "jQuery" of Python to "Write Less, Do More":
import towhee

towhee.glob['path']("./*.png") \
    .set_parallel(5) \
    .image_decode['path', 'img']() \
    .img2img_translation.animegan['img', 'new_img'](model_name="Shinkai") \
    .save_image[('new_img', 'path'), 'new_path'](dir="./result", format="png") \
    .to_list()
The logic is the same as for CartoonGAN above; the only difference is that the animegan model is used.
Save the code above as lazy.py, put a few pictures from the Internet into the same directory as the program, and run python lazy.py. After a while you will find a new folder named result next to the program, containing the pictures processed by the model.
I have uploaded the relevant code to https://github.com/soulteary/have-fun-with-AnimeGAN/blob/main/AnimeGAN/lazy.py ; interested readers can grab it there.
After verifying the model effect, we can also use this model to generate a cartoon-style video.
Although stitching the video back together with ffmpeg, as above, is quite efficient, it still means running two extra commands. Is there a lazier way? Obviously, yes.
Write a model caller for video processing
I wonder if anyone remembers that in the previous article I mentioned one of Towhee's core developers, @houjie. Under my pestering, he kindly created a read_video method for a Python rookie like me.
As a result, the tedious workflow above, converting the video into pictures with ffmpeg and then stitching the processed pictures back into a video, can be replaced by a few lines of code! Compared with the previous approach, the code shrinks to about five lines (six counting the blank line), saving roughly 80% of the lines we would otherwise have to write.
import towhee

towhee.read_video("./video.mp4") \
    .set_parallel(5) \
    .img2img_translation.animegan['img', 'new_img'](model_name='Shinkai') \
    .to_video('./result.mp4', 'x264', 15)
Run the lines above and the model is automatically applied to every frame of the video, after which the results are re-encoded into a video file at 15 frames per second. In actual use you can adjust the frame rate to your needs.
To make it easier to use, I have also uploaded this code to GitHub; anyone who needs it can grab it there.
Quick preview for video files
Compared with processing still images, a video of the same resolution involves a much larger amount of data.
So if we want a quick preview of the effect on a video, we can add a resizing step to the code, scaling down each frame before processing to reduce the amount of computation.
Following the same lazy principle as above, meeting this "requirement" is very simple: just add one image_resize line in the right place:
import towhee

towhee.read_video("./video.mp4") \
    .set_parallel(5) \
    .image_resize(fx=0.2, fy=0.2) \
    .img2img_translation.animegan['img', 'new_img'](model_name='Shinkai') \
    .to_video('./result.mp4', 'x264', 15)
This code has also been uploaded to GitHub; I hope it helps.
Other
Alright! So far we have covered:
- How to prepare a quick-start Python environment;
- How to quickly get started with CartoonGAN and AnimeGAN;
- How to use these two models to process pictures or videos;
- How to use Towhee, the "jQuery" of the Python model world, to be lazy and do more with less code;
- How to do basic tuning and packaging of "wild" model projects found on GitHub.
Next, let's briefly talk about the models mentioned at the beginning of the article.
About AnimeGAN and CartoonGAN
When it comes to using GAN (Generative Adversarial Network) models to give pictures an anime or cartoon style, there are currently two well-known projects from China.
One is CartoonGAN, published by a Tsinghua University team in 2018; its paper was accepted at CVPR 2018, and the project has more than 700 stars on GitHub. The other is AnimeGAN from Hubei University of Science and Technology, released in 2019; the model has gone through three versions so far (the first two are open source) and has accumulated nearly 8,000 stars on GitHub.
Both models and their papers have received media coverage, for example "Real photos turn into Makoto Shinkai-style cartoons in seconds: Tsinghua University proposes CartoonGAN" and "Strongly recommended, try this! The photo-to-anime AI with stunning results that crashed the server several times". The images processed by AnimeGAN even drew that sigh of admiration from Makoto Shinkai.
In my own experience, CartoonGAN and AnimeGAN each have their strengths and weaknesses; as for how the models perform and which scenarios suit each of them, I believe smart readers can find the answer through the methods in this article and their own practice.
The open-source repositories of the two projects:
- https://github.com/mnicnc404/CartoonGan-tensorflow / Model download: http://cg.cs.tsinghua.edu.cn/people/~Yongjin/CartoonGAN-Models.rar
- https://github.com/TachibanaYoshino/AnimeGAN
- https://github.com/TachibanaYoshino/AnimeGANv2
Because AnimeGAN v3 is currently being commercialized and is released closed-source, we will not package or experiment with it here, so as not to affect the author.
Open source is not easy; open-sourcing model projects is especially hard, and turning them into a sustainable business is harder still. It needs the support and encouragement of the community. Only by continuously supporting and giving feedback to the open-source ecosystem can the domestic open-source ecosystem improve, and once the ecosystem improves, we practitioners will surely gain more from it.
Finally
A week ago I posted a commemorative video on my Moments, made by running footage from our wedding through one of these models. Many friends liked it and were curious about how to "do the same with their own photos or videos". I promised then that I would publish a tutorial, and over the Dragon Boat Festival I put this piece together; I hope you all have a good time with it.
Finally, thanks again to @houjie, who added new features under my pestering, and to Cheng Yuan (@roomyumeizhi) from Caiyunzhinan, who offered plenty of advice on my PR, helped me upload the model to the Towhee Hub, and solved the problem of slow model downloads in China.
--EOF
This article is licensed under the Attribution 4.0 International (CC BY 4.0) agreement. You are welcome to reprint or adapt it for reuse, but please credit the source.
Author of this article: Su Yang
Creation time: June 4, 2022 Word count: 20,370 words Reading time: 41 minutes Link to this article: https://soulteary.io/2022/06/04/create-your-own-anime-video-with-an-ai-model-that-surprised-makoto-shinkai.html