By Ravi Malhotra May 3, 2022
The growth of high-definition video content for playback on larger, higher-resolution devices is driving the need for more efficient video codecs such as H.265. H.265 is roughly twice as bandwidth-efficient as the older H.264 codec, but it requires more computational resources to deliver that efficiency. Controlling costs (e.g., bandwidth usage) is now the number one challenge cited by video developers [1], which makes H.265 attractive. But if the savings in bandwidth are offset by higher compute and power costs, video developers are no better off. What they need is a solution that delivers H.265 efficiency at a fraction of the computational and power cost. In this blog, we argue that the Arm Neoverse-based Ampere Altra Max server is exactly that solution for encoding H.265 video streams.
Background
The production and consumption of high-resolution video content has grown steadily over the past few years, thanks to better cameras and larger, higher-resolution devices. More advanced codecs such as H.265/HEVC, VP9 or AV1 are more than 50% more efficient at compressing higher resolution content than traditional codecs such as H.264. Recent market research shows that this growth translates into a significant increase in the use of these codecs, with H.265 leading the way.
Figure 1: Bitmovin 2021 report on video codecs used in production (2020 vs 2021)
The popularity of streaming services such as Netflix and Amazon Prime has also driven demand for high-resolution video content, and the need to attract and retain customers will only increase that demand. It is therefore not surprising that video upload and ingestion (bandwidth requirements) and video transcoding and processing (compute requirements) account for the largest share of the video processing platform market [2].
Figure 2: Video Processing Platform Market Share by Application, 2020
The improved compression of H.265 comes with higher computational complexity, which can be an order of magnitude (10x) higher than that of H.264. Although the use of cloud-based encoding is growing, most video encoding is still a preprocessing task [1]. The increased compute requirements (capital cost) and power consumption (operating cost) of H.265 encoding are therefore a challenge for most video developers, which makes it important to encode on servers that are both faster and more energy efficient.
The technical press has validated the performance and energy-efficiency advantages of the Ampere Altra Max over traditional architectures on common benchmarks such as SPECrate® 2017 Integer [3]. With 128 Arm Neoverse N1 cores at 3.0 GHz, the Ampere Altra Max outperforms Intel Xeon "Ice-Lake" and AMD EPYC "Milan" CPUs, both of which have a much higher rated power consumption (TDP). In this blog, we show that these performance and power-efficiency advantages also extend to video encoding workloads such as H.265.
To illustrate this, we run H.265 encodes and measure actual performance and power consumption with the system fully loaded. We also highlight recent optimizations to the open-source libx265 encoder that use the Neon SIMD engine on the 64-bit Arm architecture. These optimizations yield performance improvements of 1.5x to 2.2x [4].
Performance test results
We benchmarked the latest snapshot of the open-source libx265 codec ( https://bitbucket.org/multicoreware/x265_git/ ) on comparable Arm and x86-based servers. The x265 version on all systems is 3.5+20-17839cc0d. The "Configuration" section gives system details for the Ampere Altra Max server, which is based on Arm Neoverse N1 cores, and for the x86 systems based on the Intel "Ice-Lake" and AMD "Milan" architectures. It also lists the input videos. We used several resolutions and encoding presets to see the performance impact in different scenarios.
Performance comparison – scaling to the full socket level
To test full-socket performance, we launch as many H.265 encoding tasks as there are virtual cores in the system and measure cumulative frames per second (FPS). We run 128 tasks on the Altra Max and AMD EPYC 7763 CPUs and 80 tasks on the Intel Xeon 8380 CPU. Across the video resolutions and encoding presets tested, the full-socket performance of the Altra Max is 10% to 35% better than that of the AMD EPYC 7763 and more than 2x better than that of the Intel Xeon 8380.
Figure 3: Relative x265 performance between Ampere Altra Max, AMD EPYC, and Intel Xeon
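The full-socket measurement described above can be sketched roughly as follows. This is a minimal illustration and not the harness used for the published results: the helper names are hypothetical, the encoder command is supplied by the caller rather than assumed here, and cumulative FPS is taken to mean the total number of frames encoded across all jobs divided by the wall-clock time.

```python
import subprocess
import time
from typing import Sequence


def run_parallel_jobs(cmd: Sequence[str], n_jobs: int) -> float:
    """Start n_jobs copies of cmd together; return wall-clock seconds until all finish."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for _ in range(n_jobs)
    ]
    # Wait for every job before checking results, so no process is left running.
    exit_codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    if any(code != 0 for code in exit_codes):
        raise RuntimeError(f"at least one job failed, exit codes: {exit_codes}")
    return elapsed


def cumulative_fps(frames_per_job: int, n_jobs: int, elapsed_s: float) -> float:
    """Total frames encoded by all jobs divided by wall-clock time (an assumed definition)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return frames_per_job * n_jobs / elapsed_s
```

With purely illustrative numbers, 128 jobs of 600 frames each that finish in 60 seconds give 128 × 600 / 60 = 1280 cumulative FPS. In the tests described here, `n_jobs` would be 128 on the Altra Max and EPYC 7763 and 80 on the Xeon 8380.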
It is worth noting the difference in performance scaling between the SMT-based x86 CPUs and the Altra Max, whose cores each run a single thread. On the Altra Max, performance scales linearly with the number of encoding tasks in the system. On the AMD EPYC 7763 and Intel Xeon 8380, scaling is non-linear, and it drops off significantly once the SMT virtual cores are used.
Figure 4: x265 performance scaling by number of jobs: Ampere Altra Max
Figure 5: x265 performance scaling by number of jobs: AMD EPYC 7763
Figure 6: x265 performance scaling by number of jobs: Intel Xeon 8380
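One way to quantify "linear" versus "non-linear" scaling is to compare the measured cumulative FPS at each job count with the ideal of n times the single-job FPS. The sketch below is an assumption about how such a check could be done, not the method behind the figures, and the function name is hypothetical.

```python
from typing import Dict


def scaling_efficiency(fps_by_jobs: Dict[int, float]) -> Dict[int, float]:
    """Map each job count n to measured_fps(n) / (n * measured_fps(1)).

    A value of 1.0 means perfectly linear scaling; lower values mean that
    adding jobs yields less than proportional throughput.
    """
    if 1 not in fps_by_jobs:
        raise ValueError("a single-job measurement (key 1) is required as the baseline")
    single_job_fps = fps_by_jobs[1]
    return {
        n: fps / (n * single_job_fps)
        for n, fps in sorted(fps_by_jobs.items())
    }
```

With made-up measurements of 10, 20, and 30 FPS at 1, 2, and 4 jobs, the efficiencies are 1.0, 1.0, and 0.75: scaling is linear up to 2 jobs and falls off at 4.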
Power consumption comparison – scaling to full socket level
The power efficiency of a platform is measured by the number of frames it encodes within a given power budget. To measure this, we fully loaded one socket on each platform with the maximum number of H.265 encoding tasks, measured its power consumption, and calculated FPS per watt.
We found that the Altra Max is 40-70% more efficient on average than the AMD EPYC 7763 and 3x more efficient than the Intel Xeon 8380 at different video resolutions and encoding presets.
Figure 7: x265 relative performance per watt between Ampere Altra Max, AMD EPYC, and Intel Xeon
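The FPS-per-watt metric and the relative comparison between platforms reduce to two divisions. The sketch below uses hypothetical function names and assumes the power figure is the average socket power measured during the fully loaded run.

```python
def fps_per_watt(total_fps: float, avg_power_w: float) -> float:
    """Cumulative FPS divided by average power draw in watts."""
    if avg_power_w <= 0:
        raise ValueError("avg_power_w must be positive")
    return total_fps / avg_power_w


def relative_efficiency(candidate: float, baseline: float) -> float:
    """Ratio of two FPS-per-watt values; 1.4 means the candidate is 40% more efficient."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline
```

On this scale, "40-70% more efficient" corresponds to a ratio between 1.4 and 1.7, and "3x more efficient" to a ratio of 3.0. As an illustrative example with invented numbers, a platform delivering 1200 FPS at 300 W achieves 4.0 FPS per watt, which is 1.6 times a baseline of 2.5 FPS per watt.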
Conclusion
As high-resolution streaming grows, video streaming applications in the cloud need higher-compression codecs such as H.265. That compression comes with higher computational cost and higher power consumption. At the system level, the Arm Neoverse-based Ampere Altra Max server offers better scalability and up to 2x higher performance than the Intel "Ice-Lake" server platform, while being up to 3x more energy efficient for this workload. Compared with AMD "Milan" servers, Altra Max servers deliver up to 35% higher performance and are up to 70% more energy efficient. The recent x265 optimizations for the Arm architecture make power-efficient encoding with strong performance possible, and we encourage readers to evaluate x265 video encoding on Ampere Altra and Altra Max systems.
Finally, improving computational efficiency is not only a video encoding challenge but a general processing challenge. New architectures such as Arm Neoverse and cloud-first CPU designs such as Ampere Altra Max help reduce the carbon emissions of on-prem and cloud computing. For more information on the sustainability benefits of Neoverse and Ampere Altra Max, we encourage you to read our Earth Day 2022 blog ( https://www.arm.com/blogs/blueprint/earth-day-cloud ).
Input video files:
- https://storage.googleapis.com/ugc-dataset/original_videos/Sports/480P/Sports_480P-0623.mkv
- https://storage.googleapis.com/ugc-dataset/original_videos/Sports/720P/Sports_720P-00a1.mkv
- https://storage.googleapis.com/ugc-dataset/original_videos/Sports/1080P/Sports_1080P-0063.mkv
References:
- [1] Bitmovin Video Developer Report 2021 https://go.bitmovin.com/video-developer-report
- [2] Research and Markets Global Video Processing Platform Market report 2021
- [3] https://www.anandtech.com/show/16979/the-ampere-altra-max-review-pushing-it-to-128-cores-per-socket/5
- [4] Save on H.265 encoding using AWS Graviton2