JND Perceptual Coding for Narrowband HD in Live Streaming and Video on Demand

Lead

Live streaming and video on demand have become part of everyday life. What matters most to everyone here: lower streaming costs, or higher picture quality? Balancing the two is the job of narrowband high-definition (HD) technology, and for video narrowband HD, intelligent video coding is one of the most fundamental and important components.

Cheng Ling | Senior Audio and Video Engine Development Engineer, NetEase Yunxin

01 Overview of Narrowband HD Technology

Narrowband HD technology is essentially a set of video coding techniques optimized for the best subjective perception of the human eye. It represents a video service philosophy that balances cost against experience to achieve the best price-performance ratio. "Narrowband" means saving unnecessary bits; "high definition" means allocating bits where they generate the most value, so that clearer, higher-quality pictures are delivered under the same bandwidth.

Under the influence of the pandemic, live streaming has spread from traditional talent shows into virtually every field. As live streaming becomes something everyone does, demand for narrowband HD technology keeps growing. This article first introduces some of the more mature narrowband HD solutions in the industry, then shares NetEase Yunxin's exploration and practice in narrowband HD, and finally describes its key technique: JND perceptual coding.

02 Introduction to Industry Narrowband HD Solutions

The industry already has relatively mature applications of narrowband HD technology; several typical solutions are introduced below.

Taobao Live

Taobao Live uses HEVC encoding to compress 720p/25fps video to 800 kbps while achieving PSNR > 43 dB and VMAF > 90. Its video narrowband HD technology has three main components:

Audio and video enhancement: AI-based image enhancement, beautification, and voice enhancement to improve production quality
Perceptual processing: joint source-channel adaptive coding, including ROI detection, coding parameters set according to scene classification, intelligent rate control, etc.
S265 encoder: an HEVC encoder described as industry-leading

Ali Narrowband HD

Ali's narrowband HD solution is based on a human visual model and shifts the encoder's optimization goal from the classic "highest fidelity" to "best subjective experience". Using its own algorithms, it weakens areas the human eye tends to overlook, strengthens details the eye pays attention to, and repairs content the eye finds objectionable, pushing past the limits of contemporary video encoders. It saves bitrate while delivering a clearer viewing experience.

Tencent Speed HD

Tencent Speed HD combines several techniques: video intelligence (videos are classified into dozens of major categories, such as games, shows, sports, outdoor, animation, food, and film and TV dramas, plus dozens of subcategories), intelligent coding parameters (each scene gets its own optimal encoding configuration), and pre-processing (sharpening, soft blur, deblocking, noise reduction). Together these address transcoding distortion, blur at low resolution, camera shake, heavy noise, and jagged blocking at low bitrates. It is used by Douyu, Penguin Esports, CCTV, Xinying Sports, and others.

03 NE264 Narrowband HD Technology

NE264 is an H.264-compliant video encoder developed by NetEase Yunxin, currently used in RTC and in live streaming and on-demand video. For live streaming and on-demand video, NE264 aims to achieve lower bandwidth and higher picture quality within the existing architecture; this is what we call NE264 narrowband HD. Below we briefly introduce video coding and the visual perceptual coding techniques proposed on the basis of human visual characteristics. Building on these, we proposed and implemented NE264 narrowband HD.

Video encoding

Video coding compresses data by exploiting the redundancy within it. Early video coding improved compression efficiency by removing spatial, temporal, and frequency-domain redundancy. From MPEG-1 to MPEG-2, the bitrate dropped by about 50% (coding efficiency doubled), while complexity increased by about 5%.
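Temporal redundancy is the easiest kind to see. The minimal sketch below uses the simplest possible predictor, the previous frame with no motion compensation (real encoders search for motion-compensated predictions); the frames and helper names are hypothetical. It shows that the residual of a mostly static scene is almost entirely zeros and therefore cheap to code:

```python
def frame_difference(prev, curr):
    """Temporal prediction residual: current frame minus previous frame."""
    return [[c - p for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]

def zero_fraction(block):
    """Fraction of samples that are exactly zero."""
    values = [v for row in block for v in row]
    return sum(1 for v in values if v == 0) / len(values)

# Two tiny 4x4 grayscale "frames": a static scene with one changed pixel.
prev = [[100, 100, 100, 100],
        [100, 120, 120, 100],
        [100, 120, 120, 100],
        [100, 100, 100, 100]]
curr = [row[:] for row in prev]
curr[1][1] = 125  # a small local change between frames

residual = frame_difference(prev, curr)
print(zero_fraction(curr))      # 0.0: raw samples contain no zeros
print(zero_fraction(residual))  # 0.9375: 15 of 16 residual samples are zero
```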

H.264, released in 2003, is a classic video compression standard. Since then, the gains from optimizing traditional coding methods have steadily diminished. From H.264 (AVC) to H.265 (HEVC), coding efficiency improved by about 40%, but complexity increased about fivefold. From H.265 to the latest H.266 (VVC) standard, the efficiency gain is under 40%, while complexity increases more than tenfold.

As coding standards evolve, the gains keep shrinking and technical breakthroughs become ever harder, so there is an urgent need for a new approach to coding and compression.

Human Visual System (HVS)

Physiological and psychological research on the human visual system (HVS) has shown that the brain processes visual input with a great deal of redundancy, and that exploiting human visual characteristics can significantly improve compression efficiency. This is the principle behind perceptual compression.

The human visual system consists of the eyeball, the nervous system, and the visual centers of the brain. When the eye views a video scene, incident light is first adjusted and focused by the pupil and lens to form an image on the retina. Retinal neurons then convert the light signal into neural signals and send them to the visual cortex. After further processing in the visual cortex and other brain areas, a perception of the scene is formed.

In recent years, guided by visual psychology and physiology, researchers observing particular visual phenomena have discovered many HVS characteristics. The HVS characteristics commonly used in visual perceptual coding today include visual attention, visual masking, visual sensitivity, and visual statistical learning mechanisms. Some of them are described below:

Visual masking: the human eye perceives a single visual signal more easily. When several visual signals are present at once, the HVS's perception of one or more of them is weakened or even eliminated, and the perception threshold changes. Masking effects include:

Luminance masking: the eye is less sensitive to distortion in very bright or very dark areas
Texture masking: the eye's visibility threshold in non-uniform (textured) areas is significantly higher than in uniform areas
Pattern masking: the eye distinguishes regular patterns noticeably better than irregular ones
Motion masking: the eye's ability to discern detail drops significantly in scenes with intense motion
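Luminance and texture masking can be made concrete with a pixel-domain JND sketch. It uses the background-luminance curve popularized by Chou and Li (thresholds rise in dark and bright regions) and a crude gradient term for texture, combined with a nonlinear additive masking model (NAMM). This is an illustration of the masking effects, not NE264's model; the 3x3 neighborhood, `eta`, and `c` are assumed values:

```python
import math

def luminance_threshold(bg):
    """Chou-Li style luminance adaptation: higher threshold in dark and bright areas."""
    if bg <= 127:
        return 17.0 * (1.0 - math.sqrt(bg / 127.0)) + 3.0
    return 3.0 / 128.0 * (bg - 127.0) + 3.0

def local_mean(img, y, x):
    """Mean of the 3x3 neighborhood (clamped at borders) used as background luminance."""
    h, w = len(img), len(img[0])
    vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sum(vals) / 9.0

def local_gradient(img, y, x):
    """Max absolute central difference: a crude texture/edge strength measure."""
    h, w = len(img), len(img[0])
    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
    return max(abs(gx), abs(gy)) / 2.0

def pixel_jnd(img, eta=0.1, c=0.3):
    """Per-pixel JND map: NAMM combination of luminance and texture masking.
    eta and c are illustrative values (assumptions), not tuned parameters."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            t_l = luminance_threshold(local_mean(img, y, x))
            t_t = eta * local_gradient(img, y, x)
            out[y][x] = t_l + t_t - c * min(t_l, t_t)
    return out

dark = [[0] * 4 for _ in range(4)]
mid = [[127] * 4 for _ in range(4)]
print(pixel_jnd(dark)[1][1], pixel_jnd(mid)[1][1])  # 20.0 3.0
```

A flat dark area tolerates far more error than a flat mid-gray one, and any texture (nonzero gradient) raises the threshold further, matching the luminance and texture masking items above.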

Visual attention: when viewing a video scene, the eye quickly focuses on the content or objects of interest. There are two modes:

Bottom-up processing driven by external stimuli. This mainly depends on the saliency of the image content: targets that differ strongly from their surroundings tend to attract the observer's attention.
Top-down processing driven by the task. It is governed by consciousness, depends on specific instructions, and is determined by human "cognitive factors" such as knowledge, expectations, and current goals. For example, people in a surveillance scene are more likely to attract attention.

Visual perception coding

Visual perceptual coding uses known HVS characteristics to eliminate, as far as possible, information the human eye cannot perceive, delivering better perceived quality with fewer bits. Researchers have proposed a large number of perceptual coding methods. Classified by the HVS characteristic they exploit, methods based on visual masking and methods based on visual attention are the most widely researched and applied.

Coding methods based on visual masking exploit the multi-channel model of the human eye: the presence of one stimulus changes the detection threshold of another, weakening or eliminating the perception of one or more stimuli, which makes it possible to remove visual redundancy. Current masking-based methods mainly include JND-model-based coding and coding driven by perceptual quality metrics such as SSIM and VMAF. Among these, JND-model-based coding is widely used in perceptual video coding and is the technique we focus on.

Coding methods based on visual attention can be divided into two categories, depending on whether they take the foveal characteristics of the HVS into account: region-of-interest (ROI) based coding and visual-saliency-detection based coding.

The basic idea of ROI-based coding is to analyze the input video perceptually before encoding to determine the region of interest. During encoding, parameters such as QP are adjusted to control the distortion of the ROI and the non-ROI regions separately, thereby improving the coding quality of the ROI. This technique has been around for many years, and its gains in practice are limited.
Coding based on visual saliency detection extracts the salient areas of an image (the areas humans are interested in) according to human visual characteristics. When facing a scene, people automatically process regions of interest and selectively ignore the rest; these regions of interest are called salient regions. This technique is common in perceptual coding and is usually combined with JND and other techniques for better compression; it is also the technique we chose to study first.
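Both ROI coding and saliency-guided coding ultimately steer the encoder through parameters such as QP. A minimal sketch, assuming a per-macroblock importance weight in [0, 1] (from an ROI mask or a normalized saliency map) and a symmetric offset of up to ±4 QP; the function name and offset range are hypothetical, not taken from NE264:

```python
def qp_map_from_weights(weights, base_qp, max_delta=4):
    """Map per-macroblock importance weights in [0, 1] to QPs.
    weight 1.0 (most important) -> base_qp - max_delta (finer quantization);
    weight 0.0 (least important) -> base_qp + max_delta (coarser quantization).
    Results are clamped to H.264's QP range 0..51."""
    qps = []
    for row in weights:
        qp_row = []
        for w in row:
            delta = round(max_delta * (1.0 - 2.0 * w))
            qp_row.append(min(51, max(0, base_qp + delta)))
        qps.append(qp_row)
    return qps

roi = [[0.0, 1.0],
       [0.5, 0.0]]
print(qp_map_from_weights(roi, base_qp=30))  # [[34, 26], [30, 34]]
```

Bits saved in the coarsely quantized background can then be spent on the region of interest at the same overall bitrate.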

NE264 technology

Narrowband HD technology is now relatively mature in the industry. Based on NE264's encoding characteristics and the goals we want to achieve, our narrowband HD technology consists of three main parts:

Video enhancement pre-processing: texture enhancement to improve the subjective experience
Saliency detection: distinguishes salient from non-salient regions based on human visual attention; the result guides encoding to improve the compression rate
JND perceptual coding: based on the masking characteristics of human vision; acts on encoding to improve the compression rate

The specific process is shown in the following figure. For the input video, we first analyze its content characteristics with machine learning, then apply video enhancement pre-processing to improve image quality, and then run saliency detection to separate salient from non-salient regions. The result is passed to the NE264 encoder, which computes the JND coefficients, combines them with the saliency result to guide encoding, and finally outputs the stream for display.
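The article does not specify how NE264 combines the JND coefficients with the saliency result. One simple, purely hypothetical rule is to loosen the JND threshold in non-salient regions, so that more distortion is tolerated where viewers are unlikely to look. The function name and the `boost` factor are assumptions for illustration:

```python
def saliency_weighted_jnd(jnd, saliency, boost=0.5):
    """Scale a JND map by a 0-255 saliency map (hypothetical combination rule).
    Fully salient pixels (255) keep their JND; non-salient pixels (0) get
    up to (1 + boost) times the threshold, allowing more distortion there."""
    return [[j * (1.0 + boost * (1.0 - s / 255.0)) for j, s in zip(jrow, srow)]
            for jrow, srow in zip(jnd, saliency)]

print(saliency_weighted_jnd([[4.0, 4.0]], [[255, 0]]))  # [[4.0, 6.0]]
```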

The following figure compares the effect of enhancement pre-processing: the left image is the original, and the right image is the result after enhancement. The subjective quality of the enhanced image is clearly improved.

Video enhancement effect
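The article does not detail NE264's enhancement algorithm. As a stand-in for "texture enhancement", the sketch below uses classic unsharp masking, which adds back a scaled copy of the high-frequency detail (image minus blurred image); the 3x3 box blur and `amount` are illustrative choices, not NE264 settings:

```python
def box_blur(img):
    """3x3 mean filter with edge clamping."""
    h, w = len(img), len(img[0])
    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
    return [[sum(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
             for x in range(w)] for y in range(h)]

def unsharp_mask(img, amount=0.5):
    """Sharpen by adding amount * (image - blurred), clipped to the 8-bit range."""
    blurred = box_blur(img)
    return [[min(255, max(0, round(p + amount * (p - b)))) for p, b in zip(row, brow)]
            for row, brow in zip(img, blurred)]

print(unsharp_mask([[100, 100, 200, 200]]))  # [[100, 83, 217, 200]]
```

The edge between 100 and 200 is steepened (83 next to 217) while flat areas stay unchanged, which is what makes texture look crisper.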

The figure below shows the saliency detection result: the upper color image is the original, and the black-and-white image below is the saliency map, with values from 0 to 255. Brighter areas are more salient.

Saliency detection effect
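To show how a 0-255 saliency map can be produced, here is a simplified grayscale take on frequency-tuned saliency (Achanta et al.), which scores each pixel by how far its lightly blurred value lies from the image mean. This is a swapped-in classical method for illustration only; the article does not say which saliency detector NE264 uses, and the box blur replaces the original method's Gaussian blur and Lab color space:

```python
def box_blur(img):
    """3x3 mean filter with edge clamping."""
    h, w = len(img), len(img[0])
    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
    return [[sum(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
             for x in range(w)] for y in range(h)]

def saliency_map(img):
    """Simplified frequency-tuned saliency on a grayscale image, scaled to 0-255."""
    h, w = len(img), len(img[0])
    mean = sum(sum(row) for row in img) / (h * w)
    blurred = box_blur(img)
    dist = [[abs(b - mean) for b in row] for row in blurred]
    peak = max(max(row) for row in dist)
    if peak == 0:
        return [[0] * w for _ in range(h)]  # uniform image: nothing stands out
    return [[round(255 * d / peak) for d in row] for row in dist]

# A dark 5x5 frame with one bright pixel in the middle.
img = [[0] * 5 for _ in range(5)]
img[2][2] = 255
sal = saliency_map(img)
```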

04 JND Perceptual Coding Technology

Let's take a closer look at the key technique mentioned above: JND perceptual coding.

JND (Just Noticeable Distortion) is the smallest distortion the human eye can perceive. It measures the eye's sensitivity to distortion in different areas of an image and is mostly used in perception-based image/video coding, digital watermarking, and image quality assessment. Many JND models have been proposed; they fall into two main categories: pixel-domain JND models and DCT-domain JND models.

Pixel-domain JND models give the JND threshold of each pixel directly in the pixel domain, without considering frequency characteristics. They are simple and convenient to compute but less accurate.
DCT-domain JND models take frequency characteristics into account and are more widely used. They usually comprise three parts: luminance adaptation (LA), contrast masking (CM), and the contrast sensitivity function (CSF). We mainly use DCT-domain JND perceptual coding. The JND calculation formula is as follows:
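A commonly used way to combine these three factors in the DCT-domain JND literature is multiplicative. The form below is given for orientation; NE264's exact formula may differ. For the coefficient at frequency position $(i,j)$ of block $n$:

$$
T_{\mathrm{JND}}(n,i,j) = s \cdot T_{\mathrm{basic}}(i,j) \cdot F_{\mathrm{LA}}(n) \cdot F_{\mathrm{CM}}(n,i,j)
$$

Here $T_{\mathrm{basic}}(i,j)$ is the base threshold derived from the CSF (it grows with spatial frequency, because the eye is less sensitive to fine detail), $F_{\mathrm{LA}}(n)$ is the luminance adaptation factor computed from the block's mean luminance, $F_{\mathrm{CM}}(n,i,j)$ is the contrast masking factor that raises the threshold in textured blocks, and $s$ is an overall scaling constant.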

NE264-based JND perceptual coding works as follows: for the input YUV image, we first compute luminance sensitivity, texture sensitivity, and contrast sensitivity to obtain the JND coefficients, then apply them in the DCT domain to modify the original DCT coefficients, and finally encode and output the bitstream.
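The step that modifies the original DCT coefficients can be sketched as soft-thresholding: each coefficient is shrunk toward zero by its JND threshold, so variations below the threshold are treated as invisible and cost no bits. Assumptions: a floating-point orthonormal 4x4 DCT-II stands in for H.264's integer transform, the residual block and threshold table are hypothetical, and soft-thresholding is one common choice rather than NE264's confirmed rule:

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n)) for x in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(r) for r in zip(*a)]

def dct2(block):
    """2-D forward DCT: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """2-D inverse DCT: C^T * Y * C (valid because C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def jnd_suppress(coeffs, jnd):
    """Soft-threshold: shrink each coefficient's magnitude by its JND threshold."""
    return [[math.copysign(max(abs(c) - t, 0.0), c) for c, t in zip(crow, jrow)]
            for crow, jrow in zip(coeffs, jnd)]

# Hypothetical 4x4 residual block and JND table (looser at higher frequencies).
block = [[3, -1, 2, 0], [1, 0, -2, 1], [0, 2, 1, -1], [-1, 0, 1, 2]]
jnd = [[1.0 + 0.5 * (i + j) for j in range(4)] for i in range(4)]

coeffs = dct2(block)
suppressed = jnd_suppress(coeffs, jnd)
zeros_before = sum(v == 0.0 for row in coeffs for v in row)
zeros_after = sum(v == 0.0 for row in suppressed for v in row)
print(zeros_before, zeros_after)
```

In a real encoder this would act on prediction residuals before quantization, with per-block thresholds derived from the LA, CM, and CSF terms; more zero coefficients after suppression means fewer bits after entropy coding.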

05 Summary

This article introduced NE264 narrowband HD technology and JND perceptual coding. For live streaming and on-demand video, the goal is always to reduce bandwidth as much as possible while preserving HD picture quality, and video coding is a crucial link in achieving it. Whether through traditional coding or intelligent coding, we will keep working to deliver a high-quality video experience with lower latency and higher picture quality.

That concludes this session. Click [here] to watch the video replay.

About the Author

Cheng Ling is a senior audio and video algorithm engineer at NetEase Yunxin, currently working on video coding algorithm research, with extensive experience in video quality optimization and rate control algorithms.

For more technical articles, follow the [NetEase Smart Enterprise Technology+] WeChat official account.
