From July 31 to August 1, the QCon Global Software Development Conference was held in Guangzhou for the first time. Its program centered on three keywords, "digitalization", "cloud native" and "localization", and included in-depth analysis of hot topics such as frontier intelligent applications and Web 3.0 infrastructure. Follow [Rongyun Global Internet Communication Cloud] to learn more.
The Rongyun (Fusion Cloud) technology session at QCon focused on audio and video architecture in practice. Ren Jie, vice chairman and chief scientist, served as producer. Three speakers gave talks:

- Xie Xudan, manager of the RTC Service R&D Center: "Design of an Observable Quality Assurance System for RTC Services".
- Tian Runjun, audio and video R&D architect: "RTC Weak Network Countermeasure Technology Sharing".
- Sha Yongtao, audio algorithm engineer: "Exploration and Application of AI Noise Reduction Technology".
Design of an Observable Quality Assurance System for RTC Services
(Xie Xudan, Manager of Rongyun RTC Service R&D Center)
On the sending end, real-time audio and video data is captured, pre-processed, encoded, and sent. On the receiving end, it is decoded, post-processed, and rendered. This is the typical RTC data processing pipeline.
Because the stages are arranged linearly, an error in any one stage degrades the quality of every stage after it. The pipeline behaves like a water pipe: a blockage anywhere along it stops the flow downstream.
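The "water pipe" point can be made concrete with a minimal sketch of per-stage fault localization. It assumes each pipeline stage reports a normalized quality score in [0, 1]. The `StageReport` type, the stage names, and the 0.8 threshold are hypothetical illustrations, not Rongyun's actual system.

```python
from dataclasses import dataclass

# Hypothetical stage names, in the order described above (sender, then receiver).
PIPELINE = ["capture", "preprocess", "encode", "send",
            "decode", "postprocess", "render"]


@dataclass
class StageReport:
    stage: str
    quality: float  # hypothetical normalized quality score of the stage output, in [0, 1]


def first_degraded_stage(reports, threshold=0.8):
    """Return the earliest stage, in pipeline order, whose output quality is below threshold.

    In a linear pipeline every later stage inherits upstream damage, so the
    first failing stage is the most likely root cause. Returns None if all pass.
    """
    by_stage = {r.stage: r.quality for r in reports}
    for stage in PIPELINE:
        q = by_stage.get(stage)
        if q is not None and q < threshold:
            return stage
    return None


reports = [StageReport("capture", 0.97), StageReport("preprocess", 0.95),
           StageReport("encode", 0.62), StageReport("send", 0.60),
           StageReport("decode", 0.58), StageReport("postprocess", 0.57),
           StageReport("render", 0.57)]
print(first_degraded_stage(reports))  # encode
```

Every stage from `encode` onward looks bad. Only the ordered scan points back to the encoder as the place to investigate first.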
In audio and video services, the most common problems are the following:
- Many stages affect quality and scenarios are complex, so locating problems is difficult;
- There are no established evaluation methods or unified standards, so effects cannot be measured;
- Developers and testers understand quality differently, which raises communication costs;
- Traditional testing involves long test cycles and is inefficient.
To address these problems, the industry commonly uses two categories of evaluation metrics: subjective and objective. The most representative subjective metric is MOS (Mean Opinion Score). Its advantage is high accuracy. Its disadvantages are high implementation cost, poor repeatability, and the inability to evaluate at scale.
We therefore want to replace manual scoring with machines, using mathematical models to quantify audio and video quality against defined evaluation criteria. The accuracy of such objective evaluation depends on the underlying model, but it is highly repeatable and can be run in large batches.
Typical objective evaluation methods fall into two main types: full-reference and no-reference.
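Here is a minimal sketch of how a MOS is computed from listener ratings on the 5-point Absolute Category Rating scale (ITU-T P.800). It also shows the cost problem: the confidence interval only narrows as more human ratings are collected. The function name and the normal-approximation interval are illustrative choices.

```python
import math
import statistics


def mos(ratings):
    """Mean Opinion Score with an approximate 95% confidence half-width.

    ratings: listener scores on the 5-point ACR scale (1 = bad ... 5 = excellent).
    """
    if not ratings or any(r < 1 or r > 5 for r in ratings):
        raise ValueError("need at least one rating, each on the 1-5 scale")
    mean = statistics.mean(ratings)
    if len(ratings) < 2:
        return mean, 0.0
    # Normal approximation; small panels would normally use a t-distribution.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


score, ci = mos([4, 5, 3, 4, 4, 5, 3, 4])
```

Halving the interval needs roughly four times as many listeners. This is why subjective testing does not scale to large batches.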
No-reference metrics, such as blurriness and blockiness, need only the receiver's data. Their disadvantage is weak diagnostic power: they cannot tell whether a problem originated inside or outside the system, or at which processing stage it occurred.
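Blurriness is one example of a no-reference metric. It is often estimated with the variance of the Laplacian, because sharp frames produce strong second-derivative responses and blurry frames do not. This sketch assumes a grayscale frame given as a 2-D list. It is a generic technique, not necessarily the metric used in Rongyun's system. It needs only the receiver's frame, and it can flag a blurry frame without saying which stage caused the blur.

```python
def laplacian_variance(img):
    """No-reference sharpness score: variance of the 4-neighbour Laplacian.

    img: 2-D list of grayscale pixel values (equal-length rows, at least 3x3).
    Lower values suggest a blurrier frame.
    """
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("frame must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete Laplacian: sum of 4 neighbours minus 4x the centre pixel
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


sharp = [[0 if (x + y) % 2 == 0 else 255 for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]
```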
Full-reference metrics, such as PSNR and VMAF, are technically easy to apply, can be run repeatedly, and reproduce results precisely, which helps locate problems quickly. Their disadvantage is that they require data from both ends, and the original and received images must be strictly aligned before they can be compared.
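PSNR is the simplest full-reference metric. It compares the decoded frame pixel by pixel against the original: PSNR = 10 log10(MAX^2 / MSE). The following is a minimal sketch for 8-bit grayscale frames stored as 2-D lists. It assumes the two frames are already aligned in time and space, which is the strict-comparison requirement of full-reference evaluation.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Full-reference PSNR in dB between two equally sized, aligned 2-D frames."""
    if len(reference) != len(distorted) or any(
            len(r) != len(d) for r, d in zip(reference, distorted)):
        raise ValueError("frames must have identical dimensions")
    squared_error = 0.0
    count = 0
    for ref_row, dist_row in zip(reference, distorted):
        for a, b in zip(ref_row, dist_row):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


ref = [[100] * 4 for _ in range(4)]
dist = [[110] * 4 for _ in range(4)]
```

A uniform error of 10 gray levels gives MSE = 100 and a PSNR of about 28.1 dB. Identical frames give infinity.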
For the details of Rongyun's observable RTC quality assurance system design and related practical results, follow [Rongyun Global Internet Communication Cloud] and reply [Quality Inspection] to view the complete courseware.
RTC Weak Network Countermeasure Technology Sharing
(Rongyun Audio and Video R&D Architect Tian Runjun)
The tide of real-time interaction has arrived. RTC (real-time audio and video) technology has developed rapidly, continually reaching new applications and spreading into new scenarios.
Advanced technology brings huge growth to online scenarios. It also faces ever-higher user expectations: lower latency, higher image quality, and a smoother experience.
These three user-experience factors correspond to RTC's three core metrics: real-time performance, clarity, and fluency. It is often impossible to achieve all three at once.
To get as close as possible to all three, we usually rely on network transmission optimization to pursue lower latency, higher definition, and greater smoothness.
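Sender-side bitrate adaptation is one common way transmission optimization trades among these metrics: lowering the bitrate sacrifices some clarity to protect latency and fluency. The sketch below implements the loss-based update rule described in Google Congestion Control (draft-ietf-rmcat-gcc), with that draft's 2% and 10% loss thresholds. It is a generic illustration, not Rongyun's algorithm. The `min_bps`/`max_bps` clamps are hypothetical parameters.

```python
def loss_based_rate(prev_rate_bps, loss_fraction,
                    min_bps=100_000, max_bps=2_500_000):
    """One update step of a loss-based sender bitrate controller.

    loss_fraction: packet loss ratio reported by the receiver, in [0, 1].
    """
    if loss_fraction > 0.10:
        # Heavy loss: back off in proportion to the loss rate
        rate = prev_rate_bps * (1 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        # Negligible loss: probe upward by 5%
        rate = prev_rate_bps * 1.05
    else:
        # Moderate loss: hold the current rate
        rate = prev_rate_bps
    return min(max(rate, min_bps), max_bps)
```

At 20% loss a 1 Mbps stream drops to 900 kbps. With no loss it climbs 5% per update until the cap.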
Weak network conditions, such as congestion, packet loss, and delay jitter, are the main factors that degrade user experience. Weak network countermeasure technology is the general term for technical solutions to these and other network impairments.
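Delay jitter is usually quantified with the interarrival jitter estimator defined for RTP in RFC 3550 (section 6.4.1). It smooths the change in per-packet transit time with a gain of 1/16. The following is a minimal sketch, assuming send timestamps and arrival times use the same unit and are listed in arrival order.

```python
def interarrival_jitter(send_times, recv_times):
    """RFC 3550 interarrival jitter estimate, in the same units as the inputs."""
    if len(send_times) != len(recv_times):
        raise ValueError("need one send and one receive time per packet")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D: change in transit time between consecutive packets
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponential smoothing with gain 1/16, as specified in RFC 3550
        jitter += (abs(d) - jitter) / 16
    return jitter
```

A constant network delay yields zero jitter, however large the delay is. Only variation in delay counts.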
Because network environments are highly complex and heterogeneous, the severity of these weak network problems varies greatly from one environment to another. Ensuring smooth communication between users in complex network environments has always been a key issue in the RTC field.
For the solutions to these three problems and Rongyun's best practices, follow [Rongyun Global Internet Communication Cloud] and reply [Weak Network Confrontation] to obtain the complete courseware.
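Forward error correction (FEC) is one widely used countermeasure to packet loss. The sketch below shows its simplest form, XOR parity over a group of packets. It lets the receiver rebuild any single lost packet in the group without waiting for a retransmission. This is a generic illustration under simplifying assumptions: packet lengths are conveyed out of band, and the helper names are hypothetical. Standardized schemes such as RFC 5109 define the real packet format.

```python
def xor_parity(packets):
    """Build one parity packet as the byte-wise XOR of a group of packets.

    Shorter packets are implicitly zero-padded, so the receiver must learn the
    lost packet's length separately (a simplifying assumption in this sketch).
    """
    parity = bytearray(max(len(p) for p in packets))
    for p in packets:
        for i, b in enumerate(p):
            parity[i] ^= b
    return bytes(parity)


def recover_missing(received, parity, missing_len):
    """Rebuild the single lost packet by XOR-ing the parity with every survivor."""
    buf = bytearray(parity)
    for p in received:
        for i, b in enumerate(p):
            buf[i] ^= b
    return bytes(buf[:missing_len])


pkts = [b"hello", b"rtc", b"world!"]
parity = xor_parity(pkts)
```

The cost is bandwidth: one parity packet per group of N adds 1/N overhead, and each group can recover only one loss.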
Exploration and Application of AI Noise Reduction Technology
(Sha Yongtao, Rongyun Audio Algorithm Engineer)
Noise reduction technology has been developed over many years, and each stage has produced typical algorithms and important breakthroughs. Examples include early linear filtering and spectral subtraction, and later statistical-model and subspace algorithms.
In recent years, noise reduction based on deep learning, known as AI noise reduction, has developed rapidly. The main approaches are deep learning algorithms based on the magnitude spectrum, those based on the complex spectrum, and, more recently, those operating directly on the time-domain signal.
Traditional algorithms rely on researchers summarizing the regularities of noise into models, which are then used to suppress background noise. They include linear filtering, spectral subtraction, statistical-model algorithms, and subspace algorithms. These algorithms struggle to estimate and handle non-stationary noise, so AI noise reduction is introduced to further improve performance.
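Power spectral subtraction shows why traditional methods struggle with non-stationary noise. This minimal sketch assumes the STFT magnitudes have already been computed and that the first few frames contain noise only. The noise estimate is then frozen, so noise whose spectrum changes later is not removed correctly. The parameter names and the 0.02 spectral floor are illustrative.

```python
def spectral_subtraction(noisy_mag_frames, noise_frames=5, floor=0.02):
    """Power spectral subtraction on precomputed STFT magnitude frames.

    noisy_mag_frames: list of frames, each a list of per-bin magnitudes |Y(k)|.
    The noise power per bin is averaged over the first `noise_frames` frames,
    assumed to be noise-only. Returns enhanced magnitudes; the noisy phase
    would be reused when resynthesizing the waveform.
    """
    n = min(noise_frames, len(noisy_mag_frames))
    bins = len(noisy_mag_frames[0])
    noise_pow = [sum(f[k] ** 2 for f in noisy_mag_frames[:n]) / n
                 for k in range(bins)]
    enhanced_frames = []
    for frame in noisy_mag_frames:
        enhanced = []
        for k, y in enumerate(frame):
            p = y ** 2 - noise_pow[k]
            # Spectral floor avoids negative power and limits "musical noise"
            p = max(p, floor * y ** 2)
            enhanced.append(p ** 0.5)
        enhanced_frames.append(enhanced)
    return enhanced_frames


frames = [[1.0, 1.0]] * 5 + [[3.0, 1.0]]
out = spectral_subtraction(frames)
```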
An AI noise reduction algorithm extracts features from the noisy speech, feeds them into a trained neural network, and obtains enhanced, denoised speech. In essence, the neural network learns the characteristics that distinguish speech from noise, so that it can remove the noise while retaining the speech.
AI noise reduction research focuses on three main aspects.
The first is the model. As deep learning has advanced, models have evolved from the earliest DNNs to RNNs, then to CNNs and GANs, and most recently to Transformers.
The second is the training target, which generally falls into two categories: mask-based and mapping-based.
The third is the loss function.
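The Ideal Ratio Mask (IRM) is a common mask-based target. It is computed per time-frequency bin from the clean speech and noise used to synthesize each training mixture. The network learns to predict this mask, and at inference the predicted mask is multiplied with the noisy magnitude. A mapping-based model would instead predict the clean magnitude directly. The following is a minimal sketch on single-frame magnitude lists, with beta = 0.5 as a typical but assumed choice.

```python
def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Ideal Ratio Mask per bin: (S^2 / (S^2 + N^2)) ** beta, with values in [0, 1]."""
    mask = []
    for s, n in zip(speech_mag, noise_mag):
        denom = s ** 2 + n ** 2
        mask.append((s ** 2 / denom) ** beta if denom > 0 else 0.0)
    return mask


def apply_mask(noisy_mag, mask):
    """Mask-based enhancement: scale each noisy magnitude bin by the (predicted) mask."""
    return [y * m for y, m in zip(noisy_mag, mask)]
```

Because the mask is bounded in [0, 1], it is usually easier for a network to learn than unbounded clean magnitudes. This is one reason mask-based targets are popular.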
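Time-domain models are often trained with the negative scale-invariant SNR (SI-SNR) as the loss function, as in Conv-TasNet. This is a minimal sketch on plain Python lists, not necessarily the loss Rongyun uses. `eps` is a small assumed constant that avoids division by zero.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length time-domain signals."""
    # Remove the mean of both signals, as is customary for SI-SNR
    mt = sum(target) / len(target)
    me = sum(estimate) / len(estimate)
    t = [x - mt for x in target]
    e = [x - me for x in estimate]
    # Project the estimate onto the target to get the "clean" component
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(x * x for x in t) + eps
    s_target = [dot / t_energy * x for x in t]
    e_noise = [a - b for a, b in zip(e, s_target)]
    num = sum(x * x for x in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10 * math.log10(num / den + eps)


def si_snr_loss(estimate, target):
    """Training loss: minimizing it maximizes SI-SNR."""
    return -si_snr(estimate, target)
```

Scaling the estimate by a constant leaves SI-SNR unchanged. The network is therefore rewarded for getting the waveform shape right rather than its loudness.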
So what are the main types of AI noise reduction? How do traditional and AI noise reduction compare experimentally? And what practical results has Rongyun achieved? Follow [Rongyun Global Internet Communication Cloud] and reply [AI Noise Reduction] to get the complete courseware.