As a tester, I have been looking for audio and video quality testing methods and monitoring approaches that fit the live streaming and voice intercom services of the cameras in my current project. In practice, once concurrency reaches a certain level, problems such as delay, freezing, dropped frames, and mosaic artifacts appear. So I have recently been watching live talks and reading expert columns online to learn how the industry tests these things.
A few days ago, I had the chance to attend an online live talk and discussion titled "Real-Time Voice Quality Monitoring System: Past, Present and Future". The talk introduced the progress of Shengwang's real-time voice quality monitoring system and shared the direction in which it will evolve.
In summary, the talk covered the following three parts:
1. In the past: speech quality evaluation algorithm
2. Now: online and offline testing
3. The future: integration of perception, feedback and monitoring
1. In the past: Speech quality evaluation algorithm
The section on past speech quality evaluation algorithms introduced three families: reference-based objective methods, no-reference objective methods, and subjective methods.
Just as a thousand readers see a thousand Hamlets, subjective scores vary from listener to listener, so I will set the subjective methods aside for now. The most widely used reference-based methods are P.862 PESQ and its wideband extension PESQ-WB. Around 2011, the newest reference-based method, P.863 POLQA, was released as the successor to PESQ. All of them rely on a lossless reference signal. No-reference objective methods need no reference signal. According to its authors, ANIQUE+ is more accurate than the reference-based PESQ, which is quite interesting.
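To make the "needs a reference signal" point concrete, here is a minimal sketch of a toy reference-based score. It is segmental SNR, not PESQ or POLQA (those use perceptual models and are far more involved); the function name, the 160-sample frame length, and the clamping range are my own assumptions for illustration. The point is only that the score cannot be computed without the clean, time-aligned original.

```python
import math

def segmental_snr_db(reference, degraded, frame_len=160):
    """Mean per-frame SNR in dB between two time-aligned sample sequences.

    Toy reference-based metric (assumption: NOT PESQ/POLQA).
    Frame SNRs are clamped to [-10, 35] dB so single frames do not dominate.
    """
    n = min(len(reference), len(degraded))
    frame_scores = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, deg))
        if signal == 0:
            continue  # skip silent reference frames
        if noise == 0:
            frame_scores.append(35.0)  # identical frame: use the upper clamp
            continue
        snr = 10.0 * math.log10(signal / noise)
        frame_scores.append(max(-10.0, min(35.0, snr)))
    if not frame_scores:
        return float("nan")
    return sum(frame_scores) / len(frame_scores)

# Hypothetical test signal: one second of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(8000)]
attenuated = [0.9 * x for x in tone]
```

Calling `segmental_snr_db(tone, attenuated)` compares the degraded copy against the original; in production there is no such original on the receiving side, which is why this family of methods stays in the lab.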
Pain points of objective evaluation methods:
1. Reference-based methods: usable only offline, before going online, because they need the original reference signal
2. No-reference, traditional signal domain: narrow application scenarios and poor robustness
3. No-reference, traditional parameter domain: accurate only under a limited range of weak-network conditions
4. No-reference, deep learning: limited by application scenarios and training corpus, and high complexity (signal domain)
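As an illustration of what a "parameter domain" no-reference method looks like, here is a minimal sketch based on a simplified E-model. The talk did not say which model Shengwang uses, so this is my own example: the R-to-MOS mapping follows ITU-T G.107, while the delay and loss constants are the commonly cited approximation for G.711 and should be treated as assumptions. The function names are hypothetical.

```python
import math

def r_to_mos(r: float) -> float:
    """Map an E-model R-factor to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

def estimate_r(one_way_delay_ms: float, loss_ratio: float) -> float:
    """Simplified R-factor from one-way delay and packet loss ratio.

    Assumption: constants are a widely used approximation for G.711;
    other codecs need different loss-impairment constants.
    """
    delay_impairment = 0.024 * one_way_delay_ms
    if one_way_delay_ms > 177.3:
        # Extra penalty once delay becomes noticeable in conversation.
        delay_impairment += 0.11 * (one_way_delay_ms - 177.3)
    loss_impairment = 30.0 * math.log(1.0 + 15.0 * loss_ratio)
    return 94.2 - delay_impairment - loss_impairment

good_call = r_to_mos(estimate_r(one_way_delay_ms=50, loss_ratio=0.0))
bad_call = r_to_mos(estimate_r(one_way_delay_ms=200, loss_ratio=0.05))
```

The score is computed purely from network parameters and never looks at the audio, which is why it is cheap, and also why its accuracy holds only within the range of network conditions the constants were fitted for.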
When it comes to speech quality evaluation algorithms, we are real beginners. In our current work, testing still mainly covers functional testing, interface testing, and some performance testing of streaming media. Evaluating voice quality with existing algorithms is probably out of reach for now.
2. Now: online and offline testing
In the talk, Mr. Zhao Xiaohan reviewed the goals set before the system was designed, along with the main problems and solutions on the current uplink and downlink.
The design goals of the existing evaluation system:
1. High precision: reliable evaluation results
2. Wide coverage of business scenarios: gaming, entertainment, education and other business scenarios
3. Moderate algorithm complexity: it must not noticeably reduce performance
4. Weak dependence on audio content: whether the input is voice, music, or noise, the analysis result must not be affected
The downlink mainly consists of these stages: encoding, transmission, decoding, and playback.
Quality evaluation on the downlink side is built around these four stages:
1. Codec performance: different codecs have different processing results for different corpora
2. Network transmission: packet loss, jitter and delay, etc.
3. Quality of the weak-network countermeasure algorithms: compensation for lost frames
4. Device playback capability: poor hardware degrades the sound quality
This part really resonated with me. The cameras we currently use come from several manufacturers, such as Haikang, Dahua, Xiongmai, and TPLink, and each manufacturer has multiple models. Devices differ in hardware, and even basic national-standard access can behave a little abnormally, let alone audio and video performance. Our platform is currently moving its video encoding from H.264 to H.265, which makes audio and video quality testing extremely important.
Network transmission is also a bottleneck we often hit in our performance tests; in particular, uploading video files to S3 storage is largely limited by upstream bandwidth. In addition, transmission over UDP inevitably brings packet loss and related problems.
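For the transmission stage, packet loss and jitter can be measured directly from packet records. Below is a minimal sketch under my own assumptions: the `Packet` fields are hypothetical, sequence-number wraparound is ignored, and the jitter estimate uses the running-average form described in RFC 3550 (gain of 1/16), computed here in milliseconds rather than RTP timestamp units.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    seq: int        # sender sequence number (assumed not to wrap around)
    sent_ms: float  # sender timestamp in milliseconds
    recv_ms: float  # receiver arrival time in milliseconds

def loss_ratio(packets):
    """Fraction of packets missing between the lowest and highest sequence number."""
    if not packets:
        return 0.0
    seqs = {p.seq for p in packets}  # a set also drops duplicates
    expected = max(seqs) - min(seqs) + 1
    return 1.0 - len(seqs) / expected

def interarrival_jitter(packets):
    """Running jitter estimate in the style of RFC 3550, over packets in arrival order."""
    jitter = 0.0
    prev = None
    for p in packets:
        if prev is not None:
            # Change in transit time between consecutive packets.
            d = (p.recv_ms - prev.recv_ms) - (p.sent_ms - prev.sent_ms)
            jitter += (abs(d) - jitter) / 16.0
        prev = p
    return jitter

# Hypothetical capture: packet 3 is lost, packet 4 arrives 16 ms late.
capture = [
    Packet(seq=1, sent_ms=0, recv_ms=10),
    Packet(seq=2, sent_ms=20, recv_ms=30),
    Packet(seq=4, sent_ms=60, recv_ms=86),
    Packet(seq=5, sent_ms=80, recv_ms=106),
]
```

Numbers like these could be logged per camera model during a concurrency test, which would make it easier to tell whether freezes come from the network or from the device itself.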
Different terminal devices also play audio through their speakers with different sound quality, which we noticed during compatibility testing.
3. The future: integration of perception, feedback and monitoring
Goals for the future system:
1. More detailed internal state: the details of the uplink still need to be optimized.
2. Wider experience coverage: some types of noise are not yet covered and need further work.
3. Faster feedback: the goal is to deliver feedback within 1 minute.
4. More comprehensive call coverage: the goal is to monitor every second of a call.
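The last two goals, a score for every second and feedback within a minute, can be pictured as a rolling window over per-second scores. The following is only a sketch of that idea, not Shengwang's design: the class name, the 60-second window, and the alert threshold of 3.0 are all hypothetical.

```python
from collections import deque

class RollingQualityMonitor:
    """Keeps the most recent per-second quality scores and flags a bad window."""

    def __init__(self, window_seconds=60, alert_threshold=3.0):
        # deque with maxlen drops the oldest score automatically.
        self.scores = deque(maxlen=window_seconds)
        self.alert_threshold = alert_threshold

    def mean(self):
        return sum(self.scores) / len(self.scores)

    def add_second(self, score):
        """Record one per-second score; return True if an alert should fire.

        An alert fires only once the window is full and its mean score
        has dropped below the threshold.
        """
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and self.mean() < self.alert_threshold

# Hypothetical usage with a short 3-second window to keep the example small.
monitor = RollingQualityMonitor(window_seconds=3, alert_threshold=3.0)
alerts = [monitor.add_second(s) for s in [4.0, 4.0, 4.0, 1.0, 1.0]]
```

With a 60-second window, a sustained drop in quality would surface within about a minute, while the per-second scores stay available for locating the exact moment a call degraded.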
A platform with wide coverage, fast response, and high accuracy is what every platform aims for. I hope this one will soon give the audio and video quality testing industry a real boost.