
Foreword

Audio technology has evolved from analog audio to digital audio. The Audio Engineering Society (AES) was founded in 1948. China's digital audio technology started relatively late, and the field was long dominated by foreign organizations and companies. As China has developed and its science and technology have advanced, nearly 30 years of hard work have produced many homegrown audio experts and scholars in speech recognition and reconstruction, sound field reproduction, digital audio communication, and other areas, steadily narrowing the technological gap with the West.

To promote industry exchange, strengthen China's capabilities in the audio field, and improve public understanding of audio technology, we launched the "one session, one meeting" event for experts in the audio field, hoping to give more audio practitioners a platform for in-depth communication. We were delighted to see many meaningful and valuable sparks of thought at the event, and we hope that "one session, one meeting" can carry and ignite the dreams of more audio engineers.

—— Sound Network Technology Partner, Audio Codec Expert

@高泽华

This article is based on the content discussed in the "One Session • Audio Engineer Session", and the opinions are for reference only.


Key points up front:

● In the audio field, AI solves general problems while traditional algorithms solve specific ones, so combining the two is inevitable.

● The question of the next "hot spot" may matter little to developers, and the benefit distribution mechanism in most organizations tends to disadvantage front-line developers.

● A practical problem for spatial audio is the shortage of source material, and the devices and business scenarios with the largest market share lack the conditions needed to support it.

● Technically, there is a lot that VoIP can do, but in terms of application scenarios, market demand is the decisive factor.

1. The development of audio technology has reached a plateau

The development of audio technology has reached a plateau. The neural network boom has begun to subside after delivering breakthroughs in audio noise reduction, echo cancellation, packet loss concealment, and related areas. Improvements in channel technology are approaching a bottleneck, and the application of codecs and microphone arrays is still at the trial stage.

In the long run, the accumulation of business needs and the iteration of audio technology have together driven the field forward. The problem now is that new breakthroughs are needed at the technical level. In specific scenarios such as virtual meetings, spatial audio, noise reduction, and in-vehicle audio, the pursuit of an immersive experience is increasingly evident, and that takes craftsmanship and careful polishing.

2. A new breakthrough on the technology side of active noise cancellation

With the rapid growth of the TWS Bluetooth headset market, active noise cancellation (ANC) has once again attracted attention. Boosted by deep learning, its scope has expanded from single-point noise cancellation on smartphones and Bluetooth headsets to markets such as PCs, smart cars, and smart homes. However, sound source separation (voiceprint recognition) and restoration (sound field reconstruction), network transmission of multi-channel audio, and algorithm convergence remain issues worth watching.

Whether in smart cars, smart homes, or the metaverse, as application scenarios keep expanding, engineers working on active noise cancellation should focus on combining software and hardware. The steadily growing computing power of terminal devices and the increasing popularity of cloud services will give the technology more room to develop. In more and more scenarios, coordinated scheduling of multiple end-side devices (echo suppression) is becoming a new topic.

3. Market demand determines the future of VoIP

How far will VoIP go? Mobile communication has moved from 2G to the 5G era, and VoIP has been upgraded from the original 8 kHz sampling to high-definition calls at 44 kHz. Higher sound quality also brings new challenges. Call stability comes first, and network switching and jitter have a large impact on VoIP. The problems to be solved in 1v1 and NvN call scenarios also differ. Although noise reduction and echo cancellation, the two major application topics, have made some progress in academia, hardware complexity makes practical deployment in industry harder.
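To make the sampling-rate and jitter points concrete, here is a minimal sketch. It assumes the 8k and 44k figures above refer to sampling rates (8 kHz and 44.1 kHz) and that audio is 16-bit mono PCM; neither detail is stated in the discussion. The jitter estimator follows the running-average formula from RFC 3550 (RTP), and the function names and sample timestamps are illustrative only.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Raw (uncompressed) PCM bitrate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Both lists hold per-packet timestamps in milliseconds, in arrival order.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # D: how much the transit time changed between consecutive packets
        recv_gap = recv_times_ms[i] - recv_times_ms[i - 1]
        send_gap = send_times_ms[i] - send_times_ms[i - 1]
        d = recv_gap - send_gap
        # Exponential smoothing with gain 1/16, as specified in the RFC
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Assumed example: packets sent every 20 ms, one arriving 5 ms late.
send = [0, 20, 40, 60]
recv = [50, 75, 90, 110]
print("narrowband PCM kbit/s:", pcm_bitrate_kbps(8000))
print("44.1 kHz PCM kbit/s:", pcm_bitrate_kbps(44100))
print("jitter estimate (ms):", interarrival_jitter(send, recv))
```

The raw bitrate grows in proportion to the sampling rate, which is one reason higher-quality calls are more exposed to network switching and jitter unless the codec and jitter buffer absorb the difference.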

VoIP is becoming more scenario-driven and multi-device, with tighter integration between scenarios and devices, all in pursuit of an immersive experience. As a result, it faces many constraints from front-end processing (computing power), the network, and the diversity and complexity of devices, and what the industry cares about most is stability. Technically, there is a lot that VoIP can do, but in terms of application scenarios, market demand is the decisive factor.

4. Array technology finds new application scenarios

As smart devices develop, many mono scenarios will be replaced by multi-channel ones, so array technology will find more and more applications. However, challenges remain in how to select microphone or loudspeaker signals, how to evaluate array performance, and how to reconstruct the sound field (for example multi-zone reproduction, directivity, and time-domain filtering). Beyond the laboratory, research institutions in China and abroad have made progress with applications at outdoor concerts, music plazas, and similar venues.
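As one illustration of what a microphone array can do, the sketch below implements a basic delay-and-sum beamformer and compares its error against a single microphone. This is a textbook technique chosen for illustration, not a method described in the session. The per-microphone delays are assumed to be known integer sample counts, and the tone, noise level, and function names are all hypothetical.

```python
import math
import random


def simulate_array(source, delays, noise_std, seed=0):
    """Build one noisy, delayed copy of the source per microphone."""
    rng = random.Random(seed)
    channels = []
    for d in delays:
        channel = []
        for t in range(len(source)):
            clean = source[t - d] if t >= d else 0.0
            channel.append(clean + rng.gauss(0.0, noise_std))
        channels.append(channel)
    return channels


def delay_and_sum(channels, delays):
    """Undo each microphone's delay, then average across microphones."""
    usable = len(channels[0]) - max(delays)
    output = []
    for t in range(usable):
        total = sum(ch[t + d] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


def mean_square_error(a, b):
    """Mean squared difference over the overlapping part of a and b."""
    n = min(len(a), len(b))
    return sum((a[i] - b[i]) ** 2 for i in range(n)) / n


# Assumed setup: a 440 Hz tone at 16 kHz, four microphones, known delays.
source = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(2000)]
delays = [0, 2, 4, 6]
channels = simulate_array(source, delays, noise_std=0.5)
beam = delay_and_sum(channels, delays)
print("single microphone error:", mean_square_error(channels[0], source))
print("beamformer error:", mean_square_error(beam, source))
```

Averaging aligned channels keeps the target signal while uncorrelated noise partially cancels. Real arrays must also estimate the delays, which is part of why signal selection and evaluation remain open challenges.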

5. Spatial audio is promising in the field of RTC

Since Apple launched spatial audio, it has quickly become a focus of the industry, and its application scenarios have extended from headphones to loudspeakers. Apple reportedly uses Dolby's solution in its spatial audio implementation. Facebook and Microsoft have also recently researched spatial audio, and their public demos show it being used mainly in conference scenarios.

Because conference scenarios are so complex, the industry has no unified approach to whether separation or channel processing should come first when implementing spatial audio. For voice separation, which audio channel is selected and played out to the user depends mainly on the upper-layer application. Looking across more application scenarios, engineers should recognize that spatial audio has to address both the simulation and counteraction of real scenes and the simulation and counteraction of virtual scenes.

Spatial audio also faces a more practical problem: there is not enough source material to apply it to. The devices and business scenarios with the largest market share lack the necessary conditions, especially the vast number of low-end and mid-range devices and short-video applications.

In addition, some information is lost when spatial audio simulates reality, and there is no unified standard for evaluating spatial audio quality. The current experience (sense of space) and sound quality of spatial audio are not yet ideal. Because there are so many spatial audio scenarios, companies have proposed differing solutions, and this fragmentation may hold back wider adoption.

6. AI and traditional algorithms are bound to be combined

The AI boom in audio pre-processing algorithms lasted until around 2018, when AI-based audio signal processing was found to hit bottlenecks in serving specific industries (computing power and energy consumption on the various playback devices). As a result, more solutions for niche industries began returning to traditional audio signal processing techniques. In the audio field, AI solves general problems while traditional algorithms solve specific ones, so combining the two is inevitable.

7. Manufacturers that combine software and hardware will have the advantage

The 3A algorithms (AEC, ANS, and AGC) are very mature in traditional scenarios, but there is still plenty of room for improvement in niche scenarios. In conferencing, for example, many things can be improved (such as AEC convergence and the full-duplex experience), and the key is improving the details of the user experience. In multi-person conferences, the problems of multiple devices and multiple microphone arrays cannot be solved by software and algorithms alone. The hardware itself can provide low-level support that makes up for what software lacks, producing a multiplier effect. In the future, solutions that combine software and hardware will prevail.
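To show what "AEC convergence" means in practice, the sketch below runs a normalized LMS (NLMS) adaptive filter against a simulated echo path and compares the residual echo early and late in the run. NLMS is a common textbook approach used here for illustration, not necessarily what any product uses. The echo path, step size, and signal are assumptions, and there is no near-end speech or noise in this toy setup.

```python
import random


def simulate_echo(far_end, echo_path):
    """Microphone signal = far-end signal convolved with the echo path."""
    mic = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, h in enumerate(echo_path):
            if n - k >= 0:
                acc += h * far_end[n - k]
        mic.append(acc)
    return mic


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Return the residual signal after adaptive echo cancellation."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        # Normalizing by input energy keeps the step size well behaved
        energy = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / energy
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def mean_power(samples):
    return sum(s * s for s in samples) / len(samples)


# Assumed setup: white-noise far-end signal and a short 4-tap echo path.
rng = random.Random(1)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = simulate_echo(far_end, [0.6, 0.3, -0.2, 0.1])
residual = nlms_echo_canceller(far_end, mic)
print("early residual power:", mean_power(residual[:200]))
print("late residual power:", mean_power(residual[-200:]))
```

In this idealized case the residual shrinks quickly. Real rooms have longer and time-varying echo paths, double-talk, and nonlinear loudspeakers, which is where convergence becomes a genuine engineering problem and hardware support can help.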

8. Engineers should face up to the chaos and opportunities of the metaverse

The current metaverse market is chaotic, and some players are indeed simply fleecing consumers, but it is undeniable that hardware upgrades have also brought new scenarios and opportunities. Take immersive audio as an example: in hybrid office scenarios, current commercial solutions are expensive and place certain requirements on the deployment environment (such as the specifications and acoustic design of the conference room), which may be a point for breakthrough. At present, the focus of metaverse implementations is the virtual audio immersion experience. If the metaverse and spatial audio are only for entertainment, should more attention go to content production?

9. Where is the next "hot spot" in audio technology?

Commercial considerations aside, the question of the next "hot spot" may matter little to developers, and the benefit distribution mechanism in most organizations tends to disadvantage front-line developers.

From a technical point of view, technologies such as VR and AI have been hot more than once, which suggests that hot technologies are somewhat cyclical and deserve long-term attention. For example, AI still has a lot of room to grow in voice. Influenced by the big tech companies, university students are currently more willing to study AI, while few pay attention to traditional DSP (digital signal processing). As another example, the market for Bluetooth headsets in hearing-aid and assistive-listening applications has begun to take shape.

In other respects, the currently popular metaverse pays more attention to video than to audio, which runs against the common-sense principle of walking on two legs. In more specialized fields, using front-end sound perception (speech recognition, scene recognition, sound source recognition) as a sensor, with back-end logic for localization, recognition, and detection, has broad prospects for automated management in the Internet of Things, industrial and agricultural production, healthcare, and other scenarios.

