
Guide:

This article is based on the WebRTC release-72 source code and the experience accumulated by the NetEase Yunxin audio and video team. It mainly analyzes the following questions: What is the architecture of the ADM (Audio Device Manager)? What is its startup process? What is its data flow? The focus is on the core flows, so that you can quickly locate the relevant modules when you need to.

Text | Chen Wenwen, Senior Audio and Video Client Development Engineer at NetEase Yunxin

1. The basic structure of ADM

Architecture Analysis of ADM

In WebRTC, the behavior of the ADM (Audio Device Manager) is defined by AudioDeviceModule, which is concretely implemented by AudioDeviceModuleImpl.

1.png (ADM class diagram)

As can be seen from the architecture diagram above, AudioDeviceModule defines all ADM-related behavior (the diagram lists only the core parts; please refer to the source code for the complete definition). From the definition of AudioDeviceModule, its main responsibilities are as follows (a condensed interface sketch follows the list):

Initialize the audio playback/capture device;

Start the audio playback/capture device;

Stop the audio playback/capture device;

Operate the audio playback/capture device while it is working (for example, mute it or adjust its volume);

Toggle the platform's built-in 3A switches (mainly for the Android platform);

Query the various states of the current audio playback/capture device (not fully reflected in the class diagram; please refer to the source code for details).
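
The following is a condensed sketch of what this interface looks like for those responsibilities. It is heavily abbreviated, keeps only a few representative methods, and its exact signatures should be checked against modules/audio_device/include/audio_device.h in the release you use.

// Condensed sketch of AudioDeviceModule (abbreviated; see the real header
// for the full interface and exact signatures).
class AudioDeviceModule : public rtc::RefCountInterface {
 public:
  // Initialization / termination of the module itself.
  virtual int32_t Init() = 0;
  virtual int32_t Terminate() = 0;

  // Initialize the playback / capture device.
  virtual int32_t InitPlayout() = 0;
  virtual int32_t InitRecording() = 0;

  // Start / stop the playback / capture device.
  virtual int32_t StartPlayout() = 0;
  virtual int32_t StopPlayout() = 0;
  virtual int32_t StartRecording() = 0;
  virtual int32_t StopRecording() = 0;

  // Operate the device while it is working (mute, volume, ...).
  virtual int32_t SetSpeakerVolume(uint32_t volume) = 0;
  virtual int32_t SetMicrophoneMute(bool enable) = 0;

  // Platform built-in 3A switches (mainly Android).
  virtual int32_t EnableBuiltInAEC(bool enable) = 0;

  // Query the current device state.
  virtual bool Playing() const = 0;
  virtual bool Recording() const = 0;
};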

AudioDeviceModule is concretely implemented by AudioDeviceModuleImpl; between the two there is also an AudioDeviceModuleForTest, which mainly adds some test interfaces, has no effect on this analysis, and can be ignored. AudioDeviceModuleImpl has two very important member variables: audio_device_, whose concrete type is std::unique_ptr<AudioDeviceGeneric>, and audio_device_buffer_, whose concrete type is AudioDeviceBuffer.
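
For orientation, here is a minimal sketch of those two members (simplified; the real class contains many more members and methods):

// Simplified sketch of the two key members of AudioDeviceModuleImpl.
class AudioDeviceModuleImpl : public AudioDeviceModuleForTest {
  // ...
 private:
  // Shared PCM buffer, handed to the platform device via AttachAudioBuffer().
  AudioDeviceBuffer audio_device_buffer_;
  // Platform-specific device (AudioDeviceIOS, AudioDeviceTemplate, ...).
  std::unique_ptr<AudioDeviceGeneric> audio_device_;
};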

Here, audio_device_ is of type AudioDeviceGeneric. AudioDeviceGeneric is an abstraction of each platform's concrete audio capture and playback device, and it is what AudioDeviceModuleImpl operates on when a specific device is involved. For device operations, AudioDeviceModuleImpl itself mostly performs state checks; the actual device operations are carried out by AudioDeviceGeneric. The concrete implementation of AudioDeviceGeneric is provided by each platform: on iOS it is AudioDeviceIOS, and on Android it is AudioDeviceTemplate. Readers who are interested can analyze each platform's implementation separately; here I only point out the most important common point. From the definitions of the platform implementations, you will find that each of them has an audio_device_buffer member variable, and this variable is in fact the same object as the audio_device_buffer_ member of AudioDeviceModuleImpl mentioned above: AudioDeviceModuleImpl passes its own audio_device_buffer_ object to the platform-specific implementation through the AttachAudioBuffer() method.
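
A paraphrased sketch of that hand-off is shown below (names as in the source, bodies simplified; check against your exact release):

// AudioDeviceModuleImpl hands its buffer to the platform device.
void AudioDeviceModuleImpl::AttachAudioBuffer() {
  audio_device_->AttachAudioBuffer(&audio_device_buffer_);
}

// A platform implementation (here AudioDeviceIOS) simply stores the pointer
// and later uses it to deliver captured PCM and fetch PCM for playout.
void AudioDeviceIOS::AttachAudioBuffer(AudioDeviceBuffer* audio_buffer) {
  audio_device_buffer_ = audio_buffer;
}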

The concrete type of audio_device_buffer_ is AudioDeviceBuffer. Inside AudioDeviceBuffer, play_buffer_ and rec_buffer_ are buffers of int16_t data: the former buffers the PCM data pulled down from the upper layers for playback, and the latter buffers the captured PCM data that is passed upward. The detailed PCM data flow is analyzed in the data flow chapter below. Another member variable, audio_transport_cb_, is of type AudioTransport, and its role is easy to see from the two core methods defined by the AudioTransport interface: one pulls the PCM data to be played and stores it in play_buffer_, while the other delivers the captured PCM data stored in rec_buffer_ upward. The subsequent flow is likewise covered in the data flow chapter.
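
For reference, the two core methods of AudioTransport look roughly like this (parameter lists abridged; see modules/audio_device/include/audio_device_defines.h for the exact signatures):

// Abridged sketch of the AudioTransport interface.
class AudioTransport {
 public:
  // Upward direction: the captured PCM stored in rec_buffer_ is delivered
  // to the rest of the pipeline through this callback.
  virtual int32_t RecordedDataIsAvailable(const void* audio_samples,
                                          size_t n_samples,
                                          size_t n_bytes_per_sample,
                                          size_t n_channels,
                                          uint32_t samples_per_sec,
                                          /* ... delay/level parameters ... */
                                          uint32_t& new_mic_level) = 0;

  // Downward direction: 10 ms of PCM to be played is pulled through this
  // callback and ends up in play_buffer_.
  virtual int32_t NeedMorePlayData(size_t n_samples,
                                   size_t n_bytes_per_sample,
                                   size_t n_channels,
                                   uint32_t samples_per_sec,
                                   void* audio_samples,
                                   size_t& n_samples_out,
                                   /* ... timing parameters ... */
                                   int64_t* ntp_time_ms) = 0;

 protected:
  virtual ~AudioTransport() {}
};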

Thoughts on extending ADM

From the WebRTC ADM implementation we can see that WebRTC only implements the concrete hardware devices of each platform; there is no virtual device. In real projects, however, it is often necessary to support external audio input/output, i.e. the business layer pushes/pulls audio data (PCM and so on) instead of directly starting the platform hardware for capture/playback. Although native WebRTC does not support this, it is very simple to add. Since a virtual device has nothing to do with the platform, you can directly add a virtual device alongside the real device audio_device_ in AudioDeviceModuleImpl (let us tentatively name the variable virtual_device_). Like audio_device_, virtual_device_ implements the AudioDeviceGeneric interface; you can then follow the audio_device_ implementation to realize the "capture" (push) and "playback" (pull) of data, without connecting to any platform hardware device. The only thing left to handle is the switching, or cooperation, between the physical device audio_device_ and the virtual device virtual_device_. A sketch of such a virtual device follows.
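
Below is a minimal sketch of that idea. The class name VirtualAudioDevice, its members, and the push entry point are hypothetical; the real AudioDeviceGeneric interface has many more pure-virtual methods that would also need (mostly trivial) implementations, and the AudioDeviceBuffer calls should be double-checked against the exact release you use.

// Hypothetical virtual device: implements AudioDeviceGeneric but never
// touches platform hardware. (Sketch only; most interface methods elided.)
class VirtualAudioDevice : public webrtc::AudioDeviceGeneric {
 public:
  // Called by AudioDeviceModuleImpl, exactly as for the real devices.
  void AttachAudioBuffer(webrtc::AudioDeviceBuffer* audio_buffer) override {
    audio_buffer_ = audio_buffer;
  }

  int32_t InitRecording() override { return 0; }
  int32_t StartRecording() override { recording_ = true; return 0; }
  int32_t StopRecording() override { recording_ = false; return 0; }
  bool Recording() const override { return recording_; }
  // ... the remaining AudioDeviceGeneric methods (playout side, volume,
  // mute, and so on) would follow the same pattern ...

  // "Capture" = the business layer pushes 10 ms of PCM instead of the mic.
  void PushExternalPcm(const int16_t* data, size_t samples_per_channel,
                       size_t channels, int sample_rate_hz) {
    if (!recording_ || audio_buffer_ == nullptr)
      return;
    audio_buffer_->SetRecordingSampleRate(sample_rate_hz);
    audio_buffer_->SetRecordingChannels(channels);
    audio_buffer_->SetRecordedBuffer(data, samples_per_channel);
    audio_buffer_->DeliverRecordedData();
  }

 private:
  webrtc::AudioDeviceBuffer* audio_buffer_ = nullptr;
  bool recording_ = false;
};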

2. Startup of the ADM devices

Startup timing

There is no special requirement on when an ADM device should be started; any time after the ADM has been created will do. In WebRTC's native code, however, after SDP negotiation is completed it checks whether the relevant ADM devices need to be started and, if so, starts them. The startup of the capture device and of the playback device are completely independent, but the flows are similar. The relevant trigger code is shown below and can be read from top to bottom.

The following is the trigger code for starting the capture device (the first few steps have other trigger entries as well, but the later part is the same; only the core path is shown here):

//cricket::VoiceChannel
void VoiceChannel::UpdateMediaSendRecvState_w() {
  //*
  bool send = IsReadyToSendMedia_w();
  media_channel()->SetSend(send);
}

// cricket::WebRtcVoiceMediaChannel
void WebRtcVoiceMediaChannel::SetSend(bool send) {
  //*
  for (auto& kv : send_streams_) {
    kv.second->SetSend(send);
  }
}

// cricket::WebRtcVoiceMediaChannel::WebRtcAudioSendStream
void SetSend(bool send) {
  //*
  UpdateSendState();
}

// cricket::WebRtcVoiceMediaChannel::WebRtcAudioSendStream
void UpdateSendState() {
  //*
  if (send_ && source_ != nullptr && rtp_parameters_.encodings[0].active) {
    stream_->Start();
  } else {  // !send || source_ = nullptr
    stream_->Stop();
  }
}

// webrtc::internal::AudioSendStream
void AudioSendStream::Start() {
  //*
  audio_state()->AddSendingStream(this, encoder_sample_rate_hz_,
                                  encoder_num_channels_);
}

// webrtc::internal::AudioState
void AudioState::AddSendingStream(webrtc::AudioSendStream* stream,
                                  int sample_rate_hz,
                                  size_t num_channels) {
  //*
  // Check whether the capture device has been started; if not, start it here.
  auto* adm = config_.audio_device_module.get();
  if (!adm->Recording()) {
    if (adm->InitRecording() == 0) {
      if (recording_enabled_) {
        adm->StartRecording();
      }
    } else {
      RTC_DLOG_F(LS_ERROR) << "Failed to initialize recording.";
    }
  }
}

As the capture-device trigger code above shows, as long as audio needs to be sent, the capture device will be started after SDP negotiation is completed, regardless of whether it was started before. If we want the upper-layer business to control when the capture device is started, we only need to comment out the lines that start the device in the AddSendingStream method above and then start the capture device through the ADM when needed, for example as in the sketch below.
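
A minimal sketch of such an upper-layer start, assuming adm points to the same AudioDeviceModule instance that was handed to WebRTC (the calls mirror the logic just commented out in AudioState::AddSendingStream):

// Upper-layer controlled capture start.
void StartCaptureWhenReady(webrtc::AudioDeviceModule* adm) {
  if (!adm->Recording()) {
    if (adm->InitRecording() == 0) {
      adm->StartRecording();
    }
  }
}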

The following is the trigger code for starting the playback device (the first few steps have other trigger entries as well, but the later part is the same; only the core path is shown here):

//cricket::VoiceChannel
void VoiceChannel::UpdateMediaSendRecvState_w() {
  //*
  bool recv = IsReadyToReceiveMedia_w();
  media_channel()->SetPlayout(recv);
}

// cricket::WebRtcVoiceMediaChannel
void WebRtcVoiceMediaChannel::SetPlayout(bool playout) {
  //*
  return ChangePlayout(desired_playout_);
}

// cricket::WebRtcVoiceMediaChannel
void WebRtcVoiceMediaChannel::ChangePlayout(bool playout) {
  //*
  for (const auto& kv : recv_streams_) {
    kv.second->SetPlayout(playout);
  }
}

// cricket::WebRtcVoiceMediaChannel::WebRtcAudioReceiveStream
void SetPlayout(bool playout) {
  //*
  if (playout) {
    stream_->Start();
  } else {
    stream_->Stop();
  }
}

// webrtc::internal::AudioReceiveStream
void AudioReceiveStream::Start() {
  //*
  audio_state()->AddReceivingStream(this);
}

// webrtc::internal::AudioState
void AudioState::AddReceivingStream(webrtc::AudioReceiveStream* stream) {
  //*
  // Check whether the playback device has been started; if not, start it here.
  auto* adm = config_.audio_device_module.get();
  if (!adm->Playing()) {
    if (adm->InitPlayout() == 0) {
      if (playout_enabled_) {
        adm->StartPlayout();
      }
    } else {
      RTC_DLOG_F(LS_ERROR) << "Failed to initialize playout.";
    }
  }
}

As the playback-device trigger code above shows, as long as audio needs to be played, the playback device will be started after SDP negotiation is completed, regardless of whether it was started before. If we want the upper-layer business to control when the playback device is started, we only need to comment out the lines that start the device in the AddReceivingStream method above and then start the playback device through the ADM when needed, for example as in the sketch below.
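
As with capture, a minimal sketch of an upper-layer playout start, assuming adm is the same AudioDeviceModule instance handed to WebRTC (mirroring the logic commented out in AudioState::AddReceivingStream):

// Upper-layer controlled playout start.
void StartPlayoutWhenReady(webrtc::AudioDeviceModule* adm) {
  if (!adm->Playing()) {
    if (adm->InitPlayout() == 0) {
      adm->StartPlayout();
    }
  }
}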

Startup process

When an ADM device needs to be started, the ADM's InitXXX is called first, followed by the ADM's StartXXX; the call then travels down through the layers described above to the corresponding platform-specific implementation. The detailed process is shown in the figure below, followed by a paraphrased code excerpt of the capture path.
2.png (ADM device startup process)
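
As a textual companion to the figure, here is a paraphrased excerpt of the capture start inside the ADM (logging and histogram code stripped; check against your exact release). The module only performs state checks and then delegates to the shared buffer and the platform device:

// Paraphrased from AudioDeviceModuleImpl (release-72, simplified).
int32_t AudioDeviceModuleImpl::StartRecording() {
  if (!initialized_)
    return -1;
  if (Recording())
    return 0;
  audio_device_buffer_.StartRecording();
  // audio_device_ is the platform implementation (e.g. AudioDeviceIOS),
  // which actually opens the hardware capture unit.
  return audio_device_->StartRecording();
}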

About stopping the devices

Once you know how the ADM devices are started, the corresponding stop actions need little explanation. If you read the source code, you will find that the stop actions and flows correspond almost one-to-one with the start ones; a paraphrased example of the capture stop path is shown below.
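
For instance, the capture stop path is the mirror image of the AddSendingStream code shown earlier (paraphrased and simplified; check against your exact release): when the last sending stream is removed, AudioState stops the capture device through the ADM.

// Paraphrased sketch of webrtc::internal::AudioState (simplified).
void AudioState::RemoveSendingStream(webrtc::AudioSendStream* stream) {
  sending_streams_.erase(stream);
  // ...
  if (sending_streams_.empty()) {
    config_.audio_device_module->StopRecording();
  }
}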

3. ADM audio data flow

Transmission of audio data

3.png (core flow of audio data transmission)

 

The figure above shows the core flow of audio data transmission, mainly the core function calls and the thread switches. PCM data is captured from the hardware device; after some simple packaging on the capture thread, it soon enters the APM module for the corresponding 3A processing. As the flow shows, the APM module sits very close to the raw PCM data, which is very helpful for APM processing; interested readers can study the APM-related topics in depth. The data is then wrapped into a Task and posted to a thread called rtp_send_controller. At that point the capture thread's work is done and it can start the next round of data reading as soon as possible, which minimizes the impact on capture: new PCM data is read promptly, preventing PCM data loss or unnecessary delay. A generic illustration of this hand-off pattern is sketched below.
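
The following is a generic illustration of the hand-off pattern in plain C++ (not WebRTC code; WebRTC uses its own task-queue facilities): the capture thread only copies the 10 ms PCM block and posts it, then immediately returns to reading the next block.

#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>
#include <vector>

// One block of captured PCM samples.
struct PcmBlock {
  std::vector<int16_t> samples;
};

// A minimal thread-safe queue standing in for the task hand-off.
class PcmQueue {
 public:
  void Post(PcmBlock block) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push(std::move(block));
    }
    cv_.notify_one();
  }
  PcmBlock Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !q_.empty(); });
    PcmBlock block = std::move(q_.front());
    q_.pop();
    return block;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<PcmBlock> q_;
};

// Capture-thread side: copy and post, nothing heavy here; encoding and
// congestion control happen on the consumer thread.
void OnCaptureCallback(const int16_t* data, size_t samples, PcmQueue& out) {
  PcmBlock block;
  block.samples.assign(data, data + samples);
  out.Post(std::move(block));
}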

The data then arrives at the rtp_send_controller thread, which has three main jobs here: performing congestion control for RTP sending, encoding the PCM data, and packaging the encoded data into the RtpPacketToSend (RtpPacket) format. The resulting RtpPacket data is pushed into a queue called RoundRobinPacketQueue, and with that the work of the rtp_send_controller thread is done.

The RtpPacket data is then processed on SendControllerThread, which mainly handles send-state and congestion-window control. Finally, the data is handed, in the form of a message (type: MSG_SEND_RTP_PACKET), to the network thread (Network Thread), one of the three major WebRTC threads, and sent out to the network. At this point the whole sending flow ends.

Data reception and playback

4.png (core flow of audio data reception and playback)

 

The figure above shows the core flow of audio data reception and playback. The Network Thread is responsible for receiving RTP data from the network, unpacking it asynchronously, and distributing it to the Work Thread. If multiple audio streams are received, there are multiple ChannelReceive instances, each with the same processing flow; the still-undecoded audio data is stored in the packet_buffer_ of the NetEq module. Meanwhile, the playback device thread continuously pulls audio data (in 10 ms chunks) from all current audio ChannelReceive instances, which in turn triggers NetEq to request decoding from the decoder. For audio decoding, WebRTC provides a unified interface; a concrete decoder only needs to implement that interface, as WebRTC's default audio decoder Opus does. After all the data in the ChannelReceive instances has been traversed and decoded, the audio is mixed by AudioMixer, then handed to the APM module for processing, and finally played out on the device. A small worked example of the 10 ms framing is given below.
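
As a small worked example of the 10 ms framing mentioned above (plain arithmetic, not WebRTC code), this is how much data one 10 ms pull covers at a typical playout format:

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
  const int sample_rate_hz = 48000;  // e.g. a typical Opus playout rate
  const size_t channels = 2;         // stereo playout
  const size_t samples_per_channel = sample_rate_hz / 100;  // 10 ms = 1/100 s
  const size_t total_samples = samples_per_channel * channels;
  const size_t bytes = total_samples * sizeof(int16_t);  // 16-bit PCM
  std::printf("10 ms @ %d Hz, %zu channels: %zu samples/channel, %zu bytes\n",
              sample_rate_hz, channels, samples_per_channel, bytes);
  // Prints: 10 ms @ 48000 Hz, 2 channels: 480 samples/channel, 1920 bytes
  return 0;
}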

About the author

Chen Wenwen is a senior audio and video client development engineer at NetEase Yunxin, mainly responsible for the development and adaptation of audio and video on Android.

