1 Background
OPPO Real-Time Communication (ORTC) is a low-latency, high-quality, cross-platform audio and video communication solution that we have launched. It is offered to developers through OPPO Cloud and provides capabilities such as multi-party audio and video calls, real-time monitoring, emergency command and dispatch, interactive live streaming, IoT integration, and cloud gaming. Historically, most audio and video communication technologies were built on SIP or H.323. This article presents an integration solution that interconnects RTC and SIP, so that RTC can readily connect to the PSTN and to SIP trunks. Traditional conference terminals can also access the RTC system through the same solution.
2 Introduction to ORTC
ORTC is an important piece of infrastructure for the Internet of Everything. It solves real-time interconnection between terminals on different platforms and provides real-time audio and video communication capabilities for many products. Launched and planned products include Xiaobu video call, remote assistance, mini-game voice interaction, cloud gaming, interactive live streaming, Xiaobu conference, virtual humans, and IoT.
Relying on OPPO Cloud's high-quality node resources, network-wide collaboration, dynamic intelligent routing, regional relay, nearby access, hierarchical fan-out, and global connectivity, ORTC provides audio and video services with ultra-low latency, ultra-high concurrency, and ultra-high availability.
3 ORTC system architecture
ORTC is a PaaS cloud service that provides real-time communication capabilities. The overall architecture comprises media services, signaling services, monitoring services, interface services, and device-side access SDKs. Signaling and media are separated, and signaling achieves distributed, intelligent scheduling through SLB and Redis. Media uses an SFU architecture that supports cascading, so different users in a single room can be distributed across multiple media instances. Media also supports mixed (confluence) recording: audio and video streams are cascaded to the MCU, which mixes them and can then record and redistribute the mixed output. Audio and video transmission supports jitter buffering, NACK, FIR, and PLI. Audio processing supports AGC, AEC, and NS. For FEC, audio supports RED in both the upstream and downstream directions, and upstream video supports ULPFEC. The device-side SDKs follow the WebRTC specification, and all of them support simulcast, real-time subtitles, and voice activation. Built on the OPPO Global Hybrid Cloud platform, ORTC is a high-concurrency, low-latency, cross-platform, high-availability RTC service platform.
4 SIP application scenarios and architecture
SIP is ubiquitous. Whether we look at traditional conferencing vendors such as Cisco, Huawei, and Polycom, or surveillance vendors such as Hikvision, Dahua, and Uniview, SIP is present, and these industries hold a large market. The rise of 5G and cloud services will inject new vitality into these traditional industries and drive innovation, because it both greatly reduces their costs and improves their efficiency and service experience. For example, highway surveillance costs far less when its video is stored in the cloud, and regulators and users alike value being able to view and interact with the surveillance picture in real time, anywhere. RTC can provide all of these capabilities. Adopting RTC directly, however, would force a disruptive overhaul on customers, which most enterprise users will not accept. They want a smooth, seamless upgrade path, and that requires interconnecting RTC with SIP and customizing deeply integrated solutions.
Although RTC is the inevitable direction for real-time communication, it depends on the underlying network infrastructure. Traffic and signal quality differ between regions of the world, and even within the same region, so applications that rely purely on data networks cannot always be served well. In those cases, the ordinary PSTN telephone network or PTT wireless walkie-talkies show their strengths: stability, reliability, and no dropped connections. Connecting them to RTC would strengthen RTC's resilience to weak networks and broaden its application scenarios, and bringing in these networks and devices likewise requires integrating ORTC with SIP.
As the Internet of Everything deepens, the network now reaches every smart device it can, which gives RTC a broader range of usage scenarios. These devices may not need high reliability in normal operation, and an occasional dropped connection is acceptable. They may also sit on an independent network rather than having direct access to the Internet. Examples include doorbells in smart communities, wristbands for the elderly, children's watches, and other IoT devices. The app on the terminal being called in an emergency may not be online at that moment. Reaching the person in an emergency therefore requires placing outgoing calls to ordinary telephones on the PSTN/LTE network by means of SIP.
All of the above call for the integration of RTC and SIP. Providing access for a variety of protocols is also a basic requirement if ORTC is to grow RTC into an audio and video communication infrastructure.
In the architecture above, a SIP gateway and a SIP media server provide a seamless connection between the RTC service and an external PSTN network or a SIP trunk based on the standard SIP protocol.
When ORTC connects to SIP, the internal structure of ORTC remains relatively independent. The SIP gateway maps SIP signaling to and from the WebRTC signaling protocol, and the media streams are carried over standard RTP/RTCP. The connected SIP terminal joins the conference presented as a virtual WebRTC client.
From an architectural point of view, connecting ORTC to SIP only requires signaling control and media transmission. In practice, however, there are still many implementation details to consider, including transcoding, stream mixing (confluence), secure transmission, resilience to weak networks, and dual-stream output.
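The signaling mapping performed by the SIP gateway can be pictured roughly as follows. This is only a minimal sketch: the room events (`join`, `publish`, `leave`), the `SIP_TO_RTC_EVENT` table, and the `VirtualClient` class are hypothetical names chosen for illustration and are not ORTC's actual signaling vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mapping from SIP requests to room-signaling events.
# The event names on the right are illustrative, not ORTC's real protocol.
SIP_TO_RTC_EVENT = {
    "INVITE": "join",
    "ACK": "publish",
    "BYE": "leave",
    "CANCEL": "leave",
}


@dataclass
class VirtualClient:
    """Stands in for one SIP terminal inside the conference room."""
    sip_uri: str
    room_id: str
    events: List[str] = field(default_factory=list)

    def on_sip_request(self, method: str) -> Optional[str]:
        """Translate one SIP request into a room event, or None if unmapped."""
        event = SIP_TO_RTC_EVENT.get(method.upper())
        if event is not None:
            self.events.append(event)
        return event
```

A real gateway would also translate SDP bodies and SIP responses; the sketch covers only the request-to-event direction.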
5 ORTC-SIP Confluence Transcoding
As a general audio and video communication solution, ORTC must support many audio and video codecs, including cutting-edge ones such as Lyra and AV1, whereas traditional SIP conferencing mostly uses more conservative, general-purpose codecs. Connecting the two therefore means the encoding on one side differs from the encoding on the other, and ORTC, as the party integrating into the existing system, needs a transcoding function to handle this asymmetry. In addition, if ORTC is to directly replace a third-party service while the existing terminals are kept, those terminals need to see a mixed output from ORTC rather than the current SFU-based multi-stream output. ORTC therefore needs mixing (MCU) capability: its media layer can no longer be SFU only and must offer SFU and MCU capabilities together.
The architecture is implemented as shown in the figure above. After a SIP terminal passes through the SIP server, it establishes point-to-point audio and video communication with a virtual SIP terminal provided by ORTC. Its media is forwarded to the media receiver inside ORTC, output to the central media node through a pipe, and then enters ORTC's intelligent scheduling and distribution process. At the same time, the media data of all ORTC participants is aggregated at the MCU, mixed, and distributed to the SIP server, which applies further encryption and forwards it to the SIP terminal.
Similarly, the mixed output of the MCU can also be distributed through a bypass path: it can be integrated with a CDN for distribution as a live stream, and it can be recorded in the cloud.
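The decision of whether a SIP leg needs transcoding can be sketched as below. The function name `plan_codec`, its parameters, and the codec names in the usage note are assumptions made for illustration; the article does not describe how ORTC actually negotiates codecs.

```python
def plan_codec(sip_offer, room_codec, mcu_supported):
    """Pick the codec for a SIP leg and report whether transcoding is needed.

    sip_offer     -- codec names offered by the SIP terminal, in preference order
    room_codec    -- codec the room currently uses
    mcu_supported -- lowercase codec names the transcoder can handle (assumed)

    Returns (leg_codec, needs_transcoding). Raises ValueError when nothing the
    terminal offers can be handled.
    """
    offered = [c.lower() for c in sip_offer]
    room = room_codec.lower()
    if room in offered:
        # Both sides share the room codec, so media can be forwarded as-is.
        return room, False
    for codec in offered:  # keep the terminal's preference order
        if codec in mcu_supported:
            return codec, True
    raise ValueError("no codec in common with the SIP terminal")
```

For example, a terminal offering only `PCMU` and `G722` to a room running `opus` would get `("pcmu", True)`, meaning the leg runs on PCMU and passes through the transcoder.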
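For the audio part of the mixing step, a common approach in MCUs is "mix-minus": each participant receives the sum of everyone else's audio. The sketch below assumes 16-bit PCM frames of equal length, and the function name `mix_minus` is hypothetical. The article does not say which mixing strategy ORTC's MCU uses, so treat this as an illustration of the general idea only.

```python
def mix_minus(frames):
    """Mix one 16-bit PCM frame per participant.

    frames maps participant id -> list of int samples (all the same length).
    Each participant gets the sum of all other participants' samples,
    clipped to the signed 16-bit range.
    """
    ids = list(frames)
    if not ids:
        return {}
    length = len(frames[ids[0]])
    # Sum every participant once, then subtract each listener's own samples.
    total = [sum(frames[p][i] for p in ids) for i in range(length)]

    def clip(sample):
        return max(-32768, min(32767, sample))

    return {
        p: [clip(total[i] - frames[p][i]) for i in range(length)]
        for p in ids
    }
```

Subtracting the listener's own samples from a shared total avoids re-summing for every output, which matters as the number of participants grows.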
6 ORTC-SIP anti-weak network and security control
As a communication solution operated globally, ORTC faces access networks whose basic capabilities and usage conditions vary widely. The signal is strong in some places and weak in others, and some places suffer serious congestion, so the system needs strong resilience to weak networks. Software that ORTC implements on the WebRTC specification has this resilience by design. SIP, as the later addition to the system, must provide not only the necessary weak-network features such as ARQ, FEC, FIR, and PLI, but also reasonably complete dynamic adaptation of bit rate and frame rate if it is to connect cleanly.
For commercial real-time communication software, confidentiality is essential. ORTC's implementation based on WSS and SRTP secures signaling and data transmission across the full link. A SIP service that comes from outside must meet the same security bar before it can be fully integrated into ORTC. For signaling access, SIP therefore needs to be carried over TLS, and media should be encrypted and transmitted over SRTP. Service deployment should also apply the corresponding port restrictions and firewall protection.
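Dynamic bit rate adaptation can be illustrated with a simple loss-based rule. The sketch below is modeled on the loss-based controller described in the Google Congestion Control draft for WebRTC (raise the rate by 5% below 2% loss, hold between 2% and 10%, cut above 10%). The function name and the `min_bps`/`max_bps` limits are assumptions, and the article does not state which algorithm the SIP side of ORTC actually uses.

```python
def next_bitrate(current_bps, loss_fraction, min_bps=100_000, max_bps=4_000_000):
    """Return the next target bit rate from the reported packet loss.

    loss_fraction is the fraction of packets lost in the last report (0.0-1.0).
    The min_bps/max_bps limits are illustrative defaults, not ORTC settings.
    """
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    if loss_fraction > 0.10:
        # Heavy loss: back off in proportion to the loss.
        target = current_bps * (1.0 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        # Little or no loss: probe upward by 5%.
        target = current_bps * 1.05
    else:
        # Moderate loss: hold the current rate.
        target = current_bps
    return int(max(min_bps, min(max_bps, target)))
```

Frame rate and resolution would then be chosen to fit the returned target; that step is omitted here.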
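One way a gateway could enforce the SRTP requirement is to inspect the transport profile on each `m=` line of the SDP and reject legs that offer plain RTP. The sketch below is a hedged illustration: the function name `insecure_media` is hypothetical, and the profile list covers only the common SRTP profiles.

```python
# SDP transport profiles that indicate SRTP (SDES- or DTLS-keyed).
SECURE_PROFILES = {
    "RTP/SAVP",
    "RTP/SAVPF",
    "UDP/TLS/RTP/SAVP",
    "UDP/TLS/RTP/SAVPF",
}


def insecure_media(sdp):
    """Return the media types (e.g. 'audio') whose m= line is not SRTP."""
    bad = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("m="):
            continue
        # Format: m=<media> <port> <proto> <fmt> ...
        parts = line[2:].split()
        if len(parts) < 3 or parts[2].upper() not in SECURE_PROFILES:
            bad.append(parts[0] if parts else "?")
    return bad
```

An empty result means every media stream in the offer uses an SRTP profile. A non-empty result lists the streams that would travel unencrypted.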
7 ORTC-SIP dual-stream control
For scenarios such as video conferences and live streaming, dual-stream presentation has become an essential capability of audio and video communication. How well and how flexibly dual streams can be switched and controlled has also become an important measure of a platform's capability. In the ORTC and SIP architecture, we provide dual-stream transmission control based on BFCP, which allows dual streams to be switched freely among multiple terminals.
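BFCP arbitrates who may send the presentation (second) stream by treating it as a "floor" that terminals request and release. The sketch below models only that idea with a single-holder floor. The class `ContentFloor` is hypothetical, it does not implement the BFCP wire format, and the status strings merely borrow BFCP's request-status names.

```python
class ContentFloor:
    """Single-holder floor for the presentation (second) stream."""

    def __init__(self):
        self.holder = None  # user id currently allowed to present

    def request(self, user_id):
        """Handle a floor request; return a BFCP-style request status."""
        if self.holder is None or self.holder == user_id:
            self.holder = user_id
            return "Granted"
        return "Denied"

    def release(self, user_id):
        """Handle a floor release; return True if the floor was freed."""
        if self.holder == user_id:
            self.holder = None
            return True
        return False
```

A production floor control server could queue requests or let a chair revoke the floor instead of denying outright; this sketch keeps the simplest policy.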
8 ORTC-SIP cluster
As the business continues to expand, the concurrency that the service must carry keeps growing, and the particular nature of the customers served also requires reliable elastic scale-out and scale-in. For the service to be highly reliable and scalable, the signaling, SIP, media, and database services involved all need to support clustering and load balancing.
With clusters and intelligent routing, ORTC has achieved automatic scaling of cluster load and services. SIP is a stateful protocol, so achieving reliable registration and load balancing of calls takes considerable work. We currently use Layer 4 network load balancing plus internal service bridging to balance registration and service distribution on the terminal side. When a service node goes down, the SIP terminals connected to it are automatically switched to other service nodes without affecting normal business. This provides load balancing and improves the reliability of the software.
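The failover behavior described above, where terminals on a failed node move to other nodes while the rest stay put, can be illustrated with rendezvous (highest-random-weight) hashing. This is a swapped-in technique for illustration and not the Layer 4 load balancer plus service bridging that ORTC uses. The function name and node names are hypothetical.

```python
import hashlib


def pick_node(terminal_id, nodes, down=frozenset()):
    """Choose a SIP node for a terminal with rendezvous hashing.

    The same terminal always maps to the same node while that node is alive.
    If the node is in `down`, only its terminals are reassigned.
    """
    alive = [n for n in nodes if n not in down]
    if not alive:
        raise RuntimeError("no SIP node available")

    def score(node):
        digest = hashlib.sha256(f"{node}|{terminal_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The node with the highest score for this terminal wins.
    return max(alive, key=score)
```

Because each terminal ranks nodes independently, removing one node leaves every other terminal's top choice unchanged, which keeps re-registration traffic limited to the affected terminals.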
9 Outlook
The development of 5G and AI technology has further promoted the evolution of audio and video technology and places higher requirements on RTC. Lower latency and higher bit rates and frame rates make large-scale cloud analysis possible. In turn, they broaden the application scenarios of 5G, improve the accuracy of AI, and promote the adoption of cloud AI.
The figure above illustrates a method of measuring heart rate from a video stream. The principle is that subtle changes in blood volume in the skin change how much visible light the skin absorbs; filtering and a discrete Fourier transform are then used to calculate the heart rate. Combined with ORTC, this enables real-time heart rate detection for remote consultation. There are many more application scenarios that combine 5G and AI in industrial and specialized sectors, and ORTC will play an increasingly important role in basic communication and gradually become indispensable infrastructure.
Building on the current ORTC+SIP architecture, we can further extend ORTC's access capability to ORTC+N scenarios, integrate more streaming media protocols, and aggregate more media applications, making ORTC a true all-in-one foundational communication platform.
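The filtering and discrete Fourier transform step can be sketched as follows. The input is assumed to be one brightness value per video frame (for example, the mean green-channel value over a skin region, which is an assumption here and not something the article specifies). The function name and the 0.7 to 4.0 Hz band, roughly 42 to 240 beats per minute, are likewise illustrative choices.

```python
import cmath
import math


def estimate_bpm(signal, fps, lo_hz=0.7, hi_hz=4.0):
    """Estimate heart rate (beats per minute) from a per-frame brightness series.

    Removes the mean, evaluates the DFT only at bins inside [lo_hz, hi_hz]
    (an ideal band-pass filter), and returns the strongest frequency in bpm.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(signal) / n
    centered = [s - mean for s in signal]

    best_bin, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if not lo_hz <= freq <= hi_hz:
            continue
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    if best_bin is None:
        raise ValueError("signal too short to resolve the heart-rate band")
    return 60.0 * best_bin * fps / n
```

The frequency resolution is `fps / n` Hz, so a 10-second window at 30 frames per second resolves heart rate in steps of 6 bpm. Longer windows give finer steps at the cost of slower updates.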
About the author
Sunny, Senior Backend Engineer at OPPO
He has more than 10 years of experience in streaming media research and development, and has participated in and led projects including IPTV, TV chat software, an emergency command and dispatch platform based on real-time audio and video monitoring, a multi-modal micro-expression judgment system based on real-time video, and RTC video conferencing.