In the Internet era, live-streaming e-commerce has brought explosive growth in the flow of goods, and many apps have opened up overseas online shopping markets with huge potential. Cross-border e-commerce live streams not only stimulate consumption but also bring Chinese culture to the world. The biggest obstacle, however, is language: many overseas viewers cannot understand Chinese hosts, and very few hosts are able to present products in a foreign language.

Simultaneous interpretation, a new capability of the HMS Core machine learning service, addresses the language barrier in cross-border e-commerce live streaming, allowing Chinese hosts to sell goods overseas and helping live-streaming apps expand into overseas markets. The capability combines the service's core technologies of speech recognition, machine translation, and speech synthesis: it first converts incoming real-time speech into text, then translates that text into another language, and finally converts the translated text back into speech for playback. Simultaneous interpretation supports real-time cross-language communication in a variety of scenarios, provides Chinese-English translation with a choice of voice timbres for the audio broadcast, and can be widely used in conferences, live streams, and more.
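The three-stage flow described above can be sketched as a simple pipeline. The stage interfaces below are stand-ins for illustration only, not the actual HMS Core SDK classes; the real SDK exposes separate ASR, translation, and TTS components that an app would wire together in a similar way.

```java
import java.util.function.Function;

// Minimal sketch of the ASR -> MT -> TTS chain: speech bytes are recognized
// into source-language text, translated, then synthesized back into audio.
// All three stages are injected as plain functions (hypothetical stand-ins).
class InterpretationPipeline {
    private final Function<byte[], String> recognizer;   // ASR stage
    private final Function<String, String> translator;   // MT stage
    private final Function<String, byte[]> synthesizer;  // TTS stage

    InterpretationPipeline(Function<byte[], String> recognizer,
                           Function<String, String> translator,
                           Function<String, byte[]> synthesizer) {
        this.recognizer = recognizer;
        this.translator = translator;
        this.synthesizer = synthesizer;
    }

    // Run one speech segment through all three stages in order.
    byte[] interpret(byte[] speechSegment) {
        String sourceText = recognizer.apply(speechSegment);
        String targetText = translator.apply(sourceText);
        return synthesizer.apply(targetText);
    }
}
```

In a real integration, each lambda would wrap the corresponding SDK engine, and the pipeline would run per speech segment to keep latency low.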

Speech recognition + machine translation: quality and efficiency combined

For simultaneous interpretation, accurate source-language input and translation output is the key measure of quality. Its main scenarios, such as conference speeches, live subtitles, conference interviews, and smart education, usually involve long, continuous audio input. On-device algorithms such as speech energy detection, silence detection, and heartbeat detection segment this long audio effectively, so that only valid speech fragments are sent to the speech recognition module. This improves recognition efficiency, reduces interpretation latency, and limits the impact of noise on recognition accuracy.
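As a rough illustration of one of these techniques, the sketch below segments audio by frame energy: a long enough run of low-energy frames is treated as silence and closes the current speech segment. The thresholds, frame sizes, and logic are assumptions for the example, not HMS Core internals.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative energy-based segmentation: given per-frame energy values,
// return [start, end) frame ranges that contain speech. A segment ends when
// at least minSilenceFrames consecutive frames fall below silenceThreshold.
class EnergySegmenter {
    static List<int[]> segment(double[] frameEnergies,
                               double silenceThreshold,
                               int minSilenceFrames) {
        List<int[]> segments = new ArrayList<>();
        int start = -1;     // start frame of the current speech run, -1 if none
        int silentRun = 0;  // consecutive low-energy frames seen so far

        for (int i = 0; i < frameEnergies.length; i++) {
            if (frameEnergies[i] >= silenceThreshold) {
                if (start < 0) start = i;   // speech begins
                silentRun = 0;
            } else if (start >= 0 && ++silentRun >= minSilenceFrames) {
                // Enough silence: close the segment just before the silent run.
                segments.add(new int[]{start, i - silentRun + 1});
                start = -1;
                silentRun = 0;
            }
        }
        if (start >= 0) segments.add(new int[]{start, frameEnergies.length});
        return segments;
    }
}
```

Each returned range would then be handed to the speech recognition module, so recognition never waits on long stretches of silence.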

On the other hand, the text produced by speech recognition may contain recognition errors, colloquial expressions, modal particles, and repeated content, leaving it disfluent and poorly segmented. To handle these cases, the machine learning service applies error-correction technologies in its text processing module, including NLP-based semantic understanding, homophone disambiguation, ambient-noise handling, and colloquialism processing, to smooth the text and segment it into sentences automatically. This ensures that high-quality translated text is returned, enhancing speech recognition and translation and improving the overall effect of simultaneous interpretation.
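A toy version of text smoothing can convey the idea: drop common filler words and collapse immediate word repetitions. Real systems use NLP models for this; the rule-based pass and the filler list below are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

// Illustrative text-smoothing pass: removes filler words and consecutive
// duplicate words from recognized text. The filler list is a small example
// set, not what the service actually uses.
class TextSmoother {
    private static final Set<String> FILLERS =
            new HashSet<>(Arrays.asList("um", "uh", "er", "like"));

    static String smooth(String text) {
        StringJoiner out = new StringJoiner(" ");
        String previous = null;
        for (String word : text.trim().split("\\s+")) {
            String lower = word.toLowerCase();
            if (FILLERS.contains(lower)) continue;  // drop filler word
            if (lower.equals(previous)) continue;   // drop immediate repetition
            out.add(word);
            previous = lower;
        }
        return out.toString();
    }
}
```

Feeding the smoothed text (rather than the raw transcript) to the translator is what keeps the translated output fluent.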

Full coverage of various scenarios, with bilingual Chinese-English subtitles

Simultaneous interpretation applies to both face-to-face cross-language communication and remote communication. Whether in a face-to-face multilingual meeting, a remote meeting, or while watching a foreign-language video, the capability generates bilingual subtitles in real time, lowering the cost of understanding and improving the efficiency of work and study.
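Once recognition and translation results are available, a bilingual subtitle cue is straightforward to assemble: the source text and its translation are stacked in one timed entry. The SRT-style formatting below is illustrative; the actual subtitle rendering is up to the app.

```java
// Illustrative builder for one bilingual subtitle cue in SRT-like form:
// cue index, time range, then source text and its translation on two lines.
class BilingualSubtitle {
    static String cue(int index, String start, String end,
                      String source, String translation) {
        return index + "\n"
                + start + " --> " + end + "\n"
                + source + "\n"
                + translation + "\n";
    }
}
```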

Customized voice broadcast, for both listening and watching

Using advanced deep neural synthesis technology, the simultaneous interpretation capability outputs audio stream data in real time, offers a variety of Chinese and English male and female timbres, and lets users adjust volume and speech rate (Chinese and English support up to 5x adjustment), making the pronunciation more realistic and natural. Real-time voice broadcast reduces latency and, combined with real-time subtitle content, gives participants an immersive, audio-visual simultaneous interpretation experience.
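An app exposing these playback settings would typically clamp user input to the supported ranges before configuring the TTS engine. The exact bounds below (0.5x-5x rate, 0-100 volume) are assumptions for the sketch based on the "up to 5x" note above, not documented SDK limits.

```java
// Illustrative clamping of user-selected playback settings before they are
// passed to a TTS engine. Ranges are assumed for the example.
class PlaybackSettings {
    // Assumed speech-rate range: 0.5x to 5x normal speed.
    static double clampRate(double requested) {
        return Math.max(0.5, Math.min(5.0, requested));
    }

    // Assumed volume range: 0 to 100 percent.
    static int clampVolume(int requested) {
        return Math.max(0, Math.min(100, requested));
    }
}
```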

By organically integrating the three technologies of speech recognition, machine translation, and speech synthesis, the HMS Core machine learning service provides developers with a low-latency, high-accuracy simultaneous interpretation capability, helping users communicate more smoothly across borders and creating a new "sound" experience for simultaneous interpretation. Developers are welcome to visit the machine learning service's official website for detailed product introductions and integration preparations.

Learn more details>>

Visit the official website of Huawei Developer Alliance
Get development guidance documents
Huawei Mobile Services open source repository addresses: GitHub, Gitee

Follow us to get the latest HMS Core technical information first~


HarmonyOS_SDK

The HarmonyOS SDK opens up HarmonyOS system-level capabilities, helping developers efficiently build purer, smarter, more refined, and easier-to-use HarmonyOS apps, and growing together with developers.