In 2021, as the pace of life accelerates, users' fragmented consumption time keeps growing. Short-video consumption now exceeds 773 million users, and the short-video market has surpassed 200 billion yuan. The industry is developing rapidly, but low-quality content proliferates while high-quality content remains scarce. At the Imagine Alibaba Cloud Video Cloud Panorama Innovation Summit on July 10, Li Jing, a senior algorithm expert at Alibaba Entertainment, delivered a keynote speech titled "Re-innovation of Video Technology, Opening the Wave of Content Digitization". Starting from the content-production dilemma in short video, the talk shared the technical capabilities and application practices of the MediaAI platform and unveiled the intelligent production technology behind Youku's short videos. The following is a summary of the speech.
How does Youku start the wave of content digitization?
Since this is Youku, we have to start from long videos, short videos, and all the video forms that appear in final distribution, and analyze Youku's digitization across the entire life cycle of a video. As a long-video site, Youku has a digital content-evaluation system that covers a long video from its first shot all the way to completion.
After a long video is shot, how do we use this copyrighted content for secondary creation? This is the digitization of creative content. Next, we hope to generate special effects for short videos to make them better-looking and more interesting. This is the digitization of special effects.
Finally, how do users experience the benefits of content digitization on the terminal, whether watching on a mobile phone, a tablet, or a large TV screen?
Therefore, the wave of content digitization should run through the entire life cycle of a video.
Deconstruction of creative elements
The first is the digitization of content evaluation. Alibaba Entertainment has a team called Beidouxing that focuses on the content evaluation of long videos. Its core idea is to use posterior data to measure the quality of the whole video, of the video content, or of the individual production elements.
It also covers extended elements such as directors, screenwriters, actors, and scripts: things everyone has heard of but that feel remote. These are the extended content of video production.
Beyond these extensions, there is the extension information of the video itself, including the characters it covers, the language of the shots, and each character's personality. All of this information is deconstructed using our NLP (Natural Language Processing) and CV (Computer Vision) capabilities.
Once we have both the extended information and the deconstruction of the content, we hope to use this information to predict users' psychological reactions and content preferences.
The data side provides very intuitive posterior signals: ratings, user interactions, and comment counts. Through these data we hope to dig further into users' psychological and physiological states, which drives our core capability: content evaluation.
Content evaluation draws on two AI capabilities: one is AI evaluation, the other is AI physical examination.
What is AI evaluation?
Judging whether a video clip is good used to take a great deal of manpower, and predicting whether a TV series will be a hit through manual review alone is extremely difficult. Therefore, in content evaluation we use the Beidouxing system to estimate a series' level from information about its actors, suppliers, IP ratings, directors, and screenwriters.
In addition, we can analyze the actors in a series in more depth, such as their fan base and overall word of mouth. This analysis helps the platform make further auxiliary decisions, and our AI technology then gives a final estimate of the series' grade.
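To make the idea concrete, here is a minimal, purely illustrative sketch of grading a series from production metadata. The feature names, weights, and grade thresholds are all invented for illustration; the real Beidouxing system is a learned model, not a hand-weighted score.

```python
# Toy linear scorer for estimating a series' grade from production metadata.
# All feature names and weights are hypothetical placeholders.

FEATURE_WEIGHTS = {
    "lead_actor_popularity": 0.30,     # normalized 0-1
    "director_track_record": 0.20,
    "screenwriter_track_record": 0.15,
    "ip_rating": 0.25,                 # source-IP heat, 0-1
    "supplier_quality": 0.10,
}

def estimate_grade(features: dict) -> str:
    """Map weighted metadata features to a coarse S/A/B grade."""
    score = sum(w * features.get(k, 0.0) for k, w in FEATURE_WEIGHTS.items())
    if score >= 0.75:
        return "S"
    if score >= 0.5:
        return "A"
    return "B"

print(estimate_grade({
    "lead_actor_popularity": 0.9,
    "director_track_record": 0.8,
    "screenwriter_track_record": 0.7,
    "ip_rating": 0.85,
    "supplier_quality": 0.6,
}))  # a strong slate of inputs yields the top grade: S
```

In practice such a predictor would be trained on posterior data (ratings, interactions) rather than fixed weights.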
The second point is AI physical examination.
When the clips have been shot and edited, an algorithm evaluates them: which are the highlights, which are the climax points, and which parts of the plot drag and bore. It finds the risk points where users may abandon the drama and gives constructive comments that help the editor optimize the cut. This is another application of Youku's content evaluation.
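The "physical examination" step can be sketched as follows. This is an illustrative stand-in only: the per-clip scores would come from a real engagement-prediction model, and the two thresholds are invented.

```python
# Flag highlight clips and abandon-risk clips from per-clip engagement
# predictions (0-1). The flagging rule and thresholds are hypothetical.

def physical_exam(clip_scores, low=0.3, high=0.8):
    """Return (highlights, risk_points): indices of climax clips and of
    clips where viewers may abandon the drama."""
    highlights = [i for i, s in enumerate(clip_scores) if s >= high]
    risks = [i for i, s in enumerate(clip_scores) if s <= low]
    return highlights, risks

scores = [0.9, 0.5, 0.2, 0.85, 0.25]
hi, lo = physical_exam(scores)
print(hi, lo)  # clips 0 and 3 are highlights; clips 2 and 4 are risk points
```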
After long videos come short videos.
After the broadcast of Youku's "Shanhe Ling" and "Sito", how can we use them further for secondary short-video creation?
Short video has been extremely popular in recent years. Last year there were more than 700 million short-video consumers, the number of short-video guilds and MCN agencies exceeded 20,000, and the market size surpassed 200 billion yuan. In such a huge consumer market we face a problem: high-quality short videos are very scarce, while a flood of low-quality, shoddy short videos fills the entire market.
Therefore, we want to use automated production to replace those low-quality short videos and let our intelligent creation reach the level of human creation. This is what Youku wants to do.
To that end, Alibaba Entertainment developed concept-level video deconstruction to empower intelligent creation. What is this?
When video deconstruction and CV capabilities are mentioned, you may naturally think of tags: for a video scene there are characters, objects, motions, and so on. In the past these were the particularly objective tag descriptions of the CV field, but they are not what creators most need. Creators need elements or material that give the audience deep feelings, so we redefined the semantic tags to enable intelligent creation.
Video Deconstruction Empowers Short Video Production
With concept-based label deconstruction capabilities, we can perform a series of edits.
The condensing approach cuts out the plain plot and the dialogue-free scenes in a segment, then splices the dialogue and information-dense pieces together to form a complete short video.
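A minimal sketch of that condensing step, assuming each segment has already been deconstructed with a dialogue flag and an information score (the segment structure and threshold here are invented for illustration):

```python
# Keep dialogue or information-dense segments, drop plain filler,
# and splice the survivors in story order.

def condense(segments, min_info=0.5):
    """Return the ids of segments worth keeping, in original order."""
    kept = [s for s in segments
            if s["has_dialogue"] or s["info_score"] >= min_info]
    return [s["id"] for s in kept]

segments = [
    {"id": "s1", "has_dialogue": True,  "info_score": 0.2},
    {"id": "s2", "has_dialogue": False, "info_score": 0.1},  # plain scenery, cut
    {"id": "s3", "has_dialogue": False, "info_score": 0.9},  # informative, keep
    {"id": "s4", "has_dialogue": True,  "info_score": 0.7},
]
print(condense(segments))  # ['s1', 's3', 's4']
```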
In the variety-show scenario, re-editing the clips of Mao Wanyi and Qianxi into fan-oriented short videos achieved very good results.
All of the above videos were produced with AI technology we have accumulated over the past years, and every one of them can be generated automatically.
Our team's intelligent production capacity currently exceeds 10,000 pieces a day, although because each piece must be reviewed the effective output is somewhat limited. In manual review, the overall quality pass rate of intelligently produced videos is 90%, much higher than the pass rate of videos created by ordinary users or Youku's uploaders.
Condensed video, quick look at multiple stylized episodes
With everyone's consumption habits now fragmented, the completion rate of long videos keeps dropping, so we condense them in different ways for this situation.
For example, a drama episode in 5 minutes or a movie in 3 minutes. When you watch a show on Youku, there is a 15-second summary before each episode, which we produce automatically.
At the same time, because our deconstruction ability is stylized, we can extract condensed cuts in different styles: a sweet cut for girls, while boys may prefer a tragic one, and all of these feeds can be extracted. Video condensation means we select the key plots of different styles, so that you can grasp the main storyline in a short time.
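The stylized selection can be sketched like this. It is a toy illustration: the per-style scores would in practice come from the concept-level tags described earlier, and the segment data here is invented.

```python
# Pick the top-k segments for one style, then keep them in story order
# so the cut still follows the main storyline.

def stylized_cut(segments, style, k=2):
    """Select the k segments scoring highest for `style`, in story order."""
    ranked = sorted(segments, key=lambda s: s["styles"].get(style, 0.0),
                    reverse=True)[:k]
    chosen = {s["id"] for s in ranked}
    return [s["id"] for s in segments if s["id"] in chosen]

segments = [
    {"id": "a", "styles": {"sweet": 0.9, "tragic": 0.1}},
    {"id": "b", "styles": {"sweet": 0.2, "tragic": 0.8}},
    {"id": "c", "styles": {"sweet": 0.7, "tragic": 0.6}},
]
print(stylized_cut(segments, "sweet"))   # ['a', 'c']
print(stylized_cut(segments, "tragic"))  # ['b', 'c']
```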
Another type of short video that is very popular now is the commentary short video.
We redefined so-called Text-to-Video technology: deconstruct the video, produce the commentary script, match the two, and finally generate the commentary short video from the script.
The commentary comes from manual editing or existing scripts, and the TTS voice for the video is provided by DAMO Academy. The TTS currently offers more than 10 styles, with different dialects and different narration styles.
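The matching step between script and footage can be sketched as keyword overlap between each commentary sentence and the tags of the deconstructed segments. This is a crude stand-in for whatever semantic matching the real pipeline uses; the data is invented.

```python
# Align each commentary sentence with the video segment whose
# deconstruction tags overlap it most.

def match_script(sentences, segments):
    """Return (sentence, segment_id) pairs, one per commentary sentence."""
    plan = []
    for sent in sentences:
        words = set(sent.lower().split())
        best = max(segments, key=lambda seg: len(words & set(seg["tags"])))
        plan.append((sent, best["id"]))
    return plan

segments = [
    {"id": "seg1", "tags": ["rooftop", "fight", "night"]},
    {"id": "seg2", "tags": ["wedding", "toast", "hall"]},
]
plan = match_script(
    ["the fight on the rooftop turns the story around",
     "a toast at the wedding hides a secret"],
    segments,
)
print(plan)  # sentence 1 maps to seg1, sentence 2 to seg2
```

A production system would use embeddings rather than word overlap, but the alignment structure is the same.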
Next is graphic-to-video. Every hot news item has pictures under it, and we can generate a video from them directly. The generated video is not a PPT slideshow: it is video content corresponding to the IP-copyrighted picture.
The same applies to entertainment and information videos: a picture can be traced directly back to our corresponding copyrighted video content. If the picture is too complicated to trace, the picture itself is used directly in the produced video.
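The fallback logic described above can be written down in a few lines. The lookup table is a hypothetical placeholder for the real picture-to-footage tracing step.

```python
# Trace a news picture back to copyrighted footage if possible;
# otherwise fall back to using the still picture in the video.

def pick_material(picture_id, clip_index):
    """Return ('clip', id) when the picture traces to footage,
    else ('picture', picture_id)."""
    clip = clip_index.get(picture_id)
    return ("clip", clip) if clip else ("picture", picture_id)

clip_index = {"img_news_01": "clip_ip_2041"}  # hypothetical trace results
print(pick_material("img_news_01", clip_index))   # ('clip', 'clip_ip_2041')
print(pick_material("img_chart_07", clip_index))  # ('picture', 'img_chart_07')
```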
Content presentation: special effects make the video look better
For video special effects, we automatically add effects based on the action, the amplitude of the action, and the protagonist. The CV technologies involved include motion detection, motion-amplitude detection, range detection, star recognition, BGM matching, and so on.
Adding effects to trivial movements looks very messy, so we impose requirements on motion amplitude and only add effects above a certain range, which gives a much better viewing experience. On the CG side, we have our own Daqian cloud rendering system, which supports the production of different special effects.
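The gating idea is simple enough to sketch directly. The amplitude values would come from a motion-amplitude detector; the threshold and data here are invented.

```python
# Gate special effects on motion amplitude: small jitters stay clean,
# only large movements trigger an effect.

def effect_frames(amplitudes, threshold=0.6):
    """Return the frame indices whose motion is large enough for an effect."""
    return [i for i, a in enumerate(amplitudes) if a >= threshold]

# Only the big dance move at frames 3-4 gets effects.
print(effect_frames([0.1, 0.2, 0.15, 0.8, 0.9, 0.3]))  # [3, 4]
```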
Special effects are an essential part of variety-show production, but making an effect by hand is very time-consuming. Our goal is to let AI find and annotate the highlight moments of a variety show, so that effects can be produced quickly and at scale.
At present we have developed more than 30 special-effect types based on CG technology; "This! It's Hip-hop" is a small showcase of our effects technology.
In CBA broadcasts we have bullet time. During a bullet-time replay we can overlay a shooting heat map that tells you the shooting percentage, helping viewers get more information.
Speaking of new interactive playstyles, the first to mention is horizontal-to-vertical video conversion on the client. For example, when the subway is crowded, everyone watches vertically and rarely horizontally. Based on this demand, we convert horizontal video to vertical. The first difficulty is determining the subject; the second is stabilization.
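Those two difficulties can be sketched together: crop a 9:16 window around the detected subject, and smooth the window's motion so it does not jitter. Everything here is a toy illustration; subject centers would come from a detector, and the exponential smoothing is just one simple stand-in for stabilization.

```python
# Convert a 16:9 frame to a 9:16 crop centered on the subject, with
# exponential smoothing of the crop center as a simple stabilizer.

def vertical_crop_centers(subject_centers_x, frame_w=1920, frame_h=1080,
                          alpha=0.3):
    """Smooth the subject's x-center and clamp the 9:16 crop in-frame."""
    crop_w = frame_h * 9 / 16          # 607.5 px wide for a 1080p frame
    half = crop_w / 2
    smoothed, prev = [], None
    for x in subject_centers_x:
        prev = x if prev is None else prev + alpha * (x - prev)
        smoothed.append(min(max(prev, half), frame_w - half))
    return smoothed

# A sudden subject jump (frame 3) is damped instead of snapping the crop.
centers = vertical_crop_centers([960, 980, 1900, 1905])
print([round(c) for c in centers])  # [960, 966, 1246, 1444]
```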
We also have free-view video technology; we were the first in the domestic industry to let C-end users experience free-view video. We already applied this technique in last year's "This! It's Hip-hop".
https://www.youku.com/video/XNTE5NjY3OTg1Mg==
Technology Winter Olympics Ice and Snow VR
This year's "This! "It's Hip-hop", our technology will be further upgraded, so stay tuned. The test was also conducted in the previous Winter Olympics test. We hope to use this technology to allow users to watch the sports content they want to watch from multiple angles. And Alibaba Entertainment will continue to innovate video technology to bring you a different audio-visual experience and start the wave of digitization of China's video technology content.
"Video Cloud Technology" Your most noteworthy audio and video technology public account, pushes practical technical articles from the front line of Alibaba Cloud every week, and exchanges and exchanges with first-class engineers in the audio and video field. The official account backstage reply [Technology] You can join the Alibaba Cloud Video Cloud Product Technology Exchange Group, discuss audio and video technologies with industry leaders, and get more industry latest information.