Hello everyone, I am Yu Kun, this issue's lab researcher. Today, through experiments and a complete hands-on walkthrough, I will show you how to build a Blazor-based subtitle system that automatically generates real-time captions for audio and video. Let's head into the lab and find out!
Microsoft MVP Lab Researcher
Approach Analysis
Many readers may have encountered similar technology. For example, when recording videos, we can use OBS-auto-subtitle to display real-time subtitles. However, that approach takes the form of an OBS plug-in, and it has certain limitations in language support and functionality.
Therefore, in this experiment, we plan to use Blazor Server to implement a more powerful subtitle system that can provide similar functions.
First, it is clear that real-time subtitles need the assistance of a speech-to-text service. After investigation and evaluation, we found that among the many similar services on the market, only Azure Cognitive Services satisfied both of our conditions: a certain amount of free quota, and a C# SDK. Therefore, we chose this service for the experiment.
Broadly speaking, refreshing the page from the server side to the front end in real time is very simple with Blazor Server. So in the concrete implementation, we just need to render a simple list of text, and then capture that page in OBS through a Browser source.
Coding implementation
1. Brief design
Generally speaking, a speech-to-text service involves continuous interaction with the server, so we need an object that maintains that communication. We can design an ILiveCaptioningProvider interface to represent this behavior:
using System;
using System.Threading.Tasks;

namespace Newbe.LiveCaptioning.Services
{
    public interface ILiveCaptioningProvider : IAsyncDisposable
    {
        Task StartAsync();
        void AddCallBack(Func<CaptionItem, Task> captionCallBack);
    }
}
To leave room for adapting to different providers, we also design an ILiveCaptioningProviderFactory interface to represent the behavior of creating an ILiveCaptioningProvider:
namespace Newbe.LiveCaptioning.Services
{
    public interface ILiveCaptioningProviderFactory
    {
        ILiveCaptioningProvider Create();
    }
}
With these two interfaces in place, we only need to create an ILiveCaptioningProvider through the ILiveCaptioningProviderFactory, then continuously receive callbacks and display their content on the page.
2. Display the content on the page
With the basic project structure and interfaces in place, we can try binding content to the page. Displaying the continuously converted content on the interface requires some transformation logic.
Before that, we need to determine the expected page display:
- Show at least two lines of text on the page
- Automatically wrap when a sentence exceeds the width of a line of text
- When one sentence ends, the next sentence wraps automatically
For example, when the sentences above are read aloud continuously, the display may behave as follows:
The main point to note is the logic that decides whether to update the current line or start a new one; this part needs careful handling.
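To make that decision concrete, here is a minimal standalone sketch (an illustration, not the project's actual code) of the arithmetic involved, assuming a fixed maximum of 20 characters per displayed line: once the recognized text grows past a multiple of the line width, the overflow becomes the content of a new line.

```csharp
using System;

// Assumption for this sketch: at most 20 characters fit on one line.
const int maxCount = 20;

// Returns how many full lines the text has scrolled past, and the tail
// of the text that belongs to the current line.
static (int page, string visibleText) Paginate(string text)
{
    var page = text.Length > maxCount
        ? (int)Math.Floor(text.Length * 1.0 / maxCount)
        : 0;
    return (page, text[(page * maxCount)..]);
}

// A 45-character sentence has scrolled past 2 full lines,
// so only the last 5 characters remain on the current line.
var (page, visible) = Paginate(new string('a', 45));
// page == 2, visible.Length == 5
Console.WriteLine($"{page} {visible.Length}");
```

Whenever `page` increases compared to the previous callback, the display should start a new line instead of updating the current one.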
3. Fill in the implementation
- Speech recognition is performed through the SpeechRecognizer object provided by the Azure SDK.
- A Subject converts the recognizer's events into an observable stream, which simplifies handling the business callbacks.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

namespace Newbe.LiveCaptioning.Services
{
    public class AzureLiveCaptioningProvider : ILiveCaptioningProvider
    {
        private readonly ILogger<AzureLiveCaptioningProvider> _logger;
        private readonly IOptions<LiveCaptionOptions> _options;
        private AudioConfig _audioConfig;
        private SpeechRecognizer _recognizer;
        private readonly List<Func<CaptionItem, Task>> _callbacks = new();
        private Subject<CaptionItem> _sub;

        public AzureLiveCaptioningProvider(
            ILogger<AzureLiveCaptioningProvider> logger,
            IOptions<LiveCaptionOptions> options)
        {
            _logger = logger;
            _options = options;
        }

        public async Task StartAsync()
        {
            var azureProviderOptions = _options.Value.Azure;
            var speechConfig = SpeechConfig.FromSubscription(azureProviderOptions.Key, azureProviderOptions.Region);
            speechConfig.SpeechRecognitionLanguage = azureProviderOptions.Language;
            _audioConfig = AudioConfig.FromDefaultMicrophoneInput();
            _recognizer = new SpeechRecognizer(speechConfig, _audioConfig);

            // Convert recognition events into an observable stream and
            // fan each item out to all registered callbacks.
            _sub = new Subject<CaptionItem>();
            _sub
                .Select(item => Observable.FromAsync(async () =>
                {
                    try
                    {
                        await Task.WhenAll(_callbacks.Select(f => f.Invoke(item)));
                    }
                    catch (Exception e)
                    {
                        _logger.LogError(e, "failed to invoke caption callback");
                    }
                }))
                .Merge()
                .Subscribe();

            // Recognizing fires repeatedly with partial results;
            // Recognized fires once a sentence is final, marking a line end.
            _recognizer.Recognizing += (sender, args) =>
            {
                _sub.OnNext(new CaptionItem
                {
                    Text = args.Result.Text,
                    LineEnd = false
                });
            };
            _recognizer.Recognized += (sender, args) =>
            {
                _sub.OnNext(new CaptionItem
                {
                    Text = args.Result.Text,
                    LineEnd = true
                });
            };
            await _recognizer.StartContinuousRecognitionAsync();
        }

        public void AddCallBack(Func<CaptionItem, Task> captionCallBack)
        {
            _callbacks.Add(captionCallBack);
        }

        public ValueTask DisposeAsync()
        {
            _recognizer?.Dispose();
            _audioConfig?.Dispose();
            _sub?.Dispose();
            return ValueTask.CompletedTask;
        }
    }
}
- There are many ways to implement a factory. Here, Autofac is used to assist in the creation of objects:
using Autofac;
using Microsoft.Extensions.Options;

namespace Newbe.LiveCaptioning.Services
{
    public class LiveCaptioningProviderFactory : ILiveCaptioningProviderFactory
    {
        private readonly ILifetimeScope _lifetimeScope;
        private readonly IOptions<LiveCaptionOptions> _options;

        public LiveCaptioningProviderFactory(
            ILifetimeScope lifetimeScope,
            IOptions<LiveCaptionOptions> options)
        {
            _lifetimeScope = lifetimeScope;
            _options = options;
        }

        public ILiveCaptioningProvider Create()
        {
            var liveCaptionProviderType = _options.Value.Provider;
            switch (liveCaptionProviderType)
            {
                case LiveCaptionProviderType.Azure:
                    var liveCaptioningProvider = _lifetimeScope.Resolve<AzureLiveCaptioningProvider>();
                    return liveCaptioningProvider;
                default:
                    throw new ProviderNotFoundException();
            }
        }
    }
}
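The container registrations backing this factory are not shown above. As a rough sketch only (an assumption about how the registration might look, not the project's actual code), an Autofac module for these services could be:

```csharp
// Hypothetical Autofac module; type and module names are illustrative.
using Autofac;

public class LiveCaptioningModule : Module
{
    protected override void Load(ContainerBuilder builder)
    {
        // One factory for the whole application.
        builder.RegisterType<LiveCaptioningProviderFactory>()
            .As<ILiveCaptioningProviderFactory>()
            .SingleInstance();

        // Resolved by the factory per Create() call, so the default
        // (per-dependency) lifetime is used.
        builder.RegisterType<AzureLiveCaptioningProvider>()
            .AsSelf();
    }
}
```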
- Fill in the page logic to complete the effect:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Components;
using Microsoft.Extensions.Logging;
using Newbe.LiveCaptioning.Services;

namespace Newbe.LiveCaptioning.Pages
{
    public partial class Index : IAsyncDisposable
    {
        [Inject] public ILiveCaptioningProviderFactory LiveCaptioningProviderFactory { get; set; }
        [Inject] public ILogger<Index> Logger { get; set; }
        private ILiveCaptioningProvider _liveCaptioningProvider;
        private readonly List<CaptionDisplayItem> _captionList = new();

        protected override async Task OnAfterRenderAsync(bool firstRender)
        {
            await base.OnAfterRenderAsync(firstRender);
            if (firstRender)
            {
                _liveCaptioningProvider = LiveCaptioningProviderFactory.Create();
                _liveCaptioningProvider.AddCallBack(CaptionCallBack);
                await _liveCaptioningProvider.StartAsync();
            }
        }

        // Maximum number of characters displayed per line.
        private int maxCount = 20;

        private Task CaptionCallBack(CaptionItem arg)
        {
            return InvokeAsync(() =>
            {
                Logger.LogDebug("Received: {Text}", arg.Text);
                var last = _captionList.FirstOrDefault();
                var newLine = false;
                var text = arg.Text;
                var skipPage = 0;
                // When the text exceeds one line, keep only the tail that
                // belongs to the current line; skipPage counts the full
                // lines that have already scrolled past.
                if (arg.Text.Length > maxCount)
                {
                    skipPage = (int) Math.Floor(text.Length * 1.0 / maxCount);
                    text = arg.Text[(skipPage * maxCount)..];
                }
                // Start a new line if there is no previous line, or the
                // text has wrapped past another line boundary.
                if (last == null || skipPage > last.TagCount)
                {
                    newLine = true;
                }
                if (newLine || _captionList.Count == 0)
                {
                    _captionList.Insert(0, new CaptionDisplayItem
                    {
                        Text = text,
                        TagCount = arg.LineEnd ? -1 : skipPage
                    });
                }
                else
                {
                    // Otherwise update the current line in place. A
                    // TagCount of -1 marks a finished sentence, so the
                    // next partial result always starts a new line.
                    _captionList[0].Text = text;
                    if (arg.LineEnd)
                    {
                        _captionList[0].TagCount = -1;
                    }
                }
                // Keep at most 4 lines on screen.
                if (_captionList.Count > 4)
                {
                    _captionList.RemoveRange(4, _captionList.Count - 4);
                }
                StateHasChanged();
            });
        }

        private record CaptionDisplayItem
        {
            public string Text { get; set; }
            public int TagCount { get; set; }
        }

        public async ValueTask DisposeAsync()
        {
            if (_liveCaptioningProvider != null)
            {
                await _liveCaptioningProvider.DisposeAsync();
            }
        }
    }
}
The core code above covers the whole flow from recognition to display.
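The Razor markup that renders _captionList is not shown in the article. As a minimal sketch (the element names and CSS classes here are illustrative assumptions, not the project's actual markup), it could look like this:

```razor
@* Hypothetical Index.razor markup. Newest line comes first, *@
@* because CaptionCallBack inserts new lines at index 0.     *@
@page "/"

<div class="caption-container">
    @foreach (var item in _captionList)
    {
        <p class="caption-line">@item.Text</p>
    }
</div>
```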
Download and install
Before digging into the source code, you can follow the steps below to try out the project.
- Download the version corresponding to the operating system from the Release page.
- Unzip this package to a pre-created folder.
- Create a Cognitive Services resource in the Azure portal.
Reminder: speech-to-text comes with a free quota of 5 hours per month, which can be found here. In addition, you can create a free Azure account here; new accounts include a 12-month free package.
- Fill in the generated Region and Key into appsettings.Production.json.
- Modify the Language option; for example, American English is en-US and Simplified Chinese is zh-CN. You can click here to view all supported languages.
- Start Newbe.LiveCaptioning.exe; if you see the following output, everything is working.
- Finally, open http://localhost:5000 in a browser and speak into the microphone; subtitles will be generated in real time.
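For the appsettings.Production.json mentioned in the steps above, a minimal example might look like the following. Note that the section and property names here are assumptions inferred from the LiveCaptionOptions code; check the file shipped in the release package for the authoritative shape.

```json
{
  "LiveCaption": {
    "Provider": "Azure",
    "Azure": {
      "Key": "<your-cognitive-services-key>",
      "Region": "<your-resource-region>",
      "Language": "en-US"
    }
  }
}
```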
Add subtitles in OBS
- Open OBS and add a Browser component.
- Fill in the URL of the component with http://localhost:5000, and set an appropriate width and height.
- Speak into the microphone, and the subtitles will appear.
Summary
This is a very simple project, through which developers can get a first look at how to use Blazor. To get the source code of this project, please click here.
In addition, you can learn more about the various technologies and services involved in the above experiments through the following resource links.
Azure Speech to Text
1: A first look at the speech recognition quality of Azure Speech
2: C# SDK integration guide
Blazor Server
1: How to push UI changes from the server to the front end
2: How to trigger UI changes outside of the UI thread (a problem that also appears in WinForms)
.NET Core publishing
1: How to publish a .NET Core program as a single-file application
2: RIDs for publishing on different operating systems
GitHub
1: How to package and publish content to a Release through GitHub Actions
Regarding the content involved in this experiment, if you have any questions or ideas, or any suggestions for the follow-up exploration direction of our laboratory, please leave a message in the comments.
Of course, if you have any interesting ideas and have successfully implemented them, you are also welcome to submit your work, share your work with more developers, let us play software development together, improve and progress together!
Introduction to the Microsoft MVP Program
The Microsoft Most Valuable Professional (MVP) is a global award granted by Microsoft to third-party technology professionals. For 28 years, technology community leaders around the world have received this award for sharing their expertise and experience in online and offline technical communities.
MVPs are a rigorously selected group of experts who represent the most skilled and insightful members of the community, and who are passionate about helping others. MVPs are committed to helping others through speaking, forum Q&A, building websites, writing blogs, sharing videos, open source projects, organizing conferences, and more, so that users in the Microsoft technical community can make the most of Microsoft technologies.
For more details, please visit the official website:
https://mvp.microsoft.com/zh-cn
For the links mentioned in the article, please refer to:
OBS-auto-subtitle:
https://github.com/summershrimp/obs-auto-subtitle
Release page:
https://github.com/newbe36524/Newbe.LiveCaptioning/releases
Free speech-to-text quota (here):
Free Azure account (here):
All supported languages (click here):
Source code (please click here):
https://github.com/newbe36524/Newbe.LiveCaptioning
A first look at the speech recognition quality of Azure Speech:
C# SDK integration guide:
How to push UI changes from the server to the front end:
How to trigger UI changes outside of the UI thread:
How to publish a .NET Core program as a single-file application:
https://docs.microsoft.com/zh-cn/dotnet/core/deploying/single-file?WT.mc_id=DX-MVP-5003606
RIDs for publishing on different operating systems:
https://docs.microsoft.com/zh-cn/dotnet/core/rid-catalog?WT.mc_id=DX-MVP-5003606
How to package and publish content to a Release through GitHub Actions:
https://github.com/gittools/gitreleasemanager