Converting text to speech and playing it automatically with JS

When developing a business system, you may sometimes need to read a piece of text aloud, as in the following scenario:

When a customer pays successfully, the system must automatically announce the payment amount, and that amount varies from payment to payment: 1 yuan, 2 yuan, 3.2 yuan, and so on.

If the scenarios are fairly fixed and only a limited set of phrases is needed, you can use recordings: record all the audio in advance, then play the clip that matches each business scenario. But if the scenarios are not fixed and the required phrases vary, recording would produce a large number of audio files and could still leave some phrases uncovered. In that case, speech synthesis, which converts text into speech in real time, is the better approach.

Text-to-speech (speech synthesis) technology is now very mature. Baidu, iFLYTEK, and others offer services that convert text into speech in a variety of voices. These services usually charge a fee, although it is not high; if you want to save costs, you can use the browser's built-in speech synthesis instead.

Some browsers already support converting text to speech natively. Here is the official description:

Speech synthesis is accessed through the SpeechSynthesis interface, a text-to-speech (TTS) component that lets programs read their text content aloud (usually via the device's default speech synthesizer). Different voice types are represented by SpeechSynthesisVoice objects, and the pieces of text to be spoken are represented by SpeechSynthesisUtterance objects. You get them spoken by passing them to the SpeechSynthesis.speak() method.

Speech synthesis mainly involves these three objects: SpeechSynthesis, SpeechSynthesisVoice, and SpeechSynthesisUtterance.

1. SpeechSynthesis

SpeechSynthesis is the controller interface of the speech synthesis service. It can be used to get information about the synthesis voices available on the device and to start, pause, and otherwise control speech. You access the controller, the entry point to the speech synthesis feature, through the window.speechSynthesis property, or simply as speechSynthesis, since window can be omitted.
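Because not every browser exposes this API, it can help to look the controller up defensively before using it. The sketch below uses a hypothetical helper name, getSynth, which is not part of the API; it takes the global scope as a parameter so it can also run outside a browser, where it simply returns null.

```javascript
// Hypothetical helper: returns the SpeechSynthesis controller, or null when the
// environment (an older browser, Node.js, etc.) lacks the Web Speech API.
// `scope` defaults to globalThis, which is `window` in a browser page.
function getSynth(scope = globalThis) {
  if (scope && 'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope) {
    return scope.speechSynthesis; // same object as window.speechSynthesis
  }
  return null;
}
```

In a page you would call getSynth() once and fall back to a warning (or to pre-recorded audio) when it returns null.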

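It has three read-only properties: paused, pending, and speaking. paused is true while playback is paused; pending is true when the queue holds utterances that have not started yet; speaking is true while an utterance is being spoken, even if it is currently paused.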

It provides the following methods:

SpeechSynthesis.cancel(): removes all utterances from the queue; if one is being spoken, it stops immediately
SpeechSynthesis.getVoices(): returns a list of SpeechSynthesisVoice objects for all voices available on the current device
SpeechSynthesis.pause(): pauses speech playback
SpeechSynthesis.resume(): resumes paused playback
SpeechSynthesis.speak(): adds an utterance to the queue; it is spoken once all utterances queued before it have been spoken
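To show how the read-only state properties and these methods combine, here is a sketch of two hypothetical helpers (announce and togglePause are illustrative names, not part of the API). announce can drop older queued or playing speech so only the newest announcement is heard; togglePause switches between pause and resume. The controller is passed in as `synth`, so in a browser you would pass window.speechSynthesis.

```javascript
// Speak `utterance`; when `interrupt` is true, first drop whatever is queued
// or playing so that only the newest announcement is heard.
function announce(synth, utterance, { interrupt = false } = {}) {
  if (interrupt && (synth.speaking || synth.pending)) {
    synth.cancel(); // empties the queue and stops the current utterance
  }
  synth.speak(utterance);
}

// Toggle between pause and resume based on the read-only state flags.
// `paused` is checked first because `speaking` stays true while paused.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  return 'idle';
}
```

For the payment scenario, announce(speechSynthesis, new SpeechSynthesisUtterance('微信到账100元'), { interrupt: true }) would cut off a stale announcement instead of queueing behind it; whether that is desirable depends on the business rules.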

2. SpeechSynthesisVoice

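SpeechSynthesisVoice represents a voice supported by the current system, and each one corresponds to a particular voice service. You can get the list of voices with SpeechSynthesis.getVoices().

It has 5 read-only properties:

default: whether this is the default voice for the current app language
lang: the BCP 47 language tag of the voice, such as zh-CN
name: a human-readable name for the voice
localService: whether the voice is supplied by a local synthesizer (true) or a remote service (false)
voiceURI: the URI of the speech synthesis service for this voice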
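In some browsers (Chrome in particular) getVoices() can return an empty array on the first call, because the voices load asynchronously; the controller then fires a voiceschanged event once they are available. The sketch below waits for that event, with a timeout as an assumed safeguard, and then picks a voice for a language. loadVoices and pickVoice are hypothetical helper names, and the matching order (exact tag, then same primary language, then the default voice) is one reasonable choice rather than a standard.

```javascript
// Hypothetical helpers; `synth` is the SpeechSynthesis controller
// (window.speechSynthesis in a browser).

// Resolves with the voice list, waiting for 'voiceschanged' if the list is
// still empty. After `timeoutMs` it resolves with whatever getVoices()
// returns (possibly still empty) so callers are never left waiting forever.
function loadVoices(synth, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const initial = synth.getVoices();
    if (initial.length > 0) {
      resolve(initial);
      return;
    }
    const finish = () => {
      clearTimeout(timer);
      synth.removeEventListener('voiceschanged', finish);
      resolve(synth.getVoices());
    };
    const timer = setTimeout(finish, timeoutMs);
    synth.addEventListener('voiceschanged', finish);
  });
}

// Normalizes tags, in case a platform reports 'zh_CN' instead of 'zh-CN'.
const normalizeTag = (tag) => String(tag).replace(/_/g, '-').toLowerCase();

// Picks an exact language match, then a voice with the same primary
// language (e.g. 'zh'), then the default voice; returns null if none.
function pickVoice(voices, lang) {
  const wanted = normalizeTag(lang);
  const primary = wanted.split('-')[0];
  return (
    voices.find((v) => normalizeTag(v.lang) === wanted) ||
    voices.find((v) => normalizeTag(v.lang).split('-')[0] === primary) ||
    voices.find((v) => v.default) ||
    null
  );
}
```

Usage in a browser might look like loadVoices(speechSynthesis).then((voices) => { utterance.voice = pickVoice(voices, 'zh-CN'); }).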

3. SpeechSynthesisUtterance

SpeechSynthesisUtterance represents a speech request: it contains the content the speech service should read aloud and how to read it (e.g. language, pitch, volume).

It has 6 properties:

lang: the language used for reading
pitch: the pitch of the voice, from 0 to 2; normal pitch is 1
rate: the speaking rate, from 0.1 to 10; normal rate is 1
text: the text content to synthesize into speech
voice: the SpeechSynthesisVoice used to read the text; if none is set, the most suitable default voice for the utterance's lang is used
volume: the volume, from 0 to 1; normal volume is 1
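Since the numeric properties have fixed ranges, a small normalization step can keep user-supplied values inside them. This is a sketch: normalizeOptions and applyOptions are hypothetical names, and clamping out-of-range values (rather than rejecting them) is an assumed policy. Unlike a `volume || 1` default, it keeps a legitimate 0 for volume or pitch.

```javascript
// Documented ranges for the numeric SpeechSynthesisUtterance properties.
const RANGES = { rate: [0.1, 10], pitch: [0, 2], volume: [0, 1] };

// Clamps `value` into [min, max]; missing or non-numeric input yields `fallback`.
function clamp(value, [min, max], fallback) {
  const n = Number(value);
  if (value === undefined || value === null || !Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

// Hypothetical helper: fills in defaults and clamps numeric options.
function normalizeOptions({ text = '', lang = 'zh-CN', rate, pitch, volume } = {}) {
  return {
    text: String(text),
    lang,
    rate: clamp(rate, RANGES.rate, 1),
    pitch: clamp(pitch, RANGES.pitch, 1),
    volume: clamp(volume, RANGES.volume, 1),
  };
}

// Copies the normalized options onto an utterance object. With a real
// SpeechSynthesisUtterance, Object.assign invokes its property setters.
function applyOptions(utterance, options) {
  return Object.assign(utterance, normalizeOptions(options));
}
```

In a browser this would be used as applyOptions(new SpeechSynthesisUtterance(), { text: '支付宝到账7.5元', volume: 0.8 }).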

In addition to these properties, there are 7 event handlers:

onboundary: fired when the spoken utterance reaches a word or sentence boundary
onend: fired when the utterance has finished being spoken
onerror: fired when an error prevents the utterance from being spoken
onmark: fired when the spoken utterance reaches a named SSML mark tag
onpause: fired when the utterance is paused
onresume: fired when a paused utterance is resumed
onstart: fired when the utterance begins to be spoken
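The onend and onerror handlers make it easy to wrap a single utterance in a Promise, which is convenient when you need to know when speech has finished. A minimal sketch, assuming the hypothetical helper names speakAsync and speakInOrder; the `error` field read in the rejection is the property SpeechSynthesisErrorEvent uses to report the error code.

```javascript
// Hypothetical helper: speaks one utterance via the given controller and
// returns a Promise that resolves on `end` and rejects on `error`.
function speakAsync(synth, utterance) {
  return new Promise((resolve, reject) => {
    utterance.onend = () => resolve(utterance);
    utterance.onerror = (event) => {
      // SpeechSynthesisErrorEvent reports the reason in its `error` property
      reject(new Error(`Speech failed: ${event.error}`));
    };
    synth.speak(utterance);
  });
}

// Speaks several utterances one after another, waiting for each to finish.
async function speakInOrder(synth, utterances) {
  for (const u of utterances) {
    await speakAsync(synth, u);
  }
}
```

Because speak() already queues utterances, speakInOrder mainly helps when you want to run code between items or know when the whole sequence is done.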

You can also inspect the properties and event handlers of a SpeechSynthesisUtterance instance directly in the browser console.

4. Speech synthesis code

Now that we know the objects involved in speech synthesis, let's try a simple test:

let synth = window.speechSynthesis;
let utterThis = new SpeechSynthesisUtterance('支付宝到账7.5元'); // "Alipay received 7.5 yuan"
synth.speak(utterThis);

Run this and you will hear the text spoken. It was tested here in Chrome, and the result sounds fine.

Below is a more general-purpose implementation:

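/**
 * @description Text-to-speech helper
 * @public
 * @param {Object} options
 * @param {string} options.text      Text to synthesize into speech
 * @param {number} [options.rate]    Speaking rate, 0.1~10, normal is 1
 * @param {string} [options.lang]    Language used for reading, defaults to 'zh-CN'
 * @param {number} [options.volume]  Volume, 0~1, normal is 1
 * @param {number} [options.pitch]   Pitch, 0~2, normal is 1
 * @param {Function} [endEvent]      Called when playback ends
 * @param {Function} [startEvent]    Called when playback starts
 * @returns {SpeechSynthesisUtterance|undefined}
 */
function speak({ text, rate, lang, volume, pitch }, endEvent, startEvent) {
    // Also guards against non-browser environments where `window` is undefined
    if (typeof window === 'undefined' || !window.SpeechSynthesisUtterance) {
        console.warn('This browser does not support text-to-speech');
        return;
    }

    if (!text) {
        return;
    }

    const speechUtterance = new SpeechSynthesisUtterance();
    speechUtterance.text = text;
    speechUtterance.rate = rate || 1; // 0 is not a valid rate, so || is fine here
    speechUtterance.lang = lang || 'zh-CN';
    // ?? keeps an explicit 0 (silent volume, lowest pitch) instead of replacing it with 1
    speechUtterance.volume = volume ?? 1;
    speechUtterance.pitch = pitch ?? 1;
    speechUtterance.onend = function() {
        endEvent && endEvent();
    };
    speechUtterance.onstart = function() {
        startEvent && startEvent();
    };
    speechSynthesis.speak(speechUtterance);

    return speechUtterance;
}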

Let's test it:

speak({
    text: '微信到账100元' // "WeChat received 100 yuan"
}, function() {
    console.log('Playback ended');
}, function() {
    console.log('Playback started');
});

After it runs, the function returns the speechUtterance instance, which you can log to the console to inspect its current property values.

Meanwhile, the console prints the following, in order:

Playback started
Playback ended
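Returning to the payment scenario from the introduction, the amount usually arrives as a number and has to be turned into announcement text first. The sketch below assumes the amount is stored as an integer number of fen (1 yuan = 100 fen), a common way to avoid floating-point rounding; formatYuan and buildAnnouncement are hypothetical helpers, and the 到账…元 phrasing simply mirrors the examples above.

```javascript
// Hypothetical helper: formats an integer amount in fen as a yuan string
// without trailing zeros, e.g. 750 -> "7.5", 10000 -> "100", 5 -> "0.05".
function formatYuan(fen) {
  if (!Number.isInteger(fen) || fen < 0) {
    throw new RangeError('amount must be a non-negative integer number of fen');
  }
  const yuan = Math.floor(fen / 100);
  const cents = fen % 100;
  if (cents === 0) return String(yuan);
  // Two digits, then drop a single trailing zero ("50" -> "5")
  const decimals = String(cents).padStart(2, '0').replace(/0$/, '');
  return `${yuan}.${decimals}`;
}

// Hypothetical helper: builds the spoken text ("<channel> received N yuan"),
// e.g. buildAnnouncement('微信', 10000) -> '微信到账100元'.
function buildAnnouncement(channel, fen) {
  return `${channel}到账${formatYuan(fen)}元`;
}
```

The result can be passed straight to the speak function above, for example speak({ text: buildAnnouncement('支付宝', 750) }).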


十方
