
When developing a business system, you sometimes need to announce a piece of text by voice. Consider the following scenario:

When a customer's payment succeeds, the system should automatically announce the amount paid. The amount is not fixed: it might be 1 yuan, 2 yuan, 3.2 yuan, and so on.

If the scenario is relatively fixed and the set of phrases is limited, you can use recordings: record all the audio clips in advance, then play the matching clip for each business event. If the scenario is not fixed and the required phrases vary, recording requires a large number of audio clips and may still miss some cases. Speech synthesis, which converts text to speech in real time, is the better approach here.

Text-to-speech (speech synthesis) technology is now very mature. Baidu, iFLYTEK, and others provide services that convert text into various kinds of speech. These services are usually paid, although the cost is not high. If you want to save that cost, you can use the speech synthesis feature built into the browser.

Some browsers already support converting text to speech. Here is the official introduction:

Speech synthesis is accessed through the SpeechSynthesis interface, which provides text-to-speech (TTS) capabilities that let a program read out its text content (usually using the device's default speech synthesizer). Different voice types are represented by SpeechSynthesisVoice objects, and the pieces of text to be spoken are represented by SpeechSynthesisUtterance objects. Finally, the SpeechSynthesis.speak() method produces the speech.

Speech synthesis mainly involves these three objects: SpeechSynthesis, SpeechSynthesisVoice, and SpeechSynthesisUtterance.

1. SpeechSynthesis

SpeechSynthesis is the controller interface of the speech synthesis service. It can be used to get information about the synthesis voices available on the device and to start, pause, and otherwise control speech. The window.speechSynthesis property gives access to the SpeechSynthesis controller and is the entry point to the speech synthesis feature.

You can also omit window and use speechSynthesis directly.
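As a minimal sketch of the access pattern described above, the snippet below looks up the controller behind a feature check, so it also runs where speech synthesis is unavailable. The helper name getSynth is an assumption made for illustration; it is not part of the browser API.

```javascript
// Hypothetical helper: returns the SpeechSynthesis controller,
// or null when the environment does not provide one.
// `root` defaults to globalThis, which is `window` in a browser page.
function getSynth(root = globalThis) {
  return 'speechSynthesis' in root ? root.speechSynthesis : null;
}

const controller = getSynth();
if (controller) {
  // The three read-only state flags of the controller.
  console.log(controller.speaking, controller.pending, controller.paused);
} else {
  console.log('speechSynthesis is not available in this environment');
}
```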

It has three properties, paused, pending, and speaking, all of which are read-only.

It has the following methods:

  • SpeechSynthesis.cancel(): Remove all utterances from the queue and stop the one currently being spoken
  • SpeechSynthesis.getVoices(): Get the list of SpeechSynthesisVoice objects for all voices available on the current device
  • SpeechSynthesis.pause(): Pause speech playback
  • SpeechSynthesis.resume(): Resume paused playback
  • SpeechSynthesis.speak(): Add an utterance to the queue; it is spoken once the utterances queued before it have finished
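The methods and state flags above can be combined into small control helpers. The following is a sketch only: togglePause and speakNow are hypothetical names, and the controller is passed in as a parameter so the helpers do not depend on a global.

```javascript
// Pause or resume depending on the controller's read-only flags.
// `paused` is checked first because `speaking` can stay true while paused.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  return 'idle';
}

// Drop everything that is queued or playing, then speak the new utterance.
function speakNow(synth, utterance) {
  synth.cancel();
  synth.speak(utterance);
}
```

In a browser you would call these as togglePause(window.speechSynthesis), for example from a button click handler.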

2. SpeechSynthesisVoice

SpeechSynthesisVoice represents a voice supported by the current system. Each SpeechSynthesisVoice has its own associated speech service. You can obtain the list of voices through SpeechSynthesis.getVoices().

It has five read-only properties: default, lang, name, localService, and voiceURI.
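Here is a sketch of reading the voice list and choosing a voice by its lang property. In some browsers (Chrome, for example) getVoices() can return an empty array until the voices have loaded, so the sketch also waits for the voiceschanged event. loadVoices and pickVoice are hypothetical helper names.

```javascript
// Resolve with the voice list, waiting for `voiceschanged` if it is still empty.
// Note: if a device has no voices at all, this promise never resolves.
function loadVoices(synth) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    synth.addEventListener('voiceschanged', () => resolve(synth.getVoices()), { once: true });
  });
}

// Prefer a voice for the requested language, then the default voice, else null.
function pickVoice(voices, lang) {
  return voices.find((v) => v.lang === lang) || voices.find((v) => v.default) || null;
}

// Only runs where the browser API exists.
if (typeof speechSynthesis !== 'undefined') {
  loadVoices(speechSynthesis).then((voices) => {
    console.log(voices.map((v) => `${v.name} (${v.lang})`));
    console.log(pickVoice(voices, 'zh-CN'));
  });
}
```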

3. SpeechSynthesisUtterance

SpeechSynthesisUtterance represents a speech request. It contains the content the speech service should read and information about how to read it (for example, language, pitch, and volume).

It has 6 properties, as follows:

  • lang: the language used when reading
  • pitch: the pitch of the voice, from 0 to 2; the normal pitch is 1
  • rate: the speaking rate, from 0.1 to 10; the normal rate is 1
  • text: the text content to be synthesized into speech
  • voice: the voice used to read the text; by default, the SpeechSynthesisVoice whose default property is true
  • volume: the volume when reading, from 0 to 1; the normal volume is 1
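Because rate, pitch, and volume each have a valid range, it can help to clamp caller-supplied values before assigning them. The sketch below is one way to do that, assuming zh-CN as the default language. clamp, normalizeOptions, and createUtterance are hypothetical helpers, not browser APIs.

```javascript
// Keep `value` inside [min, max]; use `fallback` when it is missing or not a number.
function clamp(value, min, max, fallback) {
  if (value == null) return fallback;
  const n = Number(value);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

// Build a plain options object using the ranges listed above.
function normalizeOptions({ text = '', lang = 'zh-CN', rate, pitch, volume } = {}) {
  return {
    text: String(text),
    lang,
    rate: clamp(rate, 0.1, 10, 1),
    pitch: clamp(pitch, 0, 2, 1),
    volume: clamp(volume, 0, 1, 1),
  };
}

// Browser only: copy the normalized options onto a new utterance.
function createUtterance(options) {
  return Object.assign(new SpeechSynthesisUtterance(), normalizeOptions(options));
}
```

For example, createUtterance({ text: 'hello', lang: 'en-US', rate: 20 }) would produce an utterance whose rate is capped at 10.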

In addition to the above properties, there are 7 event handlers, as follows:

  • onboundary: Triggered when playback reaches a word or sentence boundary
  • onend: Triggered when speech playback ends
  • onerror: Triggered when an error occurs during speech playback
  • onmark: Triggered when playback reaches a named mark in the text
  • onpause: Triggered when speech playback is paused
  • onresume: Triggered when speech playback resumes
  • onstart: Triggered when speech playback starts
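To see when each of these events fires, one option is to attach a handler for every event and log it. The sketch below does that. attachHandlers and its log parameter are hypothetical names introduced for illustration, while charIndex, name, and error are properties of the event objects the browser passes to the handlers.

```javascript
// Attach a logging handler for each of the seven utterance events.
function attachHandlers(utterance, log = console.log) {
  utterance.onstart = () => log('start');
  utterance.onboundary = (event) => log(`boundary at char ${event.charIndex}`);
  utterance.onmark = (event) => log(`mark ${event.name}`);
  utterance.onpause = () => log('pause');
  utterance.onresume = () => log('resume');
  utterance.onerror = (event) => log(`error: ${event.error}`);
  utterance.onend = () => log('end');
  return utterance;
}

// Only runs where the browser API exists.
if (typeof SpeechSynthesisUtterance !== 'undefined' && typeof speechSynthesis !== 'undefined') {
  speechSynthesis.speak(attachHandlers(new SpeechSynthesisUtterance('Hello world')));
}
```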

You can also inspect the properties and methods of these objects directly in the browser console.

4. Speech synthesis code

After understanding the objects involved in speech synthesis, let's run a simple test:

 let synth = window.speechSynthesis;
// The text means "Alipay received 7.5 yuan"
let utterThis = new SpeechSynthesisUtterance('支付宝到账7.5元');
synth.speak(utterThis);

After running this, you can hear the voice. The test here used the Chrome browser, and the voice sounds fine.
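In the payment scenario the amount changes every time, so the sentence has to be built from a number rather than hard-coded. The sketch below shows one way to do that. paymentMessage and its channel parameter are hypothetical, and rounding to two decimal places is an assumption about how amounts should be read.

```javascript
// Build the sentence to announce, e.g. "支付宝到账7.5元" ("Alipay received 7.5 yuan").
function paymentMessage(amount, channel = '支付宝') {
  const n = Number(amount);
  if (!Number.isFinite(n) || n < 0) {
    throw new RangeError(`Invalid payment amount: ${amount}`);
  }
  // Round to 2 decimals, then convert back to a number to drop trailing zeros.
  const value = Number(n.toFixed(2));
  return `${channel}到账${value}元`;
}

// Only runs where the browser API exists.
if (typeof speechSynthesis !== 'undefined' && typeof SpeechSynthesisUtterance !== 'undefined') {
  speechSynthesis.speak(new SpeechSynthesisUtterance(paymentMessage(7.5)));
}
```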

The following is a more general-purpose function:

 /**
 * Convert text to speech.
 * @public
 * @param {Object} options
 * @param {string} options.text - Text to synthesize
 * @param {number} [options.rate=1] - Speaking rate, 0.1 to 10
 * @param {string} [options.lang='zh-CN'] - Language used when reading
 * @param {number} [options.volume=1] - Volume, 0 to 1
 * @param {number} [options.pitch=1] - Pitch, 0 to 2
 * @param {Function} [endEvent] - Called when playback ends
 * @param {Function} [startEvent] - Called when playback starts
 * @returns {SpeechSynthesisUtterance|undefined} The utterance, or undefined if nothing was spoken
 */
function speak({ text, rate, lang, volume, pitch }, endEvent, startEvent) {
    if (!window.SpeechSynthesisUtterance || !window.speechSynthesis) {
        console.warn('This browser does not support text-to-speech');
        return;
    }

    if (!text) {
        return;
    }

    const speechUtterance = new SpeechSynthesisUtterance();
    speechUtterance.text = text;
    // Use ?? instead of || so that valid zero values (volume 0, pitch 0) are kept.
    speechUtterance.rate = rate ?? 1;
    speechUtterance.lang = lang ?? 'zh-CN';
    speechUtterance.volume = volume ?? 1;
    speechUtterance.pitch = pitch ?? 1;
    speechUtterance.onend = function() {
        endEvent && endEvent();
    };
    speechUtterance.onstart = function() {
        startEvent && startEvent();
    };
    speechSynthesis.speak(speechUtterance);

    return speechUtterance;
}

Let's test it:

 speak({
    text: '微信到账100元' // "WeChat received 100 yuan"
}, function() {
    console.log('Speech playback ended');
}, function() {
    console.log('Speech playback started');
});

The call returns the speechUtterance instance, which you can inspect in the console.

At the same time, the console prints the following messages in order:

 Speech playback started
Speech playback ended
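If the caller needs to know when a whole series of announcements has finished, the start and end callbacks can be wrapped in a Promise. This is a sketch of that variation, not the function above. speakAsync, announceAll, and the env parameter are hypothetical; env defaults to globalThis so that the browser objects can be swapped for stand-ins. speak() already queues utterances by itself, so awaiting each one is only needed to learn when each has completed.

```javascript
// Speak one piece of text; resolve on `end`, reject on `error` or missing support.
function speakAsync(text, env = globalThis) {
  return new Promise((resolve, reject) => {
    if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) {
      reject(new Error('Speech synthesis is not supported'));
      return;
    }
    const utterance = new env.SpeechSynthesisUtterance(text);
    utterance.lang = 'zh-CN'; // assumed default, matching the examples above
    utterance.onend = () => resolve(utterance);
    utterance.onerror = (event) => reject(new Error(`Speech failed: ${event.error}`));
    env.speechSynthesis.speak(utterance);
  });
}

// Speak several messages one after another; resolves when the last one ends.
async function announceAll(messages, env = globalThis) {
  for (const message of messages) {
    await speakAsync(message, env);
  }
}
```

In a browser, announceAll(['微信到账100元', '支付宝到账7.5元']) would read the two messages in order.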



十方