Image: the Cloud Music song-recognition Chrome extension
Author of this article: Konggo
When you are browsing videos on a video site, have you ever come across a BGM that stirs something in you, but whose name you don't know? Normally your only option is to pull out your phone and use a song-recognition app, but a browser plug-in solves the problem far more easily: there is no fumbling for your phone, no need to play the audio out loud and disturb the people around you, and no recognition failures caused by ambient noise.
If you happen to have this need, you might as well try the Chrome extension "Cloud Music Song Recognition" built by Cloud Music, which can also add the recognized track straight to your red-heart favorites. You can also visit the plug-in's official page to preview how it works in practice.
Background
At present, most of the song-recognition plug-ins in the Chrome Web Store are built overseas; there are very few domestic ones, and their support for Chinese music is poor. Since Cloud Music already has this recognition capability, we hope this feature can reach every corner and convey the beautiful power of music. At the same time, most plug-ins on the market are based on manifest v2 (which, compared with manifest v3, is weaker in security, performance, and privacy), and extracting audio fingerprints on the server side adds computational pressure to the server and increases network transmission.
So is there a way to implement the feature on the manifest v3 protocol while moving the audio-fingerprint computation to the front end?
New protocol for Chrome browser plug-ins
The focus of this article is not how to implement a browser plug-in itself. If you are not familiar with extension development, you can refer to Google's official development documentation.
In particular, manifest v2 (MV2) is about to be deprecated: it will gradually stop accepting updates in 2022 and gradually stop running in 2023. Everything in this article is implemented on manifest v3 (MV3), which is more secure, performs better, and offers stronger privacy.
The protocol upgrade also changes how features are implemented. Because of MV3's stricter security restrictions, some flexible MV2-era approaches (for example, executing remote code via unsafe methods such as eval or new Function(...)) are no longer available, and this creates some implementation difficulties for a song-recognition plug-in.
The core impacts of the MV3 protocol on the plug-in implementation:
- The original Background Page is replaced by a Service Worker, which means Web API operations can no longer be performed in the background page.
- Remotely hosted code is no longer supported and code cannot be loaded dynamically, which means all executable code must be packaged directly into the plug-in.
- The content security policy has been tightened and no longer allows executing unsafe code directly; WASM initialization functions cannot be run directly.
Implementing song recognition
The technology behind song recognition is fairly mature. The overall idea is to sample the digital audio, extract an audio fingerprint from it, and finally match that fingerprint against a database; the song with the highest matching score is taken as the recognized song.
Audio extraction in browser plugins
Recording the audio of a web page from a plug-in is actually very simple: the chrome.tabCapture API alone lets you record the audio of the page itself. We then need to resample the audio data so that the rules used to compute the hash are consistent with those used for the database.
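As a rough sketch of this step (assuming the call happens in an extension page that holds the tabCapture permission; wiring into the worklet is only indicated by a comment):

```js
// Sketch: capture the current tab's audio with chrome.tabCapture and feed it
// into an AudioContext. Assumes the "tabCapture" permission and an extension page.
chrome.tabCapture.capture({ audio: true, video: false }, (stream) => {
  if (!stream) {
    console.log("Tab capture failed:", chrome.runtime.lastError);
    return;
  }
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaStreamSource(stream);
  // Keep routing the audio to the speakers, otherwise the captured tab goes silent.
  source.connect(audioCtx.destination);
  // `source` can then be connected to the AudioWorkletNode described below.
});
```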
For the captured stream, audio transcoding and sampling can be performed. There are generally three approaches:
- createScriptProcessor: the simplest way to process audio, but it has been marked as deprecated in the W3C standard and is not recommended.
- MediaRecorder: audio transcoding can also be done with the Media API, but it offers no way to do fine-grained processing.
- AudioWorkletNode: the replacement for createScriptProcessor. It removes the pressure that synchronous processing puts on the main thread and allows bit-level audio signal processing, so it is the option chosen here for audio sampling.
Audio sampling and sampling-duration control based on AudioWorkletNode:
Register the module. The module is loaded from a file; PitchProcessor.js is a file in the plug-in's root directory:
```js
const audio_ctx = new window.AudioContext({ sampleRate: 8000 });
await audio_ctx.audioWorklet.addModule("PitchProcessor.js");
```
Create the AudioWorkletNode. It is mainly used to receive the data passed back from the Web Audio thread through port.onmessage, so that further processing can happen on the main thread:

```js
class PitchNode extends AudioWorkletNode {
  // Handle an uncaught exception thrown in the PitchProcessor.
  onprocessorerror(err) {
    console.log(
      `An error from AudioWorkletProcessor.process() occurred: ${err}`
    );
  }

  // Register the callback that will receive data posted back by the processor.
  init(callback) {
    this.callback = callback;
    this.port.onmessage = (event) => this.onmessage(event.data);
  }

  onmessage(event) {
    if (event.type === 'getData') {
      if (this.callback) {
        this.callback(event.result);
      }
    }
  }
}

const node = new PitchNode(audio_ctx, "PitchProcessor");
```
Handle AudioWorkletProcessor.process, which lives in the PitchProcessor.js file:

```js
process(inputs, outputs) {
  const inputChannels = inputs[0];
  const inputSamples = inputChannels[0];
  // Accumulate samples until the target length (48000 here) is reached,
  // then hand the buffer back to the main thread and start over.
  if (this.samples.length < 48000) {
    this.samples = concatFloat32Array(this.samples, inputSamples);
  } else {
    this.port.postMessage({ type: 'getData', result: this.samples });
    this.samples = new Float32Array(0);
  }
  return true;
}
```
Take the first channel of the first input to collect the digital signal; once the configured length has been collected (48000 samples here), notify the main thread so that it can run the recognition processing.
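Putting the pieces together, a possible skeleton of PitchProcessor.js might look like the following; the constructor and the concatFloat32Array helper are not shown in the original article and are assumptions for illustration:

```js
// PitchProcessor.js - a possible skeleton; the constructor and the
// concatFloat32Array helper are assumptions, not from the original article.
function concatFloat32Array(a, b) {
  const result = new Float32Array(a.length + b.length);
  result.set(a, 0);
  result.set(b, a.length);
  return result;
}

class PitchProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    // Buffer of accumulated samples, flushed to the main thread once full.
    this.samples = new Float32Array(0);
  }

  process(inputs, outputs) {
    const inputSamples = inputs[0][0];
    if (!inputSamples) return true; // no input connected yet
    if (this.samples.length < 48000) {
      this.samples = concatFloat32Array(this.samples, inputSamples);
    } else {
      this.port.postMessage({ type: "getData", result: this.samples });
      this.samples = new Float32Array(0);
    }
    return true;
  }
}

registerProcessor("PitchProcessor", PitchProcessor);
```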
Many interesting things can be built on top of the process method, for example basic white-noise generation, as sketched below.
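For instance, a minimal white-noise processor (unrelated to the recognition plug-in itself, purely as an illustration) could look like this:

```js
// Sketch: a minimal AudioWorkletProcessor that outputs white noise.
class WhiteNoiseProcessor extends AudioWorkletProcessor {
  process(inputs, outputs) {
    const output = outputs[0];
    for (const channel of output) {
      for (let i = 0; i < channel.length; i++) {
        // Uniformly random samples in [-1, 1) produce white noise.
        channel[i] = Math.random() * 2 - 1;
      }
    }
    return true;
  }
}

registerProcessor("white-noise-processor", WhiteNoiseProcessor);
```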
Audio fingerprint extraction
After the audio signal has been extracted, the next step is to extract a fingerprint from the signal data. What we have at this point is just a chunk of raw sample data; it needs to be Fourier-transformed into frequency-domain information so that features can be represented. The concrete fingerprint-extraction logic is a set of well-defined but fairly complex algorithms. Common approaches include: 1) audio fingerprints based on frequency-band energy; 2) audio fingerprints based on landmarks; 3) audio fingerprints based on neural networks. If you are interested in the algorithms, you can read related papers such as A Highly Robust Audio Fingerprinting System. The whole computation has real performance requirements, and WebAssembly gives noticeably better CPU performance; C/C++/Rust can nowadays be compiled to WebAssembly bytecode fairly conveniently, which will not be expanded on here.
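As a very rough illustration of the band-energy idea from the cited paper (not the plug-in's actual algorithm), each frame can be turned into a 32-bit sub-fingerprint by comparing energy differences between neighboring bands and consecutive frames; the frames array of per-band energies is assumed to come from an FFT stage:

```js
// Sketch of the band-energy sub-fingerprint from "A Highly Robust Audio
// Fingerprinting System"; `frames` is an assumed array of 33-band energy arrays.
function subFingerprints(frames) {
  const fingerprints = [];
  for (let n = 1; n < frames.length; n++) {
    let bits = 0;
    for (let m = 0; m < 32; m++) {
      // Bit is 1 when the band-energy difference grows between frames.
      const d =
        frames[n][m] - frames[n][m + 1] -
        (frames[n - 1][m] - frames[n - 1][m + 1]);
      bits = (bits << 1) | (d > 0 ? 1 : 0);
    }
    fingerprints.push(bits >>> 0); // keep as unsigned 32-bit
  }
  return fingerprints;
}
```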
Next, when you try to initialize the WASM module inside the plug-in, you will most likely run into the following exception:
```
Refused to compile or instantiate WebAssembly module because 'wasm-eval' is not an allowed source of script in the following Content Security Policy directive: "script-src 'self' 'unsafe-inline' 'unsafe-eval' ...
```
This is because WebAssembly must follow the extension's strict CSP. Under Chrome MV2 this can be resolved by adding "content_security_policy": "script-src 'self' 'unsafe-eval';" to the manifest. Under MV3, with its stricter privacy and security restrictions, this quick-and-dirty fix is no longer allowed.
In the script-src, object-src, and worker-src directives, the only allowed values are:
- 'self'
- 'none'
- localhost

In other words, there is no way to specify values such as unsafe-eval, so simply running WASM directly in a plug-in page is no longer feasible.
Does this mean we have hit a dead end? There are always more solutions than problems. Reading the documentation carefully, I found the following statement: "CSP modifications for sandbox have no such new restrictions." - Chrome extension development documentation
In other words, this security restriction does not apply in sandbox mode. The plug-in can define a sandbox page; although that page cannot access the Web/Chrome APIs, it can run the so-called "unsafe" methods such as eval, new Function, and WebAssembly.instantiate.
Therefore, the sandbox page can be used to load and run the WASM module and return the calculation result to the main page; the overall fingerprint-collection flow becomes: capture audio on the main page, send the samples into the sandbox page, compute the fingerprint there with WASM, and send the result back.
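To declare such a page, the manifest lists it under the sandbox key; a minimal sketch, with sandbox.html as a hypothetical file name:

```json
{
  "manifest_version": 3,
  "sandbox": {
    "pages": ["sandbox.html"]
  }
}
```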
As for how the main page and the sandbox page exchange data: the main page can load the sandbox page in an iFrame and communicate with it through the iFrame's contentWindow, while the sandbox page posts its results back to the embedding window.
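A minimal sketch of that exchange, assuming a sandbox.html page and a hypothetical computeFingerprintWithWasm helper inside it:

```js
// Main (extension) page: post samples into the sandboxed iframe and wait for the result.
const iframe = document.getElementById("sandbox-frame"); // <iframe src="sandbox.html">

function requestFingerprint(samples) {
  return new Promise((resolve) => {
    const onMessage = (event) => {
      if (event.data && event.data.type === "fingerprintResult") {
        window.removeEventListener("message", onMessage);
        resolve(event.data.fingerprint);
      }
    };
    window.addEventListener("message", onMessage);
    iframe.contentWindow.postMessage({ type: "calcFingerprint", samples }, "*");
  });
}

// sandbox.html: run the "unsafe" WASM code and reply to the embedding page.
window.addEventListener("message", async (event) => {
  if (event.data && event.data.type === "calcFingerprint") {
    const fingerprint = await computeFingerprintWithWasm(event.data.samples); // hypothetical helper
    event.source.postMessage({ type: "fingerprintResult", fingerprint }, "*");
  }
});
```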
At this point the basic audio capture and fingerprint extraction are complete; what remains is to match the fingerprint against the database.
Feature matching
After the audio fingerprint has been extracted, the next step is to search the fingerprint database. The database can be implemented as a hash table: each entry maps a fingerprint to the music IDs and the positions in those tracks where that fingerprint occurs. Looking up the extracted fingerprints in this table yields the matching songs. Of course, this is only the basic process; the concrete optimizations differ greatly between implementations, and, copyright considerations aside, the algorithm directly determines the efficiency and accuracy of each match. The plug-in's implementation here still favors efficiency.
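As a toy illustration of such a lookup (not the service's real implementation), each fingerprint hash can map to (song ID, time) pairs, and candidates can be ranked by how consistently their time offsets line up:

```js
// Toy fingerprint index: hash -> list of { songId, time } occurrences.
const index = new Map();

function addFingerprint(hash, songId, time) {
  if (!index.has(hash)) index.set(hash, []);
  index.get(hash).push({ songId, time });
}

// `query` is a list of { hash, time } pairs extracted from the recording.
// Each database hit votes for a (songId, timeOffset) pair; the song whose
// offset collects the most votes is returned as the match.
function match(query) {
  const votes = new Map();
  for (const { hash, time } of query) {
    for (const { songId, time: dbTime } of index.get(hash) || []) {
      const key = `${songId}@${Math.round(dbTime - time)}`;
      votes.set(key, (votes.get(key) || 0) + 1);
    }
  }
  let best = null;
  for (const [key, count] of votes) {
    if (!best || count > best.count) {
      best = { songId: key.split("@")[0], count };
    }
  }
  return best;
}
```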
Final words
The above roughly describes the overall flow of song recognition built on WebAssembly and MV3. Browser plug-ins are flexible and easy to use, but Google is also aware of the security and privacy problems they bring and has pushed a large-scale migration. The MV3 protocol offers better privacy and security, but it also limits how many features can be implemented; after 2023, a large number of plug-ins will no longer run.
The song-recognition plug-in currently supports audio recognition, adding tracks to the red-heart playlist, and more, and its features will continue to grow. I hope this small feature can help you.
References
- https://developer.mozilla.org/en-US/
- https://developer.chrome.com/docs/apps/
- https://www.w3.org/TR/webaudio/#widl-AudioContext-createScriptProcessor-ScriptProcessorNode-unsigned-long-bufferSize-unsigned-long-numberOfInputChannels-unsigned-long-numberOfOutputChannels
- https://developer.mozilla.org/en-US/docs/WebAssembly/C_to_wasm
- http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=152C085A95A4B5EF1E83E9EECC283931?doi=10.1.1.103.2175&rep=rep1&type=pdf
This article is published by the NetEase Cloud Music technical team. Any form of reprinting without authorization is prohibited. We recruit for various technical positions all year round; if you are ready for a change and happen to love Cloud Music, join us at grp.music-fe(at)corp.netease.com!