
Vscode voice annotations make information richer (Part 2)

Foreword

This final article of the series mainly covers recording audio and storing the audio files. At the time, a bug in the recording left me with no appetite for a week (voice-annotation).

1. MP3 file storage location

"Voice Notes" usage scenarios
  1. Use "Voice Notes" for individual items.
  2. Multiple projects use Voice Notes.
  3. The mp3 files generated by "Voice Notes" are placed in their own projects.
  4. The mp3 files generated by "Voice Notes" are uniformly stored somewhere in the world.
  5. A part of mp3 generated by "Voice Notes" exists in the project and a part uses the global path.
vscode workspace

Where the audio is stored has to come from the user's configuration, but a single globally configured path cannot cover the scenario where each project stores its audio files in a different location. This is where the concept of the vscode workspace comes in.

For example, if every project has different eslint rules, configuring eslint only globally cannot satisfy this scenario. Instead, we create a new .vscode folder in the project and a settings.json file inside it; the configuration written there is the personalized configuration for the current project.
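For illustration, a per-project .vscode/settings.json might look roughly like this (the eslint switch is only an illustrative project-level override; the voiceAnnotation block is the plugin configuration introduced later in this article):

{
    "eslint.enable": true,
    "voiceAnnotation": {
        "dirPath": "../mp3"
    }
}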


Configure workspace (absolute path or relative path)

Understanding the concept of the workspace still doesn't solve the actual problem. For example, if we configure an absolute path for the audio files in the workspace, the .vscode > settings.json file gets uploaded to the code repository, so everyone pulls the same configuration. But each developer may be on a different operating system and keep the project in a different folder, so defining an absolute path in the workspace cannot solve the problem of team collaboration.

If the user configures a relative path instead, and that path is relative to the settings.json file itself, the question becomes: how do we know where the settings.json file is? The vscode plugin API can read the workspace's configuration values, but it cannot read the location of the settings.json file itself.

settings.json file tracing

At first I considered having the user manually pick a location to store the audio file after each recording, but that is obviously too cumbersome. Then, during a run, it occurred to me that to record audio the user has to click somewhere to trigger the recording command, and vscode provides a way to get the location of the file in which the command was triggered.

So I use the file in which the user triggered the command as the starting point and search upward level by level for the .vscode folder. For example, if the user clicks inside /xxx1/xxx2/xxx3.js to record a voice comment, I first check whether /xxx1/xxx2/.vscode is a folder; if not, I check whether /xxx1/.vscode is a folder, and so on, until the .vscode folder is found. If it is never found, an error is reported.

Validation of audio folder path

Using the location of the settings.json file plus the relative path configured by the user, the real audio storage location can be worked out. But you cannot relax yet: you still need to check whether a folder actually exists at that path, and create it for the user if it does not.
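A minimal sketch of that check, assuming Node's fs module as used elsewhere in the extension (mkdirSync with recursive: true also creates any missing parent folders):

import * as fs from "fs";

function ensureAudioDir(dirPath: string): string {
    // create the audio folder (and any missing parents) if it does not exist yet
    if (!fs.existsSync(dirPath)) {
        fs.mkdirSync(dirPath, { recursive: true });
    }
    return dirPath;
}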

There can still be a problem here. Suppose project a contains project b, and you want to record audio inside project b, but project b has no .vscode workspace folder while project a does have .vscode > settings.json. In that case the recording made in project b will end up being stored in project a.

Since the approach above cannot reliably detect the user's real target path, my solution is to have the recording page pre-display the path where the file will be saved and let the user act as the final gatekeeper.


The plugin's current minimal user configuration:

{
    "voiceAnnotation": {
        "dirPath": "../mp3"
    }
}


2. Defining the configuration

If the user does not want to store the audio files inside the project, for fear of bloating it, we also support a dedicated audio-storage project. In that case an absolute path needs to be configured globally, because the global configuration is not synchronized to other developers. When we cannot get the audio path defined in the vscode workspace, we fall back to the value of the global path. Let's configure the global properties:

Add the global configuration settings to package.json:

    "contributes": 
        "configuration": {
            "type": "object",
            "title": "语音注释配置",
            "properties": {
                "voiceAnnotation.globalDirPath": {
                    "type": "string",
                    "default": "",
                    "description": "语音注释文件的'绝对路径' (优先级低于工作空间的voiceAnnotation.dirPath)。"
                },
                "voiceAnnotation.serverProt": {
                    "type": "number",
                    "default": 8830,
                    "description": "默认值为8830"
                }
            }
        }
    },

For the specific meaning of each property, see how it appears in the settings UI once configured.


3. How to get the location of the audio folder

util/index.ts (a detailed walkthrough of the method follows below):

import * as vscode from "vscode";
import * as path from "path";
import * as fs from "fs";

export function getVoiceAnnotationDirPath() {
    // file in which the user triggered the command
    const activeFilePath: string = vscode.window.activeTextEditor?.document?.fileName ?? "";
    // relative path configured in the workspace (.vscode > settings.json)
    const voiceAnnotationDirPath: string = vscode.workspace.getConfiguration().get("voiceAnnotation.dirPath") || "";
    const workspaceFilePathArr = activeFilePath.split(path.sep)
    let targetPath = "";
    // walk upward from the active file until a .vscode folder is found
    for (let i = workspaceFilePathArr.length - 1; i > 0; i--) {
        try {
            const itemPath = `${path.sep}${workspaceFilePathArr.slice(1, i).join(path.sep)}${path.sep}.vscode`;
            if (fs.statSync(itemPath).isDirectory()) {
                targetPath = itemPath;
                break
            }
        } catch (_) { }
    }
    if (voiceAnnotationDirPath && targetPath) {
        return path.resolve(targetPath, voiceAnnotationDirPath)
    } else {
        const globalDirPath = vscode.workspace
            .getConfiguration()
            .get("voiceAnnotation.globalDirPath");

        if (globalDirPath) {
            return globalDirPath as string
        } else {
            getVoiceAnnotationDirPathErr()
        }
    }
}

function getVoiceAnnotationDirPathErr() {
    vscode.window.showErrorMessage(`请于 .vscode/settings.json 内设置
    "voiceAnnotation": {
        "dirPath": "音频文件夹的相对路径"
    }`)
}
Sentence-by-sentence analysis
1: Get the active location
 vscode.window.activeTextEditor?.document?.fileName

The call above returns the location of the file in which the current command was triggered. For example, if you right-click inside a.js and pick an option from the menu, this call returns the absolute path of the a.js file. It is not limited to the context menu; every command, including hovering over a piece of text, can use this call to get the file location.
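As a hedged illustration, a command registration that reads the triggering file's path could look like this (the command id voiceAnnotation.demo is made up for the example):

import * as vscode from "vscode";

export function activate(context: vscode.ExtensionContext) {
    const disposable = vscode.commands.registerCommand("voiceAnnotation.demo", () => {
        // absolute path of the file the user triggered the command from
        const activeFilePath = vscode.window.activeTextEditor?.document?.fileName ?? "";
        vscode.window.showInformationMessage(activeFilePath);
    });
    context.subscriptions.push(disposable);
}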

2: Get configuration items
 vscode.workspace.getConfiguration().get("voiceAnnotation.dirPath") || "";
 vscode.workspace.getConfiguration().get("voiceAnnotation.globalDirPath");

The calls above can read both the configuration in the project's .vscode > settings.json and the global configuration. Since we have to distinguish which one is being used, I named them dirPath and globalDirPath respectively.

3: File path separator

The "/" in /xxx/xx/x.js is path.sep , because there are differences in mac or window systems, and path.sep is used here to be compatible with users of other systems.

4: Report an error

If neither the relative path nor the absolute path can be obtained, an error will be thrown:

 vscode.window.showErrorMessage(错误信息)


5: Where it is used

getVoiceAnnotationDirPath is called in two places: first when the server saves the audio, and second when the recording page opens, where the path is passed to the front end so the user can see where the file will be saved.

4. A first look at recording

If you have never used the recording feature, you may not have seen navigator.mediaDevices. It returns a MediaDevices object, which provides access to connected media input devices such as cameras and microphones, as well as screen sharing.

To record audio you first need the user's permission. navigator.mediaDevices.getUserMedia asks for that permission, and its success callback fires once permission is granted and the device is available.


navigator.mediaDevices.getUserMedia({ audio: true })
.then((stream) => {
  // since we passed { audio: true }, stream is the audio content stream
})
.catch((err) => {
  // permission was denied or no recording device is available
})


5. Initialize the recording device and configuration

The following shows the 'initialization' that defines the playback tags and the environment. As usual, code first, then a sentence-by-sentence explanation:

  <header>
    <audio id="audio" controls></audio>
    <audio id="replayAudio" controls></audio>
  </header>
        let audioCtx = {}
        let processor;
        let userMediStream;
        navigator.mediaDevices.getUserMedia({ audio: true })
            .then(function (stream) {
                userMediStream = stream;
                audio.srcObject = stream;
                audio.onloadedmetadata = function (e) {
                    audio.muted = true;
                };
            })
            .catch(function (err) {
                console.log(err);
            });
1: An interesting find: elements with an id can be used directly

In the snippet above, audio is used without document.getElementById; this works because an element with an id attribute is automatically exposed as a global variable of the same name.

2: Save the audio content stream

Here, the media source is saved in a global variable, which is convenient for subsequent replay of the sound:

  userMediStream = stream;

srcObject attribute specifies the 'media source' associated with the <audio> tag:

 audio.srcObject = stream;
3: Monitor data changes

When the metadata has loaded, set audio.muted = true to mute playback. Why mute while recording? Because we don't need to hear our own voice played back at the same time as we record; that would create a heavy "echo", so it is muted here.

audio.onloadedmetadata = function (e) {
    audio.muted = true;
};

6. Start recording

First add a click event for the 'start recording' button:

  const oStartBt = document.getElementById("startBt"); // assumed id: the start button is obtained elsewhere in the original code
  const oAudio = document.getElementById("audio");
  let mediaRecorder;
  let buffer = [];

  oStartBt.addEventListener("click", function () {
    oAudio.srcObject = userMediStream;
    oAudio.play();
    buffer = [];
    const options = {
      mimeType: "audio/webm"
    };
    mediaRecorder = new MediaRecorder(userMediStream, options);
    mediaRecorder.ondataavailable = handleDataAvailable;
    mediaRecorder.start(10);
  });

Process the acquired audio data

  function handleDataAvailable(e) {
    if (e && e.data && e.data.size > 0) {
      buffer.push(e.data);
    }
  }
  1. oAudio.srcObject defines the 'media source' of the playback tag.
  2. oAudio.play() starts playback; since we set muted = true, this is effectively just the start of recording.
  3. buffer stores the audio data; each new recording needs to clear what is left over from the last one.
  4. new MediaRecorder creates a MediaRecorder object that records the specified MediaStream; in other words, this API exists precisely for recording. Its second parameter can specify the mimeType; the available types are listed on MDN.
  5. mediaRecorder.ondataavailable defines how each piece of audio data is handled.
  6. mediaRecorder.start(10) slices the recording into 10-millisecond chunks; the audio data is stored in Blobs, so my understanding is that this produces a Blob of data roughly every 10 milliseconds.

At this point the audio data is being collected continuously into the buffer array. The recording function is now complete; next we need to enrich it.

7. End, replay, re-record


1: End recording

A recording has to end at some point, of course. Some readers have asked whether the length or size of the audio should be limited, but I feel those restriction rules should be customized by each team; this version only provides the core functionality.

  const oEndBt = document.getElementById("endBt");

  oEndBt.addEventListener("click", function () {
    oAudio.pause();
    oAudio.srcObject = null;
  });
  1. When the 'end recording' button is clicked, oAudio.pause() stops the tag's playback.
  2. oAudio.srcObject = null cuts off the media source so the tag can no longer receive audio data.
2: Replay the recording

Of course, you want to play back the recorded audio to check the result:

  const oReplayBt = document.getElementById("replayBt");
  const oReplayAudio = document.getElementById("replayAudio");

  oReplayBt.addEventListener("click", function () {
    let blob = new Blob(buffer, { type: "audio/webm" });
    oReplayAudio.src = window.URL.createObjectURL(blob);
    oReplayAudio.play();
  });
  1. Blob is a form of data storage (on the front end it is used, for example, when downloading excel files). Simply put, the first parameter is the file's data and the second parameter defines the file's type.
  2. The parameter of window.URL.createObjectURL is the 'resource data'; this method generates a url string through which the passed-in 'resource data' can be accessed. Note that the generated url is temporary and stops working once it is revoked or the page that created it is closed.
  3. oReplayAudio.src specifies the playback address for the player. Since we are not recording here, there is no need to set srcObject.
  4. oReplayAudio.play() starts playback.
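One optional refinement of my own, not in the original code: object URLs are only released when revoked or when the page goes away, so you could free the temporary url once playback finishes:

  oReplayAudio.addEventListener("ended", function () {
    // release the temporary object URL created for this replay
    window.URL.revokeObjectURL(oReplayAudio.src);
  });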
3: Re-record audio

If the recording isn't good you naturally want to re-record it. At first I wanted to support pausing and resuming a recording, but those capabilities feel a bit off-core, and long voice notes should be rare, so here I simply do a brute-force page reload.

  const oResetBt = document.getElementById("resetBt");

  oResetBt.addEventListener("click", function () {
    location.reload();
  });

8. Converting the format

Using the audio file obtained above directly (for example when handling it with node) may fail to play. Although the browser can recognize this simple audio data stream, in order to eliminate the differences between browsers and operating systems, to be safe we convert it into the standard mp3 audio format.

MP3 is a lossy audio format while WAV is lossless. The difference between the two is obvious: the former sacrifices sound quality in exchange for a smaller file size, while the latter preserves the sound quality as much as possible. This also leads to different uses: MP3 is what ordinary users generally listen to, while WAV files are usually used for studio recording and professional audio work.

Here I chose the plug-in lamejs; the plug-in's github address is here.

lamejs is an mp3 encoder rewritten in JS; simply put, it can output the standard mp3 encoding format.

Add some new logic to the initialization code:

      let audioCtx = {};
      let processor;
      let source;
      let userMediStream;
      navigator.mediaDevices
        .getUserMedia({ audio: true })
        .then(function (stream) {
          userMediStream = stream;
          audio.srcObject = stream;
          audio.onloadedmetadata = function (e) {
            audio.muted = true;
          };
          audioCtx = new AudioContext(); // new
          source = audioCtx.createMediaStreamSource(stream); // new
          processor = audioCtx.createScriptProcessor(0, 1, 1); // new
          processor.onaudioprocess = function (e) { // new
            const array = e.inputBuffer.getChannelData(0);
            encode(array);
          };
        })
        .catch(function (err) {
          console.log(err);
        });
  1. new AudioContext() creates the audio-processing context; essentially all audio operations happen inside this class.
  2. audioCtx.createMediaStreamSource(stream) creates an audio source node from the media stream; admittedly a bit abstract.
  3. audioCtx.createScriptProcessor(0, 1, 1) creates an object that lets JavaScript process the audio directly, that is, we can manipulate the audio data with js. The three parameters are the 'buffer size', the 'number of input channels' and the 'number of output channels'.
  4. processor.onaudioprocess defines the handler that runs each time a new chunk of audio data arrives.
  5. encode receives the Float32Array of channel data and encodes it to mp3 (the implementation is shown below).

The following code is adapted from code shared by others online; its job is to perform the conversion with lamejs:

   let mp3Encoder,
        maxSamples = 1152,
        samplesMono,
        lame,
        config,
        dataBuffer;

      const clearBuffer = function () {
        dataBuffer = [];
      };

      const appendToBuffer = function (mp3Buf) {
        dataBuffer.push(new Int8Array(mp3Buf));
      };

      const init = function (prefConfig) {
        config = prefConfig || {};
        lame = new lamejs();
        mp3Encoder = new lame.Mp3Encoder(
          1,
          config.sampleRate || 44100,
          config.bitRate || 128
        );
        clearBuffer();
      };
      init();

      const floatTo16BitPCM = function (input, output) {
        for (let i = 0; i < input.length; i++) {
          let s = Math.max(-1, Math.min(1, input[i]));
          output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
        }
      };

      const convertBuffer = function (arrayBuffer) {
        let data = new Float32Array(arrayBuffer);
        let out = new Int16Array(arrayBuffer.length);
        floatTo16BitPCM(data, out);
        return out;
      };

      const encode = function (arrayBuffer) {
        samplesMono = convertBuffer(arrayBuffer);
        let remaining = samplesMono.length;
        for (let i = 0; remaining >= 0; i += maxSamples) {
          let left = samplesMono.subarray(i, i + maxSamples);
          let mp3buf = mp3Encoder.encodeBuffer(left);
          appendToBuffer(mp3buf);
          remaining -= maxSamples;
        }
      };
The 'start recording' handler needs some corresponding additions:

      oStartBt.addEventListener("click", function () {
        clearBuffer();
        oAudio.srcObject = userMediStream;
        oAudio.play();
        buffer = [];
        const options = {
          mimeType: "audio/webm",
        };
        mediaRecorder = new MediaRecorder(userMediStream, options);
        mediaRecorder.ondataavailable = handleDataAvailable;
        mediaRecorder.start(10);
        source.connect(processor); // new
        processor.connect(audioCtx.destination); // new
      });
  1. source.connect(processor): don't panic, source is what createMediaStreamSource returned above and processor is what createScriptProcessor returned; connecting the two is what actually starts processing the audio data with js.
  2. audioCtx.destination is the final output destination of the audio graph, usually the speakers.
  3. processor.connect(audioCtx.destination) completes the chain, and that is when the processor's monitoring actually starts.
The 'end recording' handler also needs some additions:
      oEndBt.addEventListener("click", function () {
        oAudio.pause();
        oAudio.srcObject = null;
        mediaRecorder.stop(); // new
        processor.disconnect(); // new
      });
  1. mediaRecorder.stop() stops the MediaRecorder (the recorder used for replaying the recording).
  2. processor.disconnect() stops processing audio data (the data being converted to mp3).
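One caveat of my own, not in the original code: a lamejs encoder normally needs a final flush() to emit its last frames, so the same click handler could also append whatever flush() returns:

        // inside the same click handler, after processor.disconnect():
        const lastMp3buf = mp3Encoder.flush(); // emit any frames still buffered in the encoder
        if (lastMp3buf.length > 0) {
          appendToBuffer(lastMp3buf);
        }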

9. Send the recorded audio file to the server

The finished data is passed to the backend as FormData.

      const oSubmitBt = document.getElementById("submitBt");

      oSubmitBt.addEventListener("click", function () {
        var blob = new Blob(dataBuffer, { type: "audio/mp3" });
        const formData = new FormData();
        formData.append("file", blob);
        fetch("/create_voice", {
          method: "POST",
          body: formData,
        })
          .then((res) => res.json())
          .catch((err) => console.log(err))
          .then((res) => {
            copy(res.voiceId);
            alert(`已保到剪切板: ${res.voiceId}`);
            window.opener = null;
            window.open("", "_self");
            window.close();
          });
      });
  1. Here we close the current page once the audio file has been uploaded successfully, since there really aren't that many voice notes to record.
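For completeness, here is a minimal sketch of what a /create_voice endpoint could look like on the plugin's local server. This is my own illustration, assuming Express and multer rather than whatever the plugin actually uses; getVoiceAnnotationDirPath is the util function from section 3, and the import path and voiceId naming scheme are made up:

const express = require("express");
const multer = require("multer");
const fs = require("fs");
const path = require("path");
const { getVoiceAnnotationDirPath } = require("./util"); // hypothetical import path

const app = express();
const upload = multer({ storage: multer.memoryStorage() }); // keep the upload in memory

app.post("/create_voice", upload.single("file"), (req, res) => {
  const voiceId = `voice_${Date.now()}`; // illustrative id scheme
  const dirPath = getVoiceAnnotationDirPath(); // folder resolved from workspace/global config
  fs.writeFileSync(path.join(dirPath, `${voiceId}.mp3`), req.file.buffer);
  res.json({ voiceId });
});

app.listen(8830); // default port from voiceAnnotation.serverProt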

10. Future Outlook

I found no similar plug-in in the vscode plug-in store and none on github, which suggests this pain point is not that severe. But that does not mean the problem should be left alone; taking action and actually improving things is what counts.

It is easy to imagine how a developer would use this "voice annotation" plug-in: only when text cannot describe something clearly, so the recording function should be used at a very low frequency. Because of this there won't be many audio files, and the extra size added to the project shouldn't cause much trouble.

If the plug-in sees real use later, I plan to add "one-click deletion of unused annotations": as a project evolves some annotations will inevitably become obsolete, and cleaning them up by hand is not realistic.

During playback, show who made the recording and the exact time it was recorded.

Besides voice annotations, let users also add text and pictures, that is, build a plug-in with annotation as its core.

end

That's it for this time; I hope to keep making progress together with you.


lulu_up