

Welcome to the second article in a series on using TypeScript, React, ANTLR4 and Monaco Editor to create a custom web editor. Before continuing, I suggest reading the first part, Create a custom web editor using TypeScript, React, ANTLR4, Monaco Editor (Part 1).

In this article, I will introduce how to implement language services. In the editor, the language service does the heavy work of parsing the typed text. We will use the abstract syntax tree (AST) generated by the parser for three things: finding syntax and lexical errors, formatting the text, and offering grammar-based suggestions for the text the user types (I will not implement grammar-based auto-completion in this article). Basically, the language service exposes the following functions:

  • format(code: string): string
  • validate(code: string): Errors[]
  • autoComplete(code: string, currentPosition: Position): string[]

Add ANTLR, Generate Lexer and Parser From the Grammar

I will add the ANTLR library and generate the Parser and Lexer from a TodoLangGrammar.g4 grammar file. First, install two libraries: antlr4ts and antlr4ts-cli. antlr4ts is the runtime package that parsers generated by the ANTLR4 TypeScript target depend on. antlr4ts-cli, as the name suggests, is the CLI we will use to generate the language's Parser and Lexer.

npm add antlr4ts
npm add -D antlr4ts-cli

Create a file named TodoLangGrammar.g4 in the project root containing the TodoLang grammar rules:

grammar TodoLangGrammar;

todoExpressions : (addExpression)* (completeExpression)*;
addExpression : ADD TODO STRING;
completeExpression : COMPLETE TODO STRING;

ADD : 'ADD';
TODO : 'TODO';
COMPLETE: 'COMPLETE';
STRING: '"' ~["]* '"';
EOL: [\r\n]+ -> skip;
WS: [ \t] -> skip;
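To get a feel for what the generated lexer does with these rules, here is a hand-written TypeScript sketch that recognizes the same tokens. It handles the three keywords and STRING, and it skips spaces, tabs and line breaks. It is not ANTLR output and is not used by the project. The names tokenize, TokenType and Token are hypothetical. The generated lexer reports a token recognition error and keeps going, whereas this sketch simply throws.

```typescript
// Hypothetical hand-written lexer mirroring the TodoLangGrammar.g4 lexer rules.
// Illustration only: the project uses the ANTLR-generated TodoLangGrammarLexer.
type TokenType = "ADD" | "TODO" | "COMPLETE" | "STRING";

interface Token {
  type: TokenType;
  text: string;
  line: number;   // 1-based, like ANTLR's token.line
  column: number; // 0-based, like ANTLR's charPositionInLine
}

const KEYWORDS: TokenType[] = ["ADD", "TODO", "COMPLETE"];

function tokenize(code: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  let line = 1;
  let lineStart = 0; // index of the first character of the current line
  while (i < code.length) {
    const ch = code[i];
    if (ch === "\n") {
      // EOL: [\r\n]+ -> skip (line numbers are advanced on \n)
      i++;
      line++;
      lineStart = i;
      continue;
    }
    if (ch === "\r" || ch === " " || ch === "\t") {
      // WS: [ \t] -> skip, and the \r half of EOL
      i++;
      continue;
    }
    const column = i - lineStart;
    if (ch === '"') {
      // STRING: '"' ~["]* '"'  (~["] also matches line breaks)
      const end = code.indexOf('"', i + 1);
      if (end === -1) throw new Error(`Unterminated string at ${line}:${column}`);
      tokens.push({ type: "STRING", text: code.slice(i, end + 1), line, column });
      // keep line tracking correct if the string spans several lines
      for (let k = i; k <= end; k++) {
        if (code[k] === "\n") { line++; lineStart = k + 1; }
      }
      i = end + 1;
      continue;
    }
    const keyword = KEYWORDS.find(k => code.startsWith(k, i));
    if (keyword === undefined) throw new Error(`Unexpected character '${ch}' at ${line}:${column}`);
    tokens.push({ type: keyword, text: keyword, line, column });
    i += keyword.length;
  }
  return tokens;
}
```

For example, tokenize('ADD TODO "Create an editor"') yields the token types ADD, TODO and STRING.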

Now add a script to package.json that uses antlr4ts-cli to generate the Parser and Lexer:

"antlr4ts": "antlr4ts ./TodoLangGrammar.g4 -o ./src/ANTLR"

Run the antlr4ts script, and you will find the generated TypeScript source code in ./src/ANTLR:

npm run antlr4ts

[Figure: the generated ANTLR files]

As you can see, there is a Lexer and a Parser. The Parser file exports a TodoLangGrammarParser class with the constructor constructor(input: TokenStream). That constructor takes the TokenStream that TodoLangGrammarLexer produces from the code. TodoLangGrammarLexer's own constructor, constructor(input: CharStream), takes the code as its input.

The Parser file contains the public method todoExpressions(): TodoExpressionsContext, which returns the context object from which all TodoExpressions can be traced. The method name comes from the first rule in our grammar file:

todoExpressions : (addExpression)* (completeExpression)*;

TodoExpressionsContext is the root of the AST. Each node in it is the context of another rule, and a context contains both terminal nodes and rule contexts. Terminal nodes hold the final tokens (the ADD token, the TODO token, the STRING token).

TodoExpressionsContext contains the addExpression and completeExpression contexts, derived from the following three rules:

todoExpressions : (addExpression)* (completeExpression)*; 
addExpression : ADD TODO STRING;
completeExpression : COMPLETE TODO STRING;
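Note what the first rule implies: (addExpression)* (completeExpression)* accepts all ADD expressions first and all COMPLETE expressions after them. The rule as written does not end with EOF, so ANTLR's generated parser may stop as soon as neither loop can continue. In that case, trailing input such as an ADD that follows a COMPLETE may be left unparsed without any error being reported. Ending the start rule with EOF (todoExpressions : (addExpression)* (completeExpression)* EOF;) is the usual way to make ANTLR consume the whole input.

The hypothetical sketch below is a tiny recursive-descent parser over already-tokenized input. It mirrors the three rules, behaves as if the start rule ended in EOF, and builds a plain-object tree shaped like TodoExpressionsContext. Its types and names are assumptions for illustration. ANTLR's generated parser works differently internally: it uses adaptive prediction and error recovery.

```typescript
// Hypothetical recursive-descent parser mirroring:
//   todoExpressions    : (addExpression)* (completeExpression)*;
//   addExpression      : ADD TODO STRING;
//   completeExpression : COMPLETE TODO STRING;
// Illustration only: the project uses the ANTLR-generated TodoLangGrammarParser.
type TokenType = "ADD" | "TODO" | "COMPLETE" | "STRING";
interface Token { type: TokenType; text: string; }

interface AddExpression { kind: "add"; todo: string; }
interface CompleteExpression { kind: "complete"; todo: string; }
interface TodoExpressions { children: Array<AddExpression | CompleteExpression>; }

function parseTodoExpressions(tokens: Token[]): TodoExpressions {
  let pos = 0;
  const children: Array<AddExpression | CompleteExpression> = [];

  // Consume one token of the expected type, or fail with a syntax error.
  const expect = (type: TokenType): Token => {
    const token = tokens[pos];
    if (token === undefined || token.type !== type) {
      throw new Error(`Expected ${type} at token ${pos}, got ${token ? token.type : "end of input"}`);
    }
    pos++;
    return token;
  };

  // (addExpression)*
  while (pos < tokens.length && tokens[pos].type === "ADD") {
    expect("ADD");
    expect("TODO");
    children.push({ kind: "add", todo: expect("STRING").text });
  }
  // (completeExpression)*
  while (pos < tokens.length && tokens[pos].type === "COMPLETE") {
    expect("COMPLETE");
    expect("TODO");
    children.push({ kind: "complete", todo: expect("STRING").text });
  }
  // Behave like a rule ending in EOF: reject leftovers such as an ADD after a COMPLETE.
  if (pos < tokens.length) {
    throw new Error(`Unexpected ${tokens[pos].type} at token ${pos}`);
  }
  return { children };
}
```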

[Figure: grammar]

Each context class, in turn, contains terminal nodes, which hold the text itself (a code fragment or token, for example ADD, COMPLETE, or the string representing the TODO). How complex the AST is depends on the grammar rules you write.

Let's take a look at AddExpressionContext, which contains the ADD, TODO and STRING terminal nodes and corresponds to this rule:

addExpression : ADD TODO STRING;

[Figure: AddExpressionContext]

STRING holds the text of the Todo we want to add. First, let's parse a simple piece of TodoLang code to understand how the AST works. Create parser.ts in the ./src/language-service directory with the following content:

import { TodoLangGrammarParser, TodoExpressionsContext } from "../ANTLR/TodoLangGrammarParser";
import { TodoLangGrammarLexer } from "../ANTLR/TodoLangGrammarLexer";
import { ANTLRInputStream, CommonTokenStream } from "antlr4ts";

export default function parseAndGetASTRoot(code: string): TodoExpressionsContext {
    const inputStream = new ANTLRInputStream(code);
    const lexer = new TodoLangGrammarLexer(inputStream);
    const tokenStream = new CommonTokenStream(lexer);
    const parser = new TodoLangGrammarParser(tokenStream);
    // Parse the input; todoExpressions is the entry rule defined in TodoLangGrammar.g4
    return parser.todoExpressions();
}

The parser.ts file exports the parseAndGetASTRoot(code) function, which accepts TodoLang code and generates the corresponding AST. Parse the following TodoLang code:

parseAndGetASTRoot(`
ADD TODO "Create an editor"
COMPLETE TODO "Create an editor"
`)

[Figure: the TodoExpressionsContext generated for this code]

Implementing Lexical and Syntax Validation

In this section, I will guide you step by step through adding syntax validation to the editor. ANTLR reports lexical and syntax errors for us out of the box. We only need to implement the ANTLRErrorListener interface and provide it to the Lexer and the Parser, so that we can collect the errors while ANTLR parses the code.

Create a TodoLangErrorListener.ts file in the ./src/language-service directory that exports a TodoLangErrorListener class implementing ANTLRErrorListener:

import { ANTLRErrorListener, RecognitionException, Recognizer } from "antlr4ts";

export interface ITodoLangError {
    startLineNumber: number;
    startColumn: number;
    endLineNumber: number;
    endColumn: number;
    message: string;
    code: string;
}

export default class TodoLangErrorListener implements ANTLRErrorListener<any>{
    private errors: ITodoLangError[] = []
    syntaxError(recognizer: Recognizer<any, any>, offendingSymbol: any, line: number, charPositionInLine: number, message: string, e: RecognitionException | undefined): void {
        
        this.errors.push(
            {
                startLineNumber:line,
                endLineNumber: line,
                startColumn: charPositionInLine + 1, // ANTLR columns are 0-based, Monaco columns are 1-based
                endColumn: charPositionInLine + 2, // Let's suppose the length of the error is only 1 char for simplicity
                message,
                code: "1" // This is the error code; you can customize it as you want
            }
        )
    }

    getErrors(): ITodoLangError[] {
        return this.errors;
    }
}
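ANTLR reports line as 1-based but charPositionInLine as 0-based, while Monaco marker columns are 1-based. The listener above also assumes every error is one character wide. For parser errors, offendingSymbol is the offending token, and its startIndex and stopIndex (inclusive character offsets) can give the real width. The sketch below shows a hypothetical helper, toMarkerRange. It is typed against a minimal TokenLike shape rather than the antlr4ts Token type, so it runs on its own.

```typescript
// Hypothetical helper: turn an ANTLR error position into a 1-based, Monaco-style range.
// TokenLike models only the two token fields used here (antlr4ts tokens have the same names).
interface TokenLike {
  startIndex: number; // offset of the token's first character in the input
  stopIndex: number;  // offset of the token's last character (inclusive)
}

interface MarkerRange {
  startLineNumber: number;
  startColumn: number;
  endLineNumber: number;
  endColumn: number;
}

function toMarkerRange(line: number, charPositionInLine: number, token?: TokenLike): MarkerRange {
  // Width from the inclusive offsets. Fall back to 1 char when there is no token
  // (lexer errors) or the token is empty (EOF, whose stopIndex is before its startIndex).
  const length = token ? Math.max(1, token.stopIndex - token.startIndex + 1) : 1;
  const startColumn = charPositionInLine + 1; // 0-based ANTLR column -> 1-based Monaco column
  return {
    startLineNumber: line, // ANTLR lines are already 1-based
    startColumn,
    endLineNumber: line,   // assumes the token does not span several lines
    endColumn: startColumn + length, // Monaco's endColumn points just past the last character
  };
}
```

Inside syntaxError, this could be called as toMarkerRange(line, charPositionInLine, offendingSymbol) when offendingSymbol is a token.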

Every time ANTLR encounters an error while parsing the code, it calls TodoLangErrorListener with information about the error. The listener records an error containing the location in the code where the parsing error occurred. Now let's bind TodoLangErrorListener to the Lexer and the Parser in parser.ts:

import { TodoLangGrammarParser, TodoExpressionsContext } from "../ANTLR/TodoLangGrammarParser";
import { TodoLangGrammarLexer } from "../ANTLR/TodoLangGrammarLexer";
import { ANTLRInputStream, CommonTokenStream } from "antlr4ts";
import TodoLangErrorListener, { ITodoLangError } from "./TodoLangErrorListener";

function parse(code: string): {ast:TodoExpressionsContext, errors: ITodoLangError[]} {
    const inputStream = new ANTLRInputStream(code);
    const lexer = new TodoLangGrammarLexer(inputStream);
    lexer.removeErrorListeners()
    const todoLangErrorsListner = new TodoLangErrorListener();
    lexer.addErrorListener(todoLangErrorsListner);
    const tokenStream = new CommonTokenStream(lexer);
    const parser = new TodoLangGrammarParser(tokenStream);
    parser.removeErrorListeners();
    parser.addErrorListener(todoLangErrorsListner);
    const ast =  parser.todoExpressions();
    const errors: ITodoLangError[]  = todoLangErrorsListner.getErrors();
    return {ast, errors};
}
export function parseAndGetASTRoot(code: string): TodoExpressionsContext {
    const {ast} = parse(code);
    return ast;
}
export function parseAndGetSyntaxErrors(code: string): ITodoLangError[] {
    const {errors} = parse(code);
    return errors;
}

Create LanguageService.ts in the ./src/language-service directory with the following content:


import { TodoExpressionsContext } from "../ANTLR/TodoLangGrammarParser";
import { parseAndGetASTRoot, parseAndGetSyntaxErrors } from "./parser";
import { ITodoLangError } from "./TodoLangErrorListener";

export default class TodoLangLanguageService {
    validate(code: string): ITodoLangError[] {
        const syntaxErrors: ITodoLangError[] = parseAndGetSyntaxErrors(code);
        //Later we will append semantic errors
        return syntaxErrors;
    }
}

That's it: we have implemented error analysis for the editor. To use it, I will create the web worker discussed in the previous article and add a worker service proxy. The proxy calls the language service to provide the editor's advanced features.

Creating the web worker

First, we call monaco.editor.createWebWorker, which uses built-in ES6 Proxies to create a proxy for TodoLangWorker. TodoLangWorker uses the language service to perform the editor features. Its methods run inside the web worker and are proxied by Monaco. Calling a method on the proxy from the main thread only forwards the call to the web worker, which executes it there.
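Monaco's own proxy implementation is internal, but the idea of a method-forwarding proxy can be sketched with a plain ES6 Proxy. In this hypothetical example, createForwardingProxy and the Send function type stand in for the postMessage round trip to the worker. In the usage below, send just calls a local object, so the sketch runs without a real worker.

```typescript
// Hypothetical sketch of a method-forwarding proxy; Monaco's real implementation differs.
// `send` stands in for posting a message to the web worker and awaiting its reply.
type Send = (method: string, args: unknown[]) => Promise<unknown>;

// On the caller's side every method returns a Promise, because the call crosses a thread boundary.
type Remote<T> = {
  [K in keyof T]: T[K] extends (...args: infer A) => infer R ? (...args: A) => Promise<R> : never;
};

function createForwardingProxy<T extends object>(send: Send): Remote<T> {
  const proxy = new Proxy({}, {
    get(_target, property) {
      // Stay non-thenable: otherwise `await proxy`, or returning it from an async function, would call `then`.
      if (property === "then") return undefined;
      // Any other property becomes a function that forwards the method name and its arguments.
      return (...args: unknown[]) => send(String(property), args);
    },
  });
  return proxy as Remote<T>;
}
```

The caller never touches the real object. It only sees method names and arguments going out and promises coming back, which is why every proxied TodoLangWorker method (doValidation, and later format) is awaited on the main thread.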

Create TodoLangWorker.ts under the ./src/todo-lang folder with the following content:

import * as monaco from "monaco-editor-core";
import IWorkerContext = monaco.worker.IWorkerContext;
import TodoLangLanguageService from "../language-service/LanguageService";
import { ITodoLangError } from "../language-service/TodoLangErrorListener";

export class TodoLangWorker {
    private _ctx: IWorkerContext;
    private languageService: TodoLangLanguageService;
    constructor(ctx: IWorkerContext) {
        this._ctx = ctx;
        this.languageService = new TodoLangLanguageService();
    }

    doValidation(): Promise<ITodoLangError[]> {
        const code = this.getTextDocument();
        return Promise.resolve(this.languageService.validate(code));
    }
  
    private getTextDocument(): string {
        const model = this._ctx.getMirrorModels()[0];
        return model.getValue();
    }
}

We create a language service instance and add a doValidation method, which calls the language service's validate method. We also add a getTextDocument method that gets the editor's text. You can extend the TodoLangWorker class with many more features if you need them, such as multi-file editing. _ctx: IWorkerContext is the editor's context object, which holds the model information of the files.

Now create the web worker file todolang.worker.ts in ./src/todo-lang:

import * as worker from 'monaco-editor-core/esm/vs/editor/editor.worker';
import { TodoLangWorker } from './TodoLangWorker';

self.onmessage = () => {
    worker.initialize((ctx) => {
        return new TodoLangWorker(ctx)
    });
};

We use the built-in worker.initialize to initialize our worker, with TodoLangWorker providing the methods to be proxied.

Since this is a web worker, webpack must output a separate worker file for it:

// webpack.config.js (excerpt of the exported configuration)
entry: {
    app: './src/index.tsx',
    "editor.worker": 'monaco-editor-core/esm/vs/editor/editor.worker.js',
    "todoLangWorker": './src/todo-lang/todolang.worker.ts'
},
output: {
    globalObject: 'self',
    filename: (chunkData) => {
        switch (chunkData.chunk.name) {
            case 'editor.worker':
                return 'editor.worker.js';
            case 'todoLangWorker':
                return "todoLangWorker.js"
            default:
                return 'bundle.[hash].js';
        }
    },
    path: path.resolve(__dirname, 'dist')
}

We named the worker file todoLangWorker.js. Now add getWorkerUrl to MonacoEnvironment:

// languageID is the TodoLang language ID exported from ./config
(window as any).MonacoEnvironment = {
    getWorkerUrl: function (moduleId: string, label: string) {
        if (label === languageID)
            return "./todoLangWorker.js";
        return './editor.worker.js';
    }
}

This is how Monaco gets the URL of a web worker. If the worker label is the TodoLang language ID, we return the worker file of the same name that webpack outputs. If you build the project, you will find a file called todoLangWorker.js (or, in dev-tools, two workers in the Threads section).

Now create a WorkerManager that manages creating the worker and accessing the proxied worker client:

import * as monaco from "monaco-editor-core";

import Uri = monaco.Uri;
import { TodoLangWorker } from './TodoLangWorker';
import { languageID } from './config';

export class WorkerManager {

    private worker: monaco.editor.MonacoWebWorker<TodoLangWorker>;
    private workerClientProxy: Promise<TodoLangWorker>;

    constructor() {
        this.worker = null;
    }

    private getClientproxy(): Promise<TodoLangWorker> {
        if (!this.workerClientProxy) {
            this.worker = monaco.editor.createWebWorker<TodoLangWorker>({
                moduleId: 'TodoLangWorker',
                label: languageID,
                createData: {
                    languageId: languageID,
                }
            });
            this.workerClientProxy = <Promise<TodoLangWorker>><any>this.worker.getProxy();
        }

        return this.workerClientProxy;
    }

    async getLanguageServiceWorker(...resources: Uri[]): Promise<TodoLangWorker> {
        const _client: TodoLangWorker = await this.getClientproxy();
        await this.worker.withSyncedResources(resources)
        return _client;
    }
}
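getClientproxy creates the worker only once and then keeps returning the same promise. That lazy-initialization pattern can be isolated into a small generic helper. The name lazyAsync is hypothetical, and this is a sketch rather than part of Monaco's API.

```typescript
// Hypothetical generic helper: run `factory` on the first call only and
// share the resulting promise with every later caller, including concurrent ones.
function lazyAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      // Cache the promise itself, not the resolved value, so callers arriving
      // before it settles do not trigger a second factory call.
      cached = factory();
    }
    return cached;
  };
}
```

One design choice to be aware of: a rejected promise stays cached too. A production version might reset cached on failure so that the next call can retry creating the worker.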

We use createWebWorker to create the Monaco proxy web worker and then get the proxy client object it returns. Through workerClientProxy, we call the proxied methods. Next, create the DiagnosticsAdapter class. It connects Monaco's marker API to the errors returned by the language service, so that parsing errors are marked correctly in Monaco:

import * as monaco from "monaco-editor-core";
import { WorkerAccessor } from "./setup";
import { languageID } from "./config";
import { ITodoLangError } from "../language-service/TodoLangErrorListener";

export default class DiagnosticsAdapter {
    constructor(private worker: WorkerAccessor) {
        const onModelAdd = (model: monaco.editor.IModel): void => {
            let handle: any;
            model.onDidChangeContent(() => {
                // Debounce the user's changes: every new change cancels the pending validation and
                // schedules a new one 500ms later, so we don't validate while the user is still typing
                clearTimeout(handle);
                handle = setTimeout(() => this.validate(model.uri), 500);
            });

            this.validate(model.uri);
        };
        monaco.editor.onDidCreateModel(onModelAdd);
        monaco.editor.getModels().forEach(onModelAdd);
    }
    private async validate(resource: monaco.Uri): Promise<void> {
        const worker = await this.worker(resource)
        const errorMarkers = await worker.doValidation();
        const model = monaco.editor.getModel(resource);
        monaco.editor.setModelMarkers(model, languageID, errorMarkers.map(toDiagnostics));
    }
}
function toDiagnostics(error: ITodoLangError): monaco.editor.IMarkerData {
    return {
        ...error,
        severity: monaco.MarkerSeverity.Error,
    };
}

The onDidChangeContent listener watches the model. Whenever the content changes, validation is postponed until 500ms pass without another change. The web worker then validates the code, and setModelMarkers tells Monaco to add the error markers to the model. To wire everything together, call these in the setup function. Note that we use WorkerManager to get the proxied worker:

// setup.ts, inside the setup function
    monaco.languages.onLanguage(languageID, () => {
        monaco.languages.setMonarchTokensProvider(languageID, monarchLanguage);
        monaco.languages.setLanguageConfiguration(languageID, richLanguageConfiguration);
        const client = new WorkerManager();
        const worker: WorkerAccessor = (...uris: monaco.Uri[]): Promise<TodoLangWorker> => {
            return client.getLanguageServiceWorker(...uris);
        };
        // Register the errors provider
        new DiagnosticsAdapter(worker);
    });
} // end of the setup function

export type WorkerAccessor = (...uris: monaco.Uri[]) => Promise<TodoLangWorker>;
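DiagnosticsAdapter debounces validation inline with clearTimeout and setTimeout. Pulled out into a reusable helper, the same logic might look like the sketch below. The names debounce, Timers and realTimers are hypothetical. The timer functions are injectable only so that the behaviour can be checked without waiting in real time.

```typescript
// Hypothetical debounce helper: `fn` runs only after `delayMs` pass without another call;
// each new call cancels the one that is still pending.
interface Timers {
  set(callback: () => void, delayMs: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Timers = {
  set: (callback, delayMs) => setTimeout(callback, delayMs),
  clear: handle => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
  timers: Timers = realTimers,
): (...args: A) => void {
  let handle: unknown = undefined;
  return (...args: A) => {
    // cancel the call scheduled by the previous change, if it has not run yet
    if (handle !== undefined) timers.clear(handle);
    handle = timers.set(() => {
      handle = undefined;
      fn(...args);
    }, delayMs);
  };
}
```

With it, the listener body could become a single call to a debounced this.validate(model.uri) created once per model, with a 500 ms delay.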

Now everything is ready. Run the project and enter some invalid TodoLang code, and you will see the errors marked in the code:
[Figure: syntax errors marked in the editor]

Implementing Semantic Validation

Now let's add semantic validation to the editor. Remember the two semantic rules I mentioned in the previous article:

  • If a TODO has been defined using the ADD TODO instruction, it cannot be added again.
  • In a TODO application, the COMPLETE instruction should not be used on a TODO before it has been declared with ADD TODO.

To check whether a TODO is already defined, we traverse the AST, take each ADD expression, and push its TODO into definedTodos. Before pushing, we check whether the TODO already exists in definedTodos. If it does, that is a semantic error, so we take the error's position from the context of the ADD expression and push the error into the array. The second rule works the same way: a COMPLETE expression whose TODO is not yet in definedTodos is an error.

// LanguageService.ts: AddExpressionContext and CompleteExpressionContext must be imported as well
import { TodoExpressionsContext, AddExpressionContext, CompleteExpressionContext } from "../ANTLR/TodoLangGrammarParser";
import { ITodoLangError } from "./TodoLangErrorListener";

function checkSemanticRules(ast: TodoExpressionsContext): ITodoLangError[] {
    const errors: ITodoLangError[] = [];
    const definedTodos: string[] = [];
    // children is undefined when the document is empty
    (ast.children ?? []).forEach(node => {
        if (node instanceof AddExpressionContext) {
            // an Add expression: ADD TODO "STRING"
            const todo = node.STRING().text;
            // If a TODO is already defined using the ADD TODO instruction, it cannot be added again.
            if (definedTodos.some(todo_ => todo_ === todo)) {
                // the context's stop token (the STRING) tells us where this expression is in the code
                const stop = node.stop!;
                errors.push({
                    code: "2",
                    // ANTLR columns are 0-based, Monaco columns are 1-based
                    startColumn: stop.charPositionInLine + 1,
                    endColumn: stop.charPositionInLine + 1 + (stop.stopIndex - stop.startIndex + 1),
                    startLineNumber: stop.line,
                    endLineNumber: stop.line,
                    message: `Todo ${todo} already defined`
                });
            } else {
                definedTodos.push(todo);
            }
        } else if (node instanceof CompleteExpressionContext) {
            const todoToComplete = node.STRING().text;
            if (definedTodos.every(todo_ => todo_ !== todoToComplete)) {
                // The todo is not defined yet. We only check the todos defined before this expression,
                // which means the order is important.
                const stop = node.stop!;
                errors.push({
                    code: "2",
                    startColumn: stop.charPositionInLine + 1,
                    endColumn: stop.charPositionInLine + 1 + (stop.stopIndex - stop.startIndex + 1),
                    startLineNumber: stop.line,
                    endLineNumber: stop.line,
                    message: `Todo ${todoToComplete} is not defined`
                });
            }
        }
    });
    return errors;
}
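Both rules depend only on the order of the expressions, so they can also be exercised without ANTLR. In this hypothetical sketch, each expression is a plain object standing in for AddExpressionContext or CompleteExpressionContext. Each object holds its kind, its TODO text and a 1-based position. A Set replaces the definedTodos array, so each lookup is constant-time on average instead of a linear scan.

```typescript
// Hypothetical, ANTLR-free version of the two semantic rules.
interface Expression {
  kind: "add" | "complete";
  todo: string;   // the STRING token text, quotes included
  line: number;   // 1-based
  column: number; // 1-based
}

interface SemanticError { line: number; column: number; message: string; }

function checkSemanticRulesSketch(expressions: Expression[]): SemanticError[] {
  const errors: SemanticError[] = [];
  const definedTodos = new Set<string>();
  for (const expr of expressions) {
    if (expr.kind === "add") {
      // Rule 1: a TODO that is already defined cannot be added again.
      if (definedTodos.has(expr.todo)) {
        errors.push({ line: expr.line, column: expr.column, message: `Todo ${expr.todo} already defined` });
      } else {
        definedTodos.add(expr.todo);
      }
    } else if (!definedTodos.has(expr.todo)) {
      // Rule 2: COMPLETE is only valid for a TODO added earlier in the file, so order matters.
      errors.push({ line: expr.line, column: expr.column, message: `Todo ${expr.todo} is not defined` });
    }
  }
  return errors;
}
```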

Now call the checkSemanticRules function in the language service's validate method, merge the semantic errors with the syntax errors, and return them. The editor now supports semantic checks.

[Figure: semantic errors marked in the editor]

Implementing Auto-Formatting

For the editor's auto-formatting feature, you need to use Monaco's registerDocumentFormattingEditProvider API. Check the monaco-editor documentation for more details. The language service's format method traverses the AST and rebuilds the code in a beautified form:

// languageService.ts (a method of TodoLangLanguageService; AddExpressionContext and
// CompleteExpressionContext are imported from "../ANTLR/TodoLangGrammarParser")
format(code: string): string {
    // If the code contains errors, don't format it: this way of formatting rebuilds the code
    // from the AST and would drop the invalid parts. To keep things simple, only valid code is formatted.
    if (this.validate(code).length > 0)
        return code;
    let formattedCode = "";
    const ast: TodoExpressionsContext = parseAndGetASTRoot(code);
    // children is undefined when the document is empty
    (ast.children ?? []).forEach(node => {
        if (node instanceof AddExpressionContext) {
            // an Add expression: ADD TODO "STRING"
            const todo = node.STRING().text;
            formattedCode += `ADD TODO ${todo}\n`;
        } else if (node instanceof CompleteExpressionContext) {
            // a Complete expression: COMPLETE TODO "STRING"
            const todoToComplete = node.STRING().text;
            formattedCode += `COMPLETE TODO ${todoToComplete}\n`;
        }
    });
    return formattedCode;
}
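The AST walk in format() boils down to printing each expression in one canonical form per line. The sketch below shows that output step on a plain string. formatTodoLang is a hypothetical name, and a regular expression stands in for the AST. That substitution is only reasonable because, like format() above, the sketch assumes the code has already passed validation.

```typescript
// Hypothetical regex-based stand-in for the AST walk in format().
// Only safe on code that already passed validation.
function formatTodoLang(code: string): string {
  // keyword, then TODO, then the quoted text, separated by any whitespace (line breaks included)
  const expression = /(ADD|COMPLETE)\s+TODO\s+("[^"]*")/g;
  let formatted = "";
  let match: RegExpExecArray | null;
  while ((match = expression.exec(code)) !== null) {
    // match[1] is the keyword, match[2] the quoted TODO text, kept verbatim
    formatted += `${match[1]} TODO ${match[2]}\n`;
  }
  return formatted;
}
```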

In TodoLangWorker, add a format method that calls the language service's format method.

Now create the TodoLangFormattingProvider class, which implements the DocumentFormattingEditProvider interface:

import * as monaco from "monaco-editor-core";
import { WorkerAccessor } from "./setup";

export default class TodoLangFormattingProvider implements monaco.languages.DocumentFormattingEditProvider {

    constructor(private worker: WorkerAccessor) {

    }

    provideDocumentFormattingEdits(model: monaco.editor.ITextModel, options: monaco.languages.FormattingOptions, token: monaco.CancellationToken): monaco.languages.ProviderResult<monaco.languages.TextEdit[]> {
        // getFullModelRange() spans the whole document in Monaco's 1-based lines and columns
        return this.format(model.uri, model.getValue(), model.getFullModelRange());
    }

    private async format(resource: monaco.Uri, code: string, range: monaco.IRange): Promise<monaco.languages.TextEdit[]> {
        // get the worker proxy
        const worker = await this.worker(resource);
        // call the format method proxied from the language service
        const formattedCode = await worker.format(code);
        // replace the whole document with the formatted code
        return [
            {
                text: formattedCode,
                range
            }
        ];
    }
}
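Monaco ranges are 1-based, and endColumn points just past the last character. A range that replaces the whole document therefore starts at line 1, column 1 and ends after the last character of the last line. ITextModel.getFullModelRange() returns that range when you have a model. The hypothetical helper below computes the same thing from a plain string.

```typescript
// Hypothetical helper: the 1-based range covering all of `code`,
// shaped like monaco.IRange (redeclared here so the sketch has no dependencies).
interface IRangeLike {
  startLineNumber: number;
  startColumn: number;
  endLineNumber: number;
  endColumn: number;
}

function fullDocumentRange(code: string): IRangeLike {
  // Split on \r\n, \r or \n so Windows line endings are counted once.
  const lines = code.split(/\r\n|\r|\n/);
  const lastLine = lines[lines.length - 1];
  return {
    startLineNumber: 1,
    startColumn: 1,
    endLineNumber: lines.length,
    endColumn: lastLine.length + 1, // one past the last character of the last line
  };
}
```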

TodoLangFormattingProvider calls the format method provided by the worker with the editor's current value (model.getValue()). It then gives Monaco the formatted code together with the range of code it should replace. Now go to the setup function and register the formatting provider with Monaco's registerDocumentFormattingEditProvider API. Re-run the application, and you can see that the editor now supports auto-formatting:

monaco.languages.registerDocumentFormattingEditProvider(languageID, new TodoLangFormattingProvider(worker));

[Figure: formatter]

Try clicking Format Document or pressing Shift + Alt + F, and you will see the following effect:

[Figure: the code after formatting]

Implementing Auto-Completion

To support auto-completion of defined TODOs, all you have to do is get all defined TODOs from the AST and provide a completion provider by calling registerCompletionItemProvider in setup. The completion provider gives you the code and the current cursor position, so you can check the context in which the user is typing. If they are typing the TODO of a COMPLETE expression, you can suggest the predefined TODOs. Remember that, by default, Monaco-editor suggests words that already appear in the code. You may need to disable this feature and implement your own suggestions to make them smarter and more contextual.
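The sketch below shows that context check. It assumes a 1-based, Monaco-style cursor position. As a simplification, it collects the defined TODOs with a regular expression instead of the AST and limits TODO strings to a single line. The function name suggestTodos and its string[] result are hypothetical. Inside a real provideCompletionItems, the strings would still have to be turned into Monaco completion items.

```typescript
// Hypothetical completion helper; positions are 1-based like monaco.Position.
interface CursorPosition {
  lineNumber: number;
  column: number;
}

function suggestTodos(code: string, position: CursorPosition): string[] {
  const lines = code.split(/\r\n|\r|\n/);
  // Text of the current line up to the cursor (column 1 is before the first character).
  const beforeCursor = (lines[position.lineNumber - 1] ?? "").slice(0, position.column - 1);

  // Only suggest while the user is typing the TODO of a COMPLETE expression:
  // the line so far must end in COMPLETE TODO, optionally followed by a partial "string.
  const context = /COMPLETE\s+TODO\s+("[^"\r\n]*)?$/.exec(beforeCursor);
  if (context === null) return [];
  const typedPrefix = context[1] ?? "";

  // Collect added and completed TODOs (single-line strings only in this sketch).
  const defined: string[] = [];
  const completed = new Set<string>();
  const expression = /(ADD|COMPLETE)\s+TODO\s+("[^"\r\n]*")/g;
  let match: RegExpExecArray | null;
  while ((match = expression.exec(code)) !== null) {
    if (match[1] === "ADD") {
      if (!defined.includes(match[2])) defined.push(match[2]);
    } else {
      completed.add(match[2]);
    }
  }

  // Suggest TODOs that were added, are not completed yet, and match what was typed so far.
  return defined.filter(todo => !completed.has(todo) && todo.startsWith(typedPrefix));
}
```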

Translator information



袋鼠云数栈UED

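We are the 袋鼠云数栈 (Kangaroo Cloud DataStack) UED team, dedicated to building excellent one-stop data middle-platform products. We keep a craftsman's spirit, explore the road of front-end development, and accumulate and share our experience with the community.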