
Original: https://lwebapp.com/zh/post/regular-expression-to-match-multiple-lines-of-text

Requirement

Recently a colleague came to me with a request: use a regular expression to extract the changed code from a git commit record, that is, the code shown in a commit diff. The lines to pull out are the ones prefixed with + or -.

We copied a commit record from the RichX project and modified it slightly for demonstration.

+ import { Plugin } from "..";
- CONST SUM = NUM_A + NUM_B;
+ CONST SUM_ALL = NUM_A + NUM_B;

  export const DEFAULT_RICH_TEXT = {
-   text: "Simple Rich Text Demo",
+   config: "Simple Rich Text Demo",
    setting: [],
  };

  export type ObjectKV<V = object> = {
    [key: string]: V;
  };

+ export interface IPlugins {
+   [key: string]: Plugin;
+ }

Restated, the task is to use a regular expression to match the lines that start with + or - in a multi-line text.

Solution one

Ideas:

  1. First match text starting with + : \+.*
  2. Then include - as well: (\+|\-).*
  3. Lines in a multi-line text are separated by newlines, so a line starting with + is preceded by the trailing newline \n of the previous line and is itself followed by a newline. We therefore use lookbehind and lookahead assertions to require a newline on each side of the target text: (?<=\n)(\+|\-).*(?=\n)
  4. Finally, two special cases remain: the very beginning and the very end of the text. The first line has no previous line, so there is no \n to match and we accept the start ^ instead. The last line may have no trailing newline, so we also accept the end $ : (?<=^|\n)(\+|\-).*(?=\n|$)
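
The steps above can be traced on a short sample. The following is a minimal sketch, assuming a JavaScript engine with lookbehind support; the `sample` string and the `step1`, `step3` and `step4` names are made up for illustration and are not part of the original commit record.

```javascript
// Hypothetical four-line sample: the first and last lines are changed lines,
// and the text has no trailing newline.
const sample = [
  '+ import { Plugin } from "..";',
  "- CONST SUM = NUM_A + NUM_B;",
  "  export const DEFAULT_RICH_TEXT = {",
  "+ }",
].join("\n");

// Step 1: the pattern is not anchored to the line start, so it also picks up
// the "+" in the middle of the second line.
const step1 = sample.match(/\+.*/g);
// → ['+ import { Plugin } from "..";', '+ NUM_B;', '+ }']

// Step 3: a newline is required on both sides, so the first line (nothing
// before it) and the last line (nothing after it) are missed.
const step3 = sample.match(/(?<=\n)(\+|\-).*(?=\n)/g);
// → ['- CONST SUM = NUM_A + NUM_B;']

// Step 4: the start and end of the whole text are accepted as well.
const step4 = sample.match(/(?<=^|\n)(\+|\-).*(?=\n|$)/g);
// → ['+ import { Plugin } from "..";', '- CONST SUM = NUM_A + NUM_B;', '+ }']

console.log(step1, step3, step4);
```

Step 2 is left out of the trace because it only widens step 1 to the - marker and has the same anchoring problem.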

Code:

const content = `+ import { Plugin } from "..";
- CONST SUM = NUM_A + NUM_B;
+ CONST SUM_ALL = NUM_A + NUM_B;

  export const DEFAULT_RICH_TEXT = {
-   text: "Simple Rich Text Demo",
+   config: "Simple Rich Text Demo",
    setting: [],
  };

  export type ObjectKV<V = object> = {
    [key: string]: V;
  };

+ export interface IPlugins {
+   [key: string]: Plugin;
+ }`

content.match(/(?<=^|\n)(\+|\-).*(?=\n|$)/g)

// Output array
// 0: "+ import { Plugin } from \"..\";"
// 1: "- CONST SUM = NUM_A + NUM_B;"
// 2: "+ CONST SUM_ALL = NUM_A + NUM_B;"
// 3: "-   text: \"Simple Rich Text Demo\","
// 4: "+   config: \"Simple Rich Text Demo\","
// 5: "+ export interface IPlugins {"
// 6: "+   [key: string]: Plugin;"
// 7: "+ }"

Solution two

Ideas:

The first solution handles newlines by hand, which is a bit cumbersome. We can skip that step, match the start and end of each line directly with ^ and $ , and turn on multi-line mode with the regular expression flag m : /^(\+|\-).*$/gm .
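
The effect of the m flag can be seen on a short string. This is a minimal sketch; the `text`, `withoutM` and `withM` names are made up for illustration.

```javascript
// Hypothetical three-line text with two changed lines.
const text = "+ a\n- b\n  c";

// Without m, ^ only matches at the start of the whole text and $ only at its
// end, so no single line can satisfy both anchors and match() returns null.
const withoutM = text.match(/^(\+|\-).*$/g);
// → null

// With m, ^ and $ match at the start and end of every line.
const withM = text.match(/^(\+|\-).*$/gm);
// → ['+ a', '- b']

console.log(withoutM, withM);
```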

Code:

const content = `+ import { Plugin } from "..";
- CONST SUM = NUM_A + NUM_B;
+ CONST SUM_ALL = NUM_A + NUM_B;

  export const DEFAULT_RICH_TEXT = {
-   text: "Simple Rich Text Demo",
+   config: "Simple Rich Text Demo",
    setting: [],
  };

  export type ObjectKV<V = object> = {
    [key: string]: V;
  };

+ export interface IPlugins {
+   [key: string]: Plugin;
+ }`

content.match(/^(\+|\-).*$/gm)

// Output array
// 0: "+ import { Plugin } from \"..\";"
// 1: "- CONST SUM = NUM_A + NUM_B;"
// 2: "+ CONST SUM_ALL = NUM_A + NUM_B;"
// 3: "-   text: \"Simple Rich Text Demo\","
// 4: "+   config: \"Simple Rich Text Demo\","
// 5: "+ export interface IPlugins {"
// 6: "+   [key: string]: Plugin;"
// 7: "+ }"
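
If the added and removed lines need to be handled separately, the capture group around the marker can be used to sort them. The following is only a sketch under that assumption: the helper name `splitDiffLines` and the `diff` sample are hypothetical, and it relies on `String.prototype.matchAll`, which requires the g flag.

```javascript
// Hypothetical helper: groups the matched lines by their leading marker and
// returns them without the marker itself.
function splitDiffLines(diffText) {
  const result = { added: [], removed: [] };
  // Group 1 captures the marker, group 2 the rest of the line.
  for (const match of diffText.matchAll(/^(\+|\-)(.*)$/gm)) {
    const [, marker, rest] = match;
    (marker === "+" ? result.added : result.removed).push(rest);
  }
  return result;
}

const diff = [
  '+ import { Plugin } from "..";',
  "- CONST SUM = NUM_A + NUM_B;",
  "+ CONST SUM_ALL = NUM_A + NUM_B;",
  "    setting: [],",
].join("\n");

const { added, removed } = splitDiffLines(diff);
// added   → [' import { Plugin } from "..";', ' CONST SUM_ALL = NUM_A + NUM_B;']
// removed → [' CONST SUM = NUM_A + NUM_B;']
console.log(added, removed);
```

The sketch keeps the space that follows the marker. It also does not filter the `+++` and `---` file header lines that a full unified diff contains, so it only suits trimmed snippets like the one in this article.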

Summary

This is a small lesson in writing regular expressions that came out of a discussion with colleagues, mainly covering assertions and the multi-line flag. The case here is relatively simple, and more in-depth use cases will follow. You are welcome to follow our updates under #regex .

Dushusir

Front-end development