How much do you know about Java regular expressions?

Preface

Regular expressions are generally used for string matching, searching, and replacement. Don't underestimate them: used flexibly, they can make string processing at work and in study much more efficient. The joy of programming can be that simple.

The following explains the use of regular expressions, from the simple to the more advanced.

Simple Getting Started `.`

package test;

public class Test01 {

    public static void main(String[] args) {
        // "abc" is matched against the regular expression "...", where each "." stands for one character,
        // so "..." stands for exactly three characters
        System.out.println("abc".matches("..."));

        System.out.println("abcd".matches("..."));
    }

}

Output result:

true
false

The String class has a matches(String regex) method. It returns a boolean that tells whether the entire string matches the given regular expression.

In this example, the regular expression is ... , where each . represents any single character (by default, any character except a line terminator), so the whole expression means exactly three characters. Matching abc therefore gives true , while abcd gives false .

Support for regular expressions in Java

The java.util.regex package provides two main classes for regular expressions: Pattern and Matcher .

The typical usage of these two classes is given in the official Java documentation. The code is as follows:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test02 {

    public static void main(String[] args) {
        // [a-z] is any one character from a to z, and {3} means exactly three of them,
        // so the pattern matches a string of length 3 made only of lowercase letters
        Pattern p = Pattern.compile("[a-z]{3}");
        Matcher m = p.matcher("abc");
        System.out.println(m.matches());
    }
}

Output result: true

A Pattern can be understood as a template that a string is matched against. In Test02 , the pattern we defined is a string of length 3 in which every character is one of a~z.

The Pattern object is created by calling the static compile method of the Pattern class, which compiles the regular expression we pass in into a pattern object. Reusing a compiled pattern avoids recompiling the expression on every match, and because a Pattern is immutable it can safely be shared by multiple threads. (A Matcher , by contrast, is not thread-safe.)

A Matcher can be understood as the result of applying a pattern to a particular string. One string may yield several matches for the same pattern, as later examples show.

Finally, m.matches() returns whether the complete string matches the pattern.

The three statements above can be reduced to one line of code:

System.out.println("abc".matches("[a-z]{3}"));

But if the same regular expression is matched repeatedly, this form is less efficient, because the expression is compiled again on every call.
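To illustrate the reuse described above, here is a minimal sketch that compiles the pattern once and applies it to several inputs. The class name PrecompiledPatternDemo and the helper isThreeLowercase are hypothetical names chosen for this illustration, not part of the original article.

```java
import java.util.regex.Pattern;

class PrecompiledPatternDemo {

    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern THREE_LOWERCASE = Pattern.compile("[a-z]{3}");

    // Hypothetical helper: a new Matcher is created per call, because Matcher is not thread-safe.
    static boolean isThreeLowercase(String input) {
        return THREE_LOWERCASE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"abc", "abcd", "ab1"};
        for (String input : inputs) {
            // Each iteration reuses the same compiled Pattern.
            System.out.println(input + " -> " + isThreeLowercase(input));
        }
    }
}
```

Only the first input should be reported as matching, since the other two are either too long or contain a digit.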

`Number of matches symbol`

symbol	meaning
*	0 or more times
+	1 or more times
?	0 or 1 time
{n}	Exactly n times
{n,m}	Between n and m times (inclusive)
{n,}	At least n times

Code example:

package test;

public class Test03 {

    private static void p(Object o){
        System.out.println(o);
    }
    
    public static void main(String[] args) {
        // "X*" means zero or more X
        p("aaaa".matches("a*"));
        p("".matches("a*"));
        // "X+" means one or more X
        p("aaaa".matches("a+"));
        // "X?" means zero or one X
        p("a".matches("a?"));
        // \d is a digit, [0-9]; inside a Java string literal the backslash itself must be escaped, hence "\\d"
        p("2345".matches("\\d{2,5}"));
        // \\. matches a literal "."
        p("192.168.0.123".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"));
        // [0-2] must be one digit from 0 to 2
        p("192".matches("[0-2][0-9][0-9]"));
    }
}

Output result: all true .
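One caveat about the IP address line in Test03: \d{1,3} only checks the shape of each part, so a string such as 999.999.999.999 also matches. The following sketch, under the assumption that a numeric range check is wanted, combines the same pattern with a plain integer comparison. The names Ipv4ShapeDemo , looksLikeIpv4 and isValidIpv4 are made up for this example.

```java
import java.util.regex.Pattern;

class Ipv4ShapeDemo {

    // Same expression as in Test03: four groups of 1~3 digits separated by dots.
    private static final Pattern IPV4_SHAPE =
            Pattern.compile("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}");

    // True when the string has the dotted-quad shape, regardless of the numeric values.
    static boolean looksLikeIpv4(String s) {
        return IPV4_SHAPE.matcher(s).matches();
    }

    // Hypothetical stricter check: shape first, then every part must be at most 255.
    static boolean isValidIpv4(String s) {
        if (!looksLikeIpv4(s)) {
            return false;
        }
        for (String part : s.split("\\.")) {
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIpv4("999.999.999.999")); // shape only
        System.out.println(isValidIpv4("999.999.999.999"));   // shape plus range
        System.out.println(isValidIpv4("192.168.0.123"));
    }
}
```

Keeping the range check outside the regular expression is a design choice made here for readability; it can also be expressed with alternation inside the pattern.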

`Range []`

[] is used to describe a set or range of characters; it matches exactly one character from that set. Here are some examples:

package test;

public class Test04 {

    private static void p(Object o){
        System.out.println(o);
    }

    public static void main(String[] args) {
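        // [abc] is any one of the letters a, b, c
        p("a".matches("[abc]"));
        // [^abc] is any character other than a, b, c
        p("1".matches("[^abc]"));
        // a letter in a~z or A~Z; all three lines below print true for "A"
        p("A".matches("[a-zA-Z]"));
        // note: inside [] the | is a literal character, so this class also matches "|"
        p("A".matches("[a-z|A-Z]"));
        p("A".matches("[a-z[A-Z]]"));
        // [A-Z&&[REQ]] is a character that is in A~Z and is also one of R, E, Q
        p("R".matches("[A-Z&&[REQ]]"));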
        p("R".matches("[A-Z&&[REQ]]"));
    }
}

Output result: all true .
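The three "or" forms in Test04 are not fully equivalent: inside square brackets the | character has no special meaning, so [a-z|A-Z] additionally accepts a literal | . A small sketch to check this, with the hypothetical class name CharClassPipeDemo :

```java
class CharClassPipeDemo {

    // Character class written with a | between the two ranges.
    static boolean matchesPipeClass(String s) {
        return s.matches("[a-z|A-Z]");
    }

    // Character class with the two ranges simply placed side by side.
    static boolean matchesPlainClass(String s) {
        return s.matches("[a-zA-Z]");
    }

    public static void main(String[] args) {
        // Both classes accept a letter.
        System.out.println(matchesPipeClass("A"));
        System.out.println(matchesPlainClass("A"));
        // Only the class containing | accepts the pipe character itself.
        System.out.println(matchesPipeClass("|"));
        System.out.println(matchesPlainClass("|"));
    }
}
```

For a pure union of ranges, [a-zA-Z] or the nested form [a-z[A-Z]] is therefore the safer spelling.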

`\s \w \d \S \W \D`

About \

In a Java string literal, special characters must be escaped with \ .

For example, consider the string "The teacher said loudly: "Classmates, hand in homework!"". Without an escape character, the opening double quotation mark would be closed right after loudly: , but we need double quotation marks inside the string itself, so they have to be escaped.

With the escape character the string becomes "The teacher said loudly: \"Classmates, hand in homework!\"", and our intent is recognized correctly.

Similarly, to put a \ in a string we also add a \ in front of it, so it is written as "\\" in the string literal.

So how do we match a \ in a regular expression? The answer is "\\\\" .

Consider the two layers separately. In a regular expression, \ is itself the escape character, so a literal \ must be written as \\ in the regex. In a Java string literal each of those two backslashes must in turn be written as \\ , which gives four backslashes in the source code for one \ in the text being matched.

Let's look at the code example first:

package test;

public class Test05 {

    private static void p(Object o){
        System.out.println(o);
    }

    public static void main(String[] args) {
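        // \s{4} means four whitespace characters
        p(" \n\r\t".matches("\\s{4}"));
        // \S means a non-whitespace character
        p("a".matches("\\S"));
        // \w{3} means three word characters (letters, digits, or underscore)
        p("a_8".matches("\\w{3}"));
        p("abc888&^%".matches("[a-z]{1,3}\\d+[%^&*]+"));
        // match a single backslash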
        p("\\".matches("\\\\"));
    }
}

symbol	meaning
\d	[0-9] A digit
\D	[^0-9] A non-digit
\s	[ \t\n\x0B\f\r] A whitespace character
\S	[^\s] A non-whitespace character
\w	[a-zA-Z_0-9] A word character: letter, digit, or underscore
\W	[^\w] A non-word character
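The two escaping layers described in this section can be checked by looking at string lengths: the source literal with four backslashes is a two-character regular expression, and it matches a one-character string. This is only an illustrative sketch; the class name BackslashLayersDemo and its helpers are assumptions of this example.

```java
class BackslashLayersDemo {

    // The Java literal "\\\\" holds two characters: the regex \\ .
    static int regexLength() {
        return "\\\\".length();
    }

    // The Java literal "\\" holds one character: a single backslash.
    static int textLength() {
        return "\\".length();
    }

    // The two-character regex matches the one-character text.
    static boolean matchesSingleBackslash() {
        return "\\".matches("\\\\");
    }

    public static void main(String[] args) {
        System.out.println("regex characters: " + regexLength());
        System.out.println("text characters: " + textLength());
        System.out.println("matches: " + matchesSingleBackslash());
    }
}
```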

`Border processing ^`

Inside square brackets, ^ means negation, as in [^abc] ; outside square brackets it means the beginning of the string (or of a line, in multiline mode).

Code example:

package test;

public class Test06 {

    private static void p(Object o){
        System.out.println(o);
    }

    public static void main(String[] args) {
        /**
         * ^    the beginning of the input (of a line in multiline mode)
         * $    the end of the input (of a line in multiline mode)
         * \b   a word boundary, e.g. between a letter and a space or line break
         */
        p("hello sir".matches("^h.*"));
        p("hello sir".matches(".*r$"));
        p("hello sir".matches("^h[a-z]{1,3}o\\b.*"));
        p("hellosir".matches("^h[a-z]{1,3}o\\b.*"));
    }
}

Output result:

true
true
true
false
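Because matches() always tests the whole string, ^ and $ add little in the examples above. They matter more with find() , which otherwise accepts a match anywhere in the input. The sketch below shows the difference; the class name AnchorFindDemo and its helper methods are hypothetical.

```java
import java.util.regex.Pattern;

class AnchorFindDemo {

    // No anchor: find() succeeds if "sir" occurs anywhere.
    static boolean findAnywhere(String s) {
        return Pattern.compile("sir").matcher(s).find();
    }

    // ^ restricts the match to the beginning of the input.
    static boolean findAtStart(String s) {
        return Pattern.compile("^sir").matcher(s).find();
    }

    // $ restricts the match to the end of the input.
    static boolean findAtEnd(String s) {
        return Pattern.compile("sir$").matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "hello sir";
        System.out.println(findAnywhere(s)); // "sir" occurs in the string
        System.out.println(findAtStart(s));  // the string does not start with "sir"
        System.out.println(findAtEnd(s));    // the string ends with "sir"
    }
}
```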

`Matcher class`

The matches() method matches the entire string against the pattern.
The find() method searches for the next match starting from the current position. On the first call, or right after reset() , the current position is the beginning of the string. How the current position moves is shown in the code example below.
The lookingAt() method always matches from the beginning of the string, but unlike matches() it does not require the whole string to match.

Code example:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test07 {
    private static void p(Object o){
        System.out.println(o);
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\d{3,5}");
        String s = "123-34345-234-00";
        Matcher m = pattern.matcher(s);

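        // First matches(), which tries to match the entire string.
        p(m.matches());
        // false: matching 3~5 digits fails at the first -

        // Then find(); reset() first moves the current position back to the start of the string
        m.reset();
        p(m.find());// true, matches 123
        p(m.find());// true, matches 34345
        p(m.find());// true, matches 234
        p(m.find());// false, 00 is too short to match

        // Now call find() after matches() without reset() to see how the current position changes
        m.reset();// reset first
        p(m.matches());// false, the whole string does not match; in practice the current position is left at the first -
        p(m.find());// true, matches 34345
        p(m.find());// true, matches 234
        p(m.find());// false, 00 does not match
        p(m.find());// false, nothing left to match

        // lookingAt() always starts from the beginning
        p(m.lookingAt());// true, finds 123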
    }
    
}

After a successful match, start() returns the index of the first matched character,
and end() returns the index just after the last matched character.

Code example:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test08 {

    private static void p(Object o) {
        System.out.println(o);
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\d{3,5}");
        String s = "123-34345-234-00";
        Matcher m = pattern.matcher(s);

        p(m.find());// true, matches 123
        p("start: " + m.start() + " - end:" + m.end());
        p(m.find());// true, matches 34345
        p("start: " + m.start() + " - end:" + m.end());
        p(m.find());// true, matches 234
        p("start: " + m.start() + " - end:" + m.end());
        p(m.find());// false, 00 does not match
        try {
            // start() and end() throw IllegalStateException when the last match attempt failed
            p("start: " + m.start() + " - end:" + m.end());
        } catch (IllegalStateException e) {
            System.out.println("An error occurred...");
        }
        p(m.lookingAt());
        p("start: " + m.start() + " - end:" + m.end());
    }
}

Output result:

true
start: 0 - end:3
true
start: 4 - end:9
true
start: 10 - end:13
false
An error occurred...
true
start: 0 - end:3
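Since end() points just past the match, start() and end() can be passed directly to String.substring to recover the matched text. The following sketch collects every match together with its position; MatchPositionsDemo and describeMatches are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositionsDemo {

    // Returns one entry per match in the form text@start-end.
    static List<String> describeMatches(String input) {
        Matcher m = Pattern.compile("\\d{3,5}").matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            // substring(start, end) yields the same text as m.group()
            String text = input.substring(m.start(), m.end());
            result.add(text + "@" + m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        // Same input string as in Test08
        System.out.println(describeMatches("123-34345-234-00"));
    }
}
```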

`Replacement string`

The Matcher class has a group() method that returns the matched string, and a replaceAll() method that replaces every match.

Code example: convert every java in the string to uppercase

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test09 {

    private static void p(Object o){
        System.out.println(o);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("java");
        Matcher m = p.matcher("java I love Java and you");
        p(m.replaceAll("JAVA"));// replaceAll() replaces every matched substring
    }
}

Output result:

JAVA I love Java and you

`Case-insensitive search and replace string`

Case insensitivity has to be specified when the pattern is created.

public static void main(String[] args) {
    Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);// make the pattern case-insensitive
    Matcher m = p.matcher("java I love Java and you");
    p(m.replaceAll("JAVA"));
}

Output result:

JAVA I love JAVA and you
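As a side note, the same flag can also be written inside the expression itself as the embedded flag (?i) , which is convenient with String.replaceAll where no Pattern object is created explicitly. A minimal sketch, with the hypothetical class name InlineFlagDemo :

```java
class InlineFlagDemo {

    // (?i) at the start of the regex turns on case-insensitive matching,
    // equivalent to passing Pattern.CASE_INSENSITIVE to Pattern.compile.
    static String upperCaseJava(String s) {
        return s.replaceAll("(?i)java", "JAVA");
    }

    public static void main(String[] args) {
        System.out.println(upperCaseJava("java I love Java and you"));
    }
}
```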

`Not case sensitive, replace the specified string found`

This example converts the odd-numbered matches (1st, 3rd, ...) to uppercase and the even-numbered matches to lowercase.

It uses a powerful method of the Matcher class, appendReplacement(StringBuffer sb, String replacement) , which takes a StringBuffer that the result is assembled into.

public static void main(String[] args) {
    Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher("java Java JAVA JAva I love Java and you ?");
    StringBuffer sb = new StringBuffer();
    int index = 1;// 1-based number of the current match
    while(m.find()){
        // even-numbered matches become lowercase, odd-numbered ones uppercase
        m.appendReplacement(sb, (index & 1) == 0 ? "java" : "JAVA");
        index++;
    }
    m.appendTail(sb);// append the rest of the string after the last match
    p(sb);
}

Output result:

JAVA java JAVA java I love JAVA and you ?
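The same find() / appendReplacement() / appendTail() loop works whenever the replacement has to be computed from the matched text. As a sketch under that assumption, the example below doubles every number found in a string. ComputedReplacementDemo and doubleNumbers are hypothetical names, and the numbers are assumed to be small enough to fit in an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComputedReplacementDemo {

    // Replaces every run of digits with twice its numeric value.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // quoteReplacement guards against $ and \ being read as special
            // characters in the replacement; plain digits do not need it,
            // but it is a safe habit for computed replacements.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(doubled)));
        }
        m.appendTail(sb); // text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("a1 b22 c333"));
    }
}
```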

`Grouping`

Let's look at an example:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test10 {

    private static void p(Object o) {
        System.out.println(o);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d{3,5}[a-z]{2}");
        String s = "005aa-856zx-1425kj-29";
        Matcher m = p.matcher(s);
        while (m.find()) {
            p(m.group());
        }
    }
}

Output result:

005aa
856zx
1425kj

Here the regular expression "\\d{3,5}[a-z]{2}" means 3~5 digits followed by two lowercase letters, and each matched string is printed.

What if you want to print only the digits in each matched string?

The grouping mechanism lets us divide a regular expression into groups, using () . Here we put the digits and the letters into separate groups: "(\\d{3,5})([a-z]{2})"

Then pass the group number to the m.group(int group) method.

Note: group numbers start from 0, and group 0 represents the entire match. After that, groups are numbered by their left parentheses, from left to right in the regular expression. In this expression, group 1 is the digits and group 2 is the letters.

public static void main(String[] args) {
    Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");// 3~5 digits followed by two letters, each in its own group
    String s = "005aa-856zx-1425kj-29";
    Matcher m = p.matcher(s);
    while(m.find()){
        p(m.group(1));
    }
}

Output result:

005
856
1425
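To make the group numbering concrete, the following sketch prints group 0, group 1 and group 2 for a single token, together with groupCount() , which counts the capturing groups and does not include group 0. The class name GroupNumbersDemo and the method describe are assumptions of this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbersDemo {

    private static final Pattern TOKEN = Pattern.compile("(\\d{3,5})([a-z]{2})");

    // Shows how one matched token splits into its groups.
    static String describe(String token) {
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            return "no match";
        }
        // group(0) is the whole match, group(1) the digits, group(2) the letters
        return m.group(0) + " = " + m.group(1) + " + " + m.group(2)
                + " (groups: " + m.groupCount() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("1425kj"));
        System.out.println(describe("29"));
    }
}
```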

`Common regular expressions`

Regular expression matching Chinese characters: [\u4e00-\u9fa5]
Match double-byte characters (including Chinese characters): [^\x00-\xff]
Regular expression that matches blank lines: \n[\s| ]*\r
Regular expression matching HTML tags: <(.*)>.*<\/\1>|<(.*) \/>
Regular expression that matches leading and trailing spaces: (^\s*)|(\s*$)
Regular expression matching IP address: (\d+)\.(\d+)\.(\d+)\.(\d+)
Regular expression matching Email address: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Regular expression to match the URL: http://(/[\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
sql statement: ^(select|drop|delete|create|update|insert).*$
Non-negative integer: ^\d+$
Positive integer: ^[0-9]*[1-9][0-9]*$
Non-positive integer: ^((-\d+)|(0+))$
Negative integer: ^-[0-9]*[1-9][0-9]*$
Integer: ^-?\d+$
Non-negative floating point number: ^\d+(\.\d+)?$
Positive floating point number: ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$
Non-positive floating point number: ^((-\d+(\.\d+)?)|(0+(\.0+)?))$
Negative floating point number: ^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$
English string: ^[A-Za-z]+$
English capital string: ^[A-Z]+$
English lowercase string: ^[a-z]+$
English character number string: ^[A-Za-z0-9]+$
Alphanumeric and underscore string: ^\w+$
E-mail address: ^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$
URL: ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\s*)?$ or: ^http:\/\/[A-Za-z0-9]+\.[A-Za-z0-9]+[\/=\?%\-&_~@[\]\':+! ]*([^<>\"\"])*$
Postal Code: ^[1-9]\d{5}$
Chinese: ^[\u0391-\uFFE5]+$
Phone number: ^((\(\d{2,3}\))|(\d{3}\-))?(\(0\d{2,3}\)|0\d{2,3}-)?[1-9]\d{6,7}(\-\d{1,4})?$
Mobile number: ^((\(\d{2,3}\))|(\d{3}\-))?13\d{9}$
Double-byte characters (including Chinese characters): [^\x00-\xff]
Match leading and trailing spaces: (^\s*)|(\s*$) (trim function like vbscript)
Match HTML tags: <(.*)>.*<\/\1>|<(.*) \/>
Match blank lines: \n[\s| ]*\r
Extract the network link in the information: (h|H)(r|R)(e|E)(f|F) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
Extract the email address in the message: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Extract the picture link in the information: (s|S)(r|R)(c|C) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
Extract the IP address in the information: (\d+)\.(\d+)\.(\d+)\.(\d+)
Extract the Chinese mobile phone number in the message: (86)*0*13\d{9}
Extract the Chinese fixed telephone number in the message: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{8}
Extract the Chinese phone number in the message (including mobile and fixed phones): (\(\d{3,4}\)|\d{3,4}-|\s)?\d{7,14}
Extract the Chinese postal code in the message: [1-9]{1}(\d+){5}
Extract the floating-point number (ie decimal) in the information: (-?\d*)\.?\d+
Extract any number in the message: (-?\d*)(\.\d+)?
IP address: (\d+)\.(\d+)\.(\d+)\.(\d+)
Telephone area code: ^0\d{2,3}$
Tencent QQ number: ^[1-9]*[1-9][0-9]*$
Account number (beginning with a letter, allowing 5-16 bytes, allowing alphanumeric underscores): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
Chinese, English, numbers and underscore: ^[\u4e00-\u9fa5_a-zA-Z0-9]+$
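The expressions in this list are written in plain regex notation. To use one in Java source code, every backslash has to be doubled, as explained in the section about \ . The sketch below transcribes three entries from the list (integer, postal code, account number) as precompiled patterns; the class name CommonPatternsDemo and the helper names are hypothetical.

```java
import java.util.regex.Pattern;

class CommonPatternsDemo {

    // Integer: ^-?\d+$
    private static final Pattern INTEGER = Pattern.compile("^-?\\d+$");
    // Postal Code: ^[1-9]\d{5}$
    private static final Pattern POSTAL_CODE = Pattern.compile("^[1-9]\\d{5}$");
    // Account number: ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
    private static final Pattern ACCOUNT = Pattern.compile("^[a-zA-Z][a-zA-Z0-9_]{4,15}$");

    static boolean isInteger(String s) {
        return INTEGER.matcher(s).matches();
    }

    static boolean isPostalCode(String s) {
        return POSTAL_CODE.matcher(s).matches();
    }

    static boolean isAccount(String s) {
        return ACCOUNT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("-42 integer? " + isInteger("-42"));
        System.out.println("4.2 integer? " + isInteger("4.2"));
        System.out.println("100000 postal code? " + isPostalCode("100000"));
        System.out.println("012345 postal code? " + isPostalCode("012345"));
        System.out.println("user_01 account? " + isAccount("user_01"));
        System.out.println("1user account? " + isAccount("1user"));
    }
}
```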

`Summarize`

The above is a summary of regular expressions and how to use them. May regular expressions bring you a more pleasant programming experience.



初念初恋