
1. Introduction

We used Groovy in a project to extend business capabilities, and it worked well, so this article records and shares the experience. Here you can learn:

  • Why choose Groovy as a scripting engine
  • The basic principles of Groovy and how Java integrates Groovy
  • Security and performance optimization when using the script engine in the project
  • Some suggestions for practical use

2. Why use a scripting language

2.1 Problems a scripting language can solve

With the rapid development of Internet businesses, product iterations are getting faster and personalized requirements are multiplying, such as multi-dimensional (multi-condition) queries and business flow rules. There are usually three ways to handle such requirements:

  • The most common way is to use code to enumerate all situations, that is, all query dimensions, all possible rule combinations, and traverse search according to runtime parameters;
  • Use an open source solution, such as the Drools rule engine, which suits rule-flow-based businesses and more complex systems;
  • Use a dynamic scripting engine, such as Groovy or JSR223. Note: JSR stands for Java Specification Request, a formal request to add a standardized technical specification to the JCP (Java Community Process). Anyone can submit a JSR to add new APIs and services to the Java platform, and JSRs are an important standard in the Java world. JSR223 provides a convenient, standard way to execute scripting languages from within Java and to access Java resources and classes from within scripts; that is, it gives every script engine a unified interface and access mode. Engines for Groovy, JavaScript, and Aviator can all be used through JSR223, and further engines can be plugged in through its SPI extension mechanism. The author has implemented a Java script engine through this SPI so that Java code can be run as a "script".
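The unified access mode of JSR223 can be seen with nothing but the JDK. The following is a minimal sketch, not project code: the class name Jsr223Discovery and its two helper methods are illustrative assumptions. It lists every engine factory that the SPI lookup finds on the classpath and checks whether an engine is registered under a given short name; which engines appear depends entirely on the JARs present at run time.

```java
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import java.util.ArrayList;
import java.util.List;

class Jsr223Discovery {
    /** Describes every script engine factory discovered through the JSR223 SPI lookup. */
    static List<String> describeEngines() {
        ScriptEngineManager manager = new ScriptEngineManager();
        List<String> result = new ArrayList<>();
        for (ScriptEngineFactory factory : manager.getEngineFactories()) {
            result.add(factory.getEngineName() + " (" + factory.getLanguageName() + ") " + factory.getNames());
        }
        return result;
    }

    /** getEngineByName returns null when no engine is registered under the short name. */
    static boolean isAvailable(String shortName) {
        return new ScriptEngineManager().getEngineByName(shortName) != null;
    }

    public static void main(String[] args) {
        describeEngines().forEach(System.out::println);
        // true only if a Groovy JSR223 engine JAR is on the classpath
        System.out.println("groovy available: " + isAvailable("groovy"));
    }
}
```

Checking availability up front gives a clear error at startup instead of a NullPointerException on the first eval call.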

Introducing a dynamic script engine to abstract the business logic meets customization needs and greatly improves project efficiency. For example, in the content platform developed by the author, downstream content consumers ask the platform to select ("circle") designated content and push it to designated processing systems according to different strategies. After those systems finish, the content platform receives the results and then delivers the content to the recommendation system according to a distribution strategy (rule). Every content selection used to require writing a batch of query logic, and the distribution strategy often changed. We therefore wanted to use the script engine's dynamic parsing and execution, abstracting the query conditions and the distribution strategy into rule scripts to improve efficiency.

2.2 Technical selection

Among scripting languages, Groovy is the most common choice, and it can be used through JSR223 as well. When choosing between scripting languages, performance, stability, and flexibility all need to be considered. After weighing these, we selected Groovy for the following reasons:

  • The learning curve is gentle, the syntactic sugar is rich, and it is very friendly to Java developers;
  • The technology is mature, powerful, easy to use and maintain, stable in performance, and widely adopted in the industry;
  • It is highly compatible with Java, connects seamlessly with Java code, and can call all Java libraries.

2.3 Business transformation

Because the content requirements of our operations and product colleagues change constantly, the content platform's ability to select content must support combinations of many query dimensions. The platform initially offered one query combination (status, warehousing time, source, content type) and sent the matching content to a single interface for content understanding and tagging. That interface can no longer keep up with changing demands. The most obvious design is to expose every table field (publication time, author name, and so on, nearly 20 in total) as a query condition. However, the development logic of this design is cumbersome and it easily leads to slow queries. Consider this request: select the S-level uploaders (UP authors) of designated partners and call the content understanding interface for those of their videos that have no content understanding record yet. Meeting it requires new development, and the result is written once and run only once, which wastes development and release resources.

Both JDBC for MySQL and JDBC for MongoDB follow interface-oriented programming, that is, query conditions are encapsulated behind interfaces. Under this model, the implementation of the Query interface can be generated dynamically by the script engine, so any query scenario can be satisfied. The execution flow is shown in Figure 3.1.

A demo of such a script is given below:

/**
 * Build the Query object.
 * Paged query against MongoDB.
 */
public Query query(int page){
    String source = "Groovy";
    int articleType = 4; // (source, articleType) form a compound index, which speeds up the query
    Query query = Query.query(where("source").is(source)); // condition 1: source = "Groovy"
    query.addCriteria(where("articleType").is(articleType)); // condition 2: articleType = 4
    Pageable pageable = new PageRequest(page, PAGESIZE);
    query.with(pageable); // apply paging
    query.fields().include("authorId"); // return the authorId field in the result
    query.fields().include("level"); // return the level field in the result
    return query;
}
/**
 * Filter each page of query results.
 */
public boolean filter(UpAuthor upAuthor){
    return !"S".equals(upAuthor.getLevel()); // filter out authors whose level is not S
}
/**
 * Process the result set item by item.
 */
public void handle(UpAuthor upAuthor) {
    UpAuthorService upAuthorService = SpringUtil.getBean("upAuthorService"); // fetch the Java bean from the Spring container
    if(upAuthorService == null){
        throw new RuntimeException("upAuthorService is null");
    }
    AnalysePlatService analysePlatService = SpringUtil.getBean("analysePlatService"); // fetch the Java bean from the Spring container
    if(analysePlatService == null){
        throw new RuntimeException("analysePlatService is null");
    }
    List<Article> articleList = upAuthorService.getArticles(upAuthor); // all videos under this author
    if(CollectionUtils.isEmpty(articleList)){
        return;
    }
    for (Article article : articleList) { // plain for loop: valid as both Java and Groovy code
        if(article.getAnalysis() == null){
            analysePlatService.analyse(article.getArticleId()); // submit the video for content understanding
        }
    }
}
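The host code that drives such a script is not shown here. One possible shape is sketched below under stated assumptions: the ContentScript interface, the ScriptDriver class, and the dataSource function are hypothetical names that only mirror the three script methods (query, filter, handle), and "stop at the first empty page" is an assumed termination rule. A plain in-memory list stands in for MongoDB.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class ScriptDriver {
    /** Hypothetical contract mirroring the three methods of the demo script. */
    interface ContentScript<Q, T> {
        Q query(int page);       // build the query for one page
        boolean filter(T item);  // true means: skip this item
        void handle(T item);     // process one item that passed the filter
    }

    /** Pages through results until an empty page comes back; returns the number of handled items. */
    static <Q, T> int run(ContentScript<Q, T> script, Function<Q, List<T>> dataSource) {
        int handled = 0;
        for (int page = 0; ; page++) {
            List<T> items = dataSource.apply(script.query(page));
            if (items == null || items.isEmpty()) {
                break; // assumed termination rule: first empty page ends the run
            }
            for (T item : items) {
                if (script.filter(item)) {
                    continue;
                }
                script.handle(item);
                handled++;
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        // Three "pages" of author levels; the query object is simply the page index here.
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("S", "A"), Arrays.asList("S"), Collections.<String>emptyList());
        List<String> seen = new ArrayList<>();
        ContentScript<Integer, String> script = new ContentScript<Integer, String>() {
            public Integer query(int page) { return page; }
            public boolean filter(String level) { return !"S".equals(level); }
            public void handle(String level) { seen.add(level); }
        };
        int handled = run(script, pages::get);
        System.out.println(handled + " handled: " + seen); // prints "2 handled: [S, S]"
    }
}
```

With a contract like this, the host stays fixed while each new selection requirement only needs a new script implementing the three methods.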

In theory, a script can specify any query condition and implement any business logic. Businesses whose processes and rules change frequently are thus freed from the time and resource constraints of the development and release cycle, and can respond promptly to change requests from all parties.

3. Groovy and Java integration

3.1 Basic principles of Groovy

Groovy's syntax is very concise. Even if you do not want to learn it, you can write Java code in Groovy scripts: compatibility is as high as 90%, and apart from lambda and array syntax, Java syntax is basically supported. The syntax is not introduced in detail here; if you are interested, see https://www.w3cschool.cn/groovy.

3.2 Integrate Groovy in a Java project

3.2.1 ScriptEngineManager

Following JSR223, call the script through the standard ScriptEngineManager interface.

ScriptEngineManager factory = new ScriptEngineManager();
ScriptEngine engine = factory.getEngineByName("groovy"); // creates a new engine instance on every call
Bindings binding = engine.createBindings();
binding.put("date", new Date()); // input parameter
engine.eval("def getTime(){return date.getTime();}", binding); // if the script text lives in a file, read the file content first
engine.eval("def sayHello(name,age){return 'Hello,I am ' + name + ',age' + age;}");
Long time = (Long) ((Invocable) engine).invokeFunction("getTime"); // invoke the function by name
System.out.println(time);
String message = (String) ((Invocable) engine).invokeFunction("sayHello", "zhangsan", 12);
System.out.println(message);

3.2.2 GroovyShell

Groovy officially provides GroovyShell for executing Groovy script fragments. Each time code is executed, GroovyShell dynamically compiles it into a Java class and then instantiates that class to run on the Java virtual machine, so heavy use of GroovyShell produces too many classes and performs poorly.

final String script = "Runtime.getRuntime().availableProcessors()";
Binding intBinding = new Binding();
GroovyShell shell = new GroovyShell(intBinding);
final Object eval = shell.evaluate(script);
System.out.println(eval);

3.2.3 GroovyClassLoader

Groovy officially provides the GroovyClassLoader class, which can load and parse Groovy classes from files, URLs, or strings, instantiate objects, and call specified methods reflectively.

GroovyClassLoader groovyClassLoader = new GroovyClassLoader();
String helloScript = "package com.vivo.groovy.util;\n" +  // the script may also be plain Java code
        "class Hello {\n" +
        "  String say(String name) {\n" +
        "    System.out.println(\"hello, \" + name);\n" +
        "    return name;\n" +
        "  }\n" +
        "}";
Class<?> helloClass = groovyClassLoader.parseClass(helloScript);
GroovyObject object = (GroovyObject) helloClass.getDeclaredConstructor().newInstance();
Object ret = object.invokeMethod("say", "vivo"); // prints "hello, vivo" to the console
System.out.println(ret.toString()); // prints "vivo"

3.3 Performance optimization

When Groovy scripts run in the JVM under heavy concurrency with the default strategy, each script is recompiled on every run and the class loader is invoked to load the resulting class. Constant recompilation grows the CodeCache and Metaspace areas of JVM memory, which amounts to a memory leak and eventually overflows Metaspace. In addition, class loading is synchronized, so multi-threaded class loading blocks a large number of threads and efficiency suffers noticeably.

To solve the performance problem, the best strategy is to cache the compiled and loaded Groovy scripts to avoid repeated processing. The cache can be a key-value store whose key is the MD5 value of the script. The following sections examine the reasoning behind these conclusions.

3.3.1 The number of Class objects

3.3.1.1 GroovyClassLoader loading script

All three integration methods above ultimately use GroovyClassLoader and explicitly call its class loading method parseClass to compile and load Groovy scripts, which naturally departs from Java's well-known ClassLoader parent delegation model.

GroovyClassLoader is mainly responsible for processing Groovy scripts at runtime, compiling and loading them into Class objects. The key method is GroovyClassLoader.parseClass, shown in code 3.1.1.1 (from the Groovy source code).

public Class parseClass(String text) throws CompilationFailedException {
    return parseClass(text, "script" + System.currentTimeMillis() +
            Math.abs(text.hashCode()) + ".groovy");
}
public Class parseClass(GroovyCodeSource codeSource, boolean shouldCacheSource) throws CompilationFailedException {
    synchronized (sourceCache) { // synchronized block
        Class answer = sourceCache.get(codeSource.getName());
        if (answer != null) return answer;
        answer = doParseClass(codeSource);
        if (shouldCacheSource) sourceCache.put(codeSource.getName(), answer);
        return answer;
    }
}

Each time the system executes a script, a script Class object is generated. Its name is composed of "script" + System.currentTimeMillis() + Math.abs(text.hashCode()), so even an identical script is treated as new code on every call. Compiling and loading it again makes Metaspace grow, and as the system keeps executing Groovy scripts, Metaspace eventually overflows.
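The effect of this naming scheme can be reproduced without Groovy. The sketch below copies only the naming expression from parseClass; the class GeneratedNameDemo and the explicit timestamp parameter are illustrative assumptions, introduced so that the result is deterministic.

```java
class GeneratedNameDemo {
    // Same expression as in parseClass(String), with the clock value passed in
    // (an assumption made here so the demo does not depend on the real clock).
    static String generatedName(String text, long timestampMillis) {
        return "script" + timestampMillis + Math.abs(text.hashCode()) + ".groovy";
    }

    public static void main(String[] args) {
        String script = "return 1 + 1";
        String first = generatedName(script, 1_000L);
        String second = generatedName(script, 1_001L); // same text, one millisecond later
        System.out.println(first.equals(second)); // false: identical source, different name
    }
}
```

Because the generated name changes with the clock, a lookup keyed by that name cannot recognize a script it has already compiled, which is why the cache key should be derived from the script text alone.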

Tracing further, the work GroovyClassLoader does to compile Groovy scripts is concentrated in the doParseClass method, shown in code 3.1.1.2 (from the Groovy source code):

private Class doParseClass(GroovyCodeSource codeSource) { 
    validate(codeSource); // basic check that the arguments are not null
    Class answer;
    CompilationUnit unit = createCompilationUnit(config, codeSource.getCodeSource()); 
    SourceUnit su = null; 
    if (codeSource.getFile() == null) { 
        su = unit.addSource(codeSource.getName(), codeSource.getScriptText()); 
    } else { 
        su = unit.addSource(codeSource.getFile()); 
    } 
    ClassCollector collector = createCollector(unit, su); // creates a GroovyClassLoader$InnerLoader
    unit.setClassgenCallback(collector); 
    int goalPhase = Phases.CLASS_GENERATION; 
    if (config != null && config.getTargetDirectory() != null) goalPhase = Phases.OUTPUT; 
    unit.compile(goalPhase); // compile the Groovy source code
    answer = collector.generatedClass; // then look up the main class of the source file
    String mainClass = su.getAST().getMainClassName(); 
    for (Object o : collector.getLoadedClasses()) { 
        Class clazz = (Class) o; 
        String clazzName = clazz.getName(); 
        definePackage(clazzName); 
        setClassCacheEntry(clazz); 
        if (clazzName.equals(mainClass)) answer = clazz; 
    } 
    return answer; 
}

Next, look at GroovyClassLoader's createCollector method, shown in code 3.1.1.3 (from the Groovy source code):

protected ClassCollector createCollector(CompilationUnit unit, SourceUnit su) { 
    InnerLoader loader = AccessController.doPrivileged(new PrivilegedAction<InnerLoader>() { 
        public InnerLoader run() { 
            return new InnerLoader(GroovyClassLoader.this);  // InnerLoader extends GroovyClassLoader
        } 
    }); 
    return new ClassCollector(loader, unit, su); 
}   
public static class ClassCollector extends CompilationUnit.ClassgenCallback { 
    private final GroovyClassLoader cl; 
    // ... 
    protected ClassCollector(InnerLoader cl, CompilationUnit unit, SourceUnit su) { 
        this.cl = cl; 
        // ... 
    } 
    public GroovyClassLoader getDefiningClassLoader() { 
        return cl; 
    } 
    protected Class createClass(byte[] code, ClassNode classNode) { 
        GroovyClassLoader cl = getDefiningClassLoader(); // GroovyClassLoader$InnerLoader
        Class theClass = cl.defineClass(classNode.getName(), code, 0, code.length, unit.getAST().getCodeSource()); // load the class through InnerLoader
        this.loadedClasses.add(theClass); 
        // ... 
        return theClass; 
    } 
    // ... 
}

The role of ClassCollector is to load the compiled bytecode through InnerLoader during compilation. Moreover, a new InnerLoader instance is created every time Groovy source code is compiled. Given that GroovyClassLoader already exists, why is InnerLoader needed? There are two main reasons:

Loading classes with the same name

A Class object in the JVM is uniquely identified by its class loader together with its fully qualified class name. Because one ClassLoader can load a class of a given name only once, if everything were loaded by GroovyClassLoader itself, then after one script defined the class com.vivo.internet.Clazz, GroovyClassLoader could not load another script that also defines com.vivo.internet.Clazz.

Reclaiming Class objects

A Class object can be reclaimed only once its ClassLoader has been reclaimed. If all classes were loaded by GroovyClassLoader, none of them could be reclaimed until the GroovyClassLoader itself was. With InnerLoader, once the source code has been compiled nothing outside references the loader, so it can be reclaimed, and the Class objects it loaded can be reclaimed with it. The reclamation of Class objects is discussed in detail below.

3.3.1.2 JVM reclaims Class objects

When will the garbage collection of Metaspace be triggered?

  • When Metaspace has no space left, for example while loading a new class;
  • The JVM keeps an internal variable called _capacity_until_GC. Once the space used by Metaspace exceeds its value, Metaspace is collected;
  • Metaspace is also collected during a Full GC (FGC).

You may wonder: even with too many classes, as long as Metaspace triggers GC there should be no overflow, so why was it concluded above that Metaspace overflows? This leads to the next question: under what conditions does the JVM reclaim a Class object? All of the following must hold:

  • All instances of the class have been garbage collected, that is, no instance of the class exists in the JVM;
  • The ClassLoader that loaded the class has been garbage collected;
  • The java.lang.Class object is not referenced anywhere.

Condition 1: GroovyClassLoader compiles the script into a class. To run the script, an instance of that class is created reflectively and its entry function is called (see Figure 3.1 for details). This generally happens only once, and the application holds no other reference to the class or its instances, so with disciplined programming this condition can be met. Condition 2: as analyzed above, the InnerLoader can be reclaimed once it has been used, so this condition can be met. Condition 3: the script's Class object is still referenced, so this condition cannot be met.

To verify that condition 3 cannot be satisfied, look at another piece of GroovyClassLoader, code 3.1.2.1 (from the Groovy source code):

/**
* this cache contains the loaded classes or PARSING, if the class is currently parsed
*/
protected final Map<String, Class> classCache = new HashMap<String, Class>();
 
protected void setClassCacheEntry(Class cls) {
    synchronized (classCache) { // synchronized block
        classCache.put(cls.getName(), cls);
    }
}

Every loaded Class object is cached inside the GroovyClassLoader object, so the Class object cannot be reclaimed.

3.3.2 Thread blocking under high concurrency

The code above contains two synchronized blocks; see code 3.1.1.1 and code 3.1.2.1. When Groovy scripts are loaded under high concurrency, a large number of threads block on them, which creates a performance bottleneck.

3.3.3 Solution

  • Cache the Class object produced by parseClass, using the MD5 value of the Groovy script as the key, and refresh the cache when the configuration is modified on the configuration side. This has two advantages: (1) it solves the problem of Metaspace filling up; (2) because nothing needs to be compiled and loaded at run time, scripts execute faster.
  • Model the use of GroovyClassLoader on Tomcat's ClassLoader architecture: keep a limited number of GroovyClassLoader instances resident in memory to increase processing throughput.
  • Script staticization: use Java static types as much as possible in Groovy scripts, which reduces Groovy's dynamic type checking and improves the efficiency of compiling and loading Groovy scripts.
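The first point, an MD5-keyed cache, might look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's implementation: ScriptCache, its compiler function, and invalidate are hypothetical names, and the compiler argument stands in for a call such as GroovyClassLoader.parseClass so that the example runs on the JDK alone.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class ScriptCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler; // stand-in for the real compile step

    ScriptCache(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    /** Hex-encoded MD5 of the script text, used as the cache key. */
    static String md5(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Compiles the script only on the first request for a given text. */
    T get(String scriptText) {
        return cache.computeIfAbsent(md5(scriptText), key -> compiler.apply(scriptText));
    }

    /** Hook for the "refresh after a configuration change" step. */
    void invalidate(String scriptText) {
        cache.remove(md5(scriptText));
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        AtomicInteger compilations = new AtomicInteger();
        ScriptCache<String> cache = new ScriptCache<>(text -> {
            compilations.incrementAndGet(); // counts how often the "compiler" really runs
            return "compiled:" + text;
        });
        cache.get("return 1 + 1");
        cache.get("return 1 + 1"); // served from the cache
        System.out.println(compilations.get()); // prints 1
    }
}
```

In a real project the cached type would presumably be the compiled Class, with the compile function delegating to a shared GroovyClassLoader. Note that computeIfAbsent runs the compile step at most once per key, so concurrent requests for the same script do not compile it twice.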

4. Security

4.1 Active security

4.1.1 Encoding Security

Groovy automatically imports the java.util and java.lang packages for the user's convenience, but this also increases the risk to the system. To prevent users from bringing the system down by calling methods such as System.exit or Runtime, and to guard against custom Groovy snippets that loop forever or time out while calling resources, Groovy provides the SecureASTCustomizer security manager, and the groovy-sandbox library provides the SandboxTransformer sandbox environment.

final SecureASTCustomizer secure = new SecureASTCustomizer(); // create the SecureASTCustomizer
secure.setClosuresAllowed(false); // forbid closures
List<Integer> tokensBlacklist = new ArrayList<>();
tokensBlacklist.add(Types.KEYWORD_WHILE); // token blacklist: while and goto
tokensBlacklist.add(Types.KEYWORD_GOTO);
secure.setTokensBlacklist(tokensBlacklist);
secure.setIndirectImportCheckEnabled(true); // enable the indirect import check
List<String> list = new ArrayList<>(); // import blacklist: scripts may not import JSONObject
list.add("com.alibaba.fastjson.JSONObject");
secure.setImportsBlacklist(list);
List<Class<? extends Statement>> statementBlacklist = new ArrayList<>(); // statement blacklist: while loops are not allowed
statementBlacklist.add(WhileStatement.class);
secure.setStatementsBlacklist(statementBlacklist);
final CompilerConfiguration config = new CompilerConfiguration(); // custom CompilerConfiguration carrying the AST customizer
config.addCompilationCustomizers(secure);
GroovyClassLoader groovyClassLoader = new GroovyClassLoader(this.getClass().getClassLoader(), config);

4.1.2 Process security

Standardizing the process increases the credibility of script execution.

4.2 Passive security

Although SecureASTCustomizer imposes a degree of security restriction on scripts, and a standardized process strengthens it further, script writing still carries considerable security risk. It can easily cause problems that seriously affect system operation, such as CPU spikes or runaway disk usage. Some passive security measures are therefore required, such as thread pool isolation, effective real-time monitoring and statistics, a wrapper around script execution, or manually force-killing the thread that executes a script.
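Thread pool isolation with a time limit can be sketched with the JDK's own executor classes. The example below is one possible approach under stated assumptions, not the project's implementation: IsolatedScriptRunner and ScriptTimeoutException are hypothetical names, and a Callable stands in for the script invocation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class IsolatedScriptRunner implements AutoCloseable {
    /** Hypothetical unchecked exception signalling that a script ran too long. */
    static class ScriptTimeoutException extends RuntimeException {
        ScriptTimeoutException(String message) {
            super(message);
        }
    }

    private final ExecutorService pool; // dedicated pool: scripts never run on business threads
    private final long timeoutMillis;

    IsolatedScriptRunner(int threads, long timeoutMillis) {
        this.pool = Executors.newFixedThreadPool(threads, runnable -> {
            Thread thread = new Thread(runnable, "script-runner");
            thread.setDaemon(true); // a stuck script must not keep the JVM alive
            return thread;
        });
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs the task on the isolated pool and gives up after the configured timeout. */
    <T> T run(Callable<T> scriptTask) {
        Future<T> future = pool.submit(scriptTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; a busy loop that ignores interrupts keeps running
            throw new ScriptTimeoutException("script exceeded " + timeoutMillis + " ms");
        } catch (ExecutionException e) {
            throw new IllegalStateException("script failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the script", e);
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        try (IsolatedScriptRunner runner = new IsolatedScriptRunner(2, 200)) {
            Integer sum = runner.run(() -> 1 + 1);
            System.out.println(sum); // prints 2
            try {
                runner.run(() -> { Thread.sleep(5_000); return "never"; });
            } catch (ScriptTimeoutException e) {
                System.out.println("timed out"); // the slow task is cancelled after 200 ms
            }
        }
    }
}
```

The caller is protected by the timeout in any case, but cancellation only stops scripts that respond to interruption, which is why monitoring and the option of a manual kill remain necessary.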

5. Summary

Groovy is a dynamic scripting language well suited to rapidly changing business and configuration-driven requirements. It is easy to use, and in essence it is still Java code running on the JVM. With Groovy, Java programmers can go further in improving development efficiency, responding faster to changing requirements, and improving system stability.

Author: vivo Internet server team-Gao Xiang
