Write a Lisp interpreter in Java

I first came across the name Lisp by chance, and it left a very good impression. Five years went by in a hurry. A few days ago I was working through SICP, where the name came up again, so I found a few introductory documents online, learned the basic syntax, and carried on with SICP. A month has passed in the blink of an eye since I wrote my first line of code, (+ 1 2), and I have to say that Lisp's prefix notation is really pleasant. Somewhere along the way I got the idea of writing a Lisp interpreter, and then I remembered that Wang Yin had written an article, "How to Write an Interpreter". If you are interested in how interpreters are written, go and read it; it is still inspiring.

Are you back? Hahaha

Let's continue our journey. The Lisp dialect chosen here is Scheme, for no other reason than that it is the syntax we learned at the start.

Writing a lisp interpreter can be divided into three steps:

1. Convert the Lisp expression string into a tree of nested lists
2. Interpret the tree
3. Support variables and method calls

The language used here is Java.

Construct the syntax tree

Step one: how do we convert a Lisp expression string into a tree of nested lists?
Let's look at a few Lisp expressions and analyze what they are made of:

(+ 1 2)
(+ 1 2 (- 3 4))
(+ 1 2 (- 3 4) 5)
(+ 1 2 (- 3 4) 5 (+ 6 7 (+ 8 9)))
(+ 1 2 (- 3 4) (+ 5 6 (+ 7 8)))
(+ 1 2 (- 3 4) (+ 5 6 (+ 7 8)) 9)

The expressions above are made of two kinds of elements: indivisible basic elements such as +, -, 1, 2, 3, and compound elements such as (- 3 4), which are themselves made up of basic elements. So we get our first rule: a compound element can be split into smaller basic elements and compound elements.
Take the expression (+ 1 2 (- 3 4) 5 (+ 6 7 (+ 8 9))) as an example. What would it look like as a tree? Let's draw its shape:
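
As a rough stand-in for the drawing, the same tree can be written out by hand as nested Java lists. This is only a sketch: TreeShape, example and render are names made up for this illustration (they are not part of the interpreter), and it assumes Java 11 or later for List.of and String.repeat.

```java
import java.util.List;

class TreeShape {
    // Hand-built tree for (+ 1 2 (- 3 4) 5 (+ 6 7 (+ 8 9))).
    // Inner lists are compound elements, everything else is a leaf.
    static List<Object> example() {
        return List.of("+", 1, 2,
                List.of("-", 3, 4),
                5,
                List.of("+", 6, 7, List.of("+", 8, 9)));
    }

    // Render one element per line, indented by nesting depth.
    static String render(Object node, int depth) {
        String indent = "  ".repeat(depth);
        if (node instanceof List<?>) {
            StringBuilder sb = new StringBuilder(indent).append("(\n");
            for (Object child : (List<?>) node) {
                sb.append(render(child, depth + 1));
            }
            return sb.append(indent).append(")\n").toString();
        }
        return indent + node + "\n";
    }

    public static void main(String[] args) {
        // prints [+, 1, 2, [-, 3, 4], 5, [+, 6, 7, [+, 8, 9]]]
        System.out.println(example());
        System.out.print(render(example(), 0));
    }
}
```

Each pair of brackets in the printed list is one node of the tree, and the numbers and operators are its leaves.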

We know what it looks like, but how do we convert an expression in string form into such a tree? That is the question we analyze next.
duang duang duang duang...
Let's go back to our first rule. What else is hiding in it?
1. Compound elements can be split
2. Basic elements cannot be split
3. A compound element is an element wrapped in "()"
With these three points we can think further. Tree, tree, tree: what is a tree made of?
1. Nodes
2. Leaf nodes
Now things are starting to take shape.
Nodes correspond to compound elements, and leaf nodes correspond to basic elements. How do we tell compound elements from basic elements?
"3. A compound element is an element wrapped in "()"": that's it, that's it, that's it.
The first element of a compound node is the first element after "(", and its last element is the one right before the matching ")". That gives us the second rule. With these two rules in hand, let's start building our first tree:
Code:

node

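import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ExpNode implements Iterable<Object> {

    List<Object> data = new ArrayList<>();
    ExpNode parent; // enclosing list; null for the root

    public static ExpNode newInstance() {
        return new ExpNode();
    }

    // Append an element; a nested list remembers its enclosing list.
    public void add(Object o) {
        if (o instanceof ExpNode) {
            ((ExpNode) o).parent = this;
        }
        data.add(o);
    }

    // Used by the parser to step back out of a nested list.
    public ExpNode parent() {
        return parent;
    }

    public Iterator<Object> iterator() {
        return data.iterator();
    }
}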

parse

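import java.util.Collections;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumption: `log` is an SLF4J logger (for example generated by Lombok's @Slf4j);
// its declaration is not shown in this article.
public class Parse {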

    public static ExpNode parseTree(String exp) {
        ExpNode root = ExpNode.newInstance();
        buildNode(trimStr(exp), 0, root);
        return (ExpNode) root.iterator().next();
    }

    private static void buildNode(String exp, int level, ExpNode node) {
        int lIndex = exp.indexOf("(");
        int rIndex = exp.indexOf(")");
        if (rIndex > -1) {
            if (lIndex < rIndex && lIndex > -1) {
                String subExp = exp.substring(lIndex, rIndex + 1);
                if (isBaseLeaf(exp)) {
                    ExpNode a = parseNode(subExp).orElseThrow(RuntimeException::new);
                    node.add(a);
                } else {
                    Optional<ExpNode> nodeOptional = parseNode(subExp);
                    if (nodeOptional.isPresent()) {
                        ExpNode val = nodeOptional.get();
                        node.add(val);
                        node = val;
                    } else {
                        ExpNode objects = ExpNode.newInstance();
                        node.add(objects);
                        node = objects;
                    }
                }
                ++level;
                log.debug("{}{}---{}", createRepeatedStr(level), exp.substring(lIndex), subExp);
                buildNode(exp.substring(lIndex + 1), level, node);
            } else {
                //) b a (+ 8 9) => ) b a ( => b a
                if (lIndex > -1) {
                    String subExp = trimStr(exp.substring(rIndex + 1, lIndex));
                    if (subExp.length() > 0 && !subExp.contains(")")) {
                        String[] values = subExp.split(" ");
                        for (String val : values) {
                            node.parent().add(parseObj(val));
                        }
                    }
                } else {
                    // from here on we only step back out
                    // ) b a) => b a) => b a
                    String subExp = exp.substring(rIndex + 1);
                    int r2Index = 1 + subExp.indexOf(")");
                    if (r2Index > 1) {
                        subExp = trimStr(subExp.substring(1, r2Index - 1));
                        if (subExp.length() > 0) {
                            String[] values = subExp.split(" ");
                            for (String val : values) {
                                node.parent().add(parseObj(val));
                            }
                        }
                    }
                }
                --level;
                log.debug("{}{}", createRepeatedStr(level), exp.substring(rIndex));
                buildNode(exp.substring(rIndex + 1), level, node.parent());
            }
        } else {
            log.debug(createRepeatedStr(level));
        }

    }

    private static Optional<ExpNode> parseNode(String exp) {
        String subExp = "";
        if (isBaseLeaf(exp)) {
            // (xx [xx])
            subExp = exp.substring(1, exp.length() - 1);
        } else {
            // (xx [xx] (xx xx xx)
            // (xx [xx] (
            // ((
            subExp = exp.substring(1);
            subExp = subExp.substring(0, subExp.indexOf("("));
            if (subExp.trim().isEmpty()) {
                return Optional.empty();
            }
        }
        String[] keys = subExp.split(" ");
        ExpNode node = ExpNode.newInstance();
        for (int i = 0; i < keys.length; i++) {
            node.add(parseObj(keys[i]));
        }
        return Optional.of(node);
    }

    private static Object parseObj(String val) {
        try {
            return Integer.valueOf(val);
        } catch (NumberFormatException e) {
            if (val.equals("true") || val.equals("false")) {
                return Boolean.valueOf(val);
            } else if (val.indexOf("'") == 0 && val.lastIndexOf("'") == val.length() - 1) {
                return val.replaceAll("\"", "\"");
            } else {
                return val;
            }
        }
    }

    private static boolean isBaseLeaf(String exp) {
        return count(exp, "\\(") == 1 && count(exp, "\\)") == 1 && exp.matches("^\\(.+?\\)$");
    }

    private static int count(String str, String regx) {
        Matcher matcher = Pattern.compile(regx).matcher(str);
        int i = 0;
        while (matcher.find()) {
            i++;
        }
        return i;
    }

    private static String trimStr(String str) {
        String tempStr = str.replace("  ", " ").trim();
        return tempStr.contains("  ") ? trimStr(tempStr) : tempStr;
    }

    private static String createRepeatedStr(int n) {
        return String.join("", Collections.nCopies(n, "--"));
    }
}
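
For comparison, the two rules can also be sketched as a small recursive reader. This is not the Parse class above, just a minimal illustration under simplifying assumptions: MiniReader is a made-up name, the tree is returned as plain nested List<Object> values rather than ExpNode, and only integers and symbols are recognized.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class MiniReader {
    static Object parse(String exp) {
        // Pad parentheses with spaces so that a whitespace split yields clean tokens.
        String padded = exp.replace("(", " ( ").replace(")", " ) ").trim();
        if (padded.isEmpty()) {
            throw new IllegalArgumentException("empty input");
        }
        Deque<String> tokens = new ArrayDeque<>(Arrays.asList(padded.split("\\s+")));
        Object result = read(tokens);
        if (!tokens.isEmpty()) {
            throw new IllegalArgumentException("unexpected trailing input: " + tokens.peek());
        }
        return result;
    }

    private static Object read(Deque<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        String token = tokens.poll();
        if (token.equals("(")) {
            // Rule 2: a compound element runs from "(" to its matching ")".
            List<Object> list = new ArrayList<>();
            while (!")".equals(tokens.peek())) {
                if (tokens.isEmpty()) {
                    throw new IllegalArgumentException("missing )");
                }
                list.add(read(tokens)); // Rule 1: elements may themselves be compound.
            }
            tokens.poll(); // drop the matching ")"
            return list;
        }
        if (token.equals(")")) {
            throw new IllegalArgumentException("unexpected )");
        }
        try {
            return Integer.valueOf(token); // basic element: number
        } catch (NumberFormatException e) {
            return token;                  // basic element: symbol
        }
    }

    public static void main(String[] args) {
        // prints [+, 1, 2, [-, 3, 4], 5, [+, 6, 7, [+, 8, 9]]]
        System.out.println(parse("(+ 1 2 (- 3 4) 5 (+ 6 7 (+ 8 9)))"));
    }
}
```

Because every "(" triggers a recursive call that returns at its matching ")", the nesting of the calls mirrors the nesting of the tree, and no parent pointers or index arithmetic are needed.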

Of course, there is a simpler way, hahaha: replace "(" with "[" and ")" with "]", quote the symbols and put commas between the elements, and then parse the result with a JSON parser, hahahaha.
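
To make the JSON trick a little more concrete, here is a rough sketch of the text rewriting step only. LispToJson is a hypothetical helper invented for this example; the resulting string would then be handed to whichever JSON parser you prefer. It assumes symbols contain no quotes or backslashes, since it does no escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LispToJson {
    // A token is either a single parenthesis or a run of non-space, non-parenthesis characters.
    private static final Pattern TOKEN = Pattern.compile("[()]|[^\\s()]+");

    static String toJson(String exp) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(exp);
        boolean needComma = false; // true once the current array already has an element
        while (m.find()) {
            String t = m.group();
            if (t.equals(")")) {
                out.append(']');
                needComma = true;
                continue;
            }
            if (needComma) {
                out.append(',');
            }
            if (t.equals("(")) {
                out.append('[');
                needComma = false;
            } else {
                // Integers stay bare, every other token becomes a JSON string.
                out.append(t.matches("-?\\d+") ? t : "\"" + t + "\"");
                needComma = true;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints ["+",1,2,["-",3,4]]
        System.out.println(toJson("(+ 1 2 (- 3 4))"));
    }
}
```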

Interpreter

Lisp syntax is written as lists in prefix notation, and a list corresponds to the compound element we just discussed. As for prefix notation, consider the familiar infix expression 1 + 2 + 5, where 1, 2 and 5 are operands and + is the operator: infix notation places the operator between its operands, prefix notation places the operator in front of its operands, and there is of course postfix notation as well.
Let's take (+ 1 2 (- 3 4) 5 (+ 6 7 (+ 8 9))) as an example:
Passing it to Parse.parseTree() gives us a nested list with the same structure, which we then interpret.
Recalling how a prefix expression is put together, the first element of a list is the operator, and each element after it is either an operand or another list. So we add two methods to ExpNode: car, which returns the first element of the list, and cdr, which returns the rest of the list after the first element:

class ExpNode implements Iterable<Object>{

    List<Object> data = new ArrayList<>();
    ...

    public Object car(){
        return data.get(0);
    }

    public ExpNode cdr(){
        // everything after the first element, copied into a new node
        List<Object> subData = data.subList(1, data.size());
        return ExpNode.one(this, subData.toArray());
    }

    private static ExpNode one(ExpNode exp, Object... vs) {
        ExpNode objects = ExpNode.newInstance();
        for (Object o : vs) {
            objects.add(o);
        }
        objects.parent = exp;
        return objects;
    }

    ...
}

Now let's build our first interpreter:

public class JLispInter {

     public static Object inter(ExpNode exp){
        Object car = exp.car();
        ExpNode cdr = exp.cdr();
        if (car instanceof String){
            switch ((String)car){
                case "+":
                    return cdr.data().stream().map(JLispInter::getAtom).mapToInt(o->(Integer)o).sum();
                case "-":
                    return cdr.data().stream().map(JLispInter::getAtom).mapToInt(o->(Integer)o).reduce((x,y)->x-y).orElseThrow(RuntimeException::new);
                case "*":
                    return cdr.data().stream().map(JLispInter::getAtom).mapToInt(o->(Integer)o).reduce((x,y)->x*y).orElseThrow(RuntimeException::new);
                case "/":
                    return cdr.data().stream().map(JLispInter::getAtom).mapToInt(o->(Integer)o).reduce((x,y)->x/y).orElseThrow(RuntimeException::new);
                default:
                    return car;
            }
        }else if (car instanceof ExpNode){
            Object o = inter((ExpNode) car);
            return cdr.isEmpty()?o:inter(cdr);
        }else {
            return cdr.isEmpty()?car:inter(cdr);
        }
    }

    private static Object getAtom(Object o){
        if (o instanceof ExpNode){
            return inter((ExpNode) o);
        }else{
            return o;
        }
    }
}

It turns out that ExpNode also needs two more methods, data and isEmpty:

class ExpNode implements Iterable<Object>{

    List<Object> data = new ArrayList<>();
    ...

    public List<Object> data() { return data; }
    public boolean isEmpty() { return data.isEmpty(); }
    ...
}

Input

    public static void main(String[] args) {
        System.out.println(inter(Parse.parseTree("(+ 1 2 (- 3 4) 5 (+ 6 7 (+ 8 9)))")));
    }

will print the value of the expression, 1 + 2 + (3 - 4) + 5 + (6 + 7 + (8 + 9)):

37
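
As a cross-check that does not depend on the classes above, the following sketch evaluates the same expression over plain nested lists. MiniEval is a made-up name for this illustration; like the reduce calls in JLispInter, it folds the operands from left to right, and it assumes Java 9 or later for List.of.

```java
import java.util.List;

class MiniEval {
    static int eval(Object node) {
        if (node instanceof Integer) {
            return (Integer) node; // a leaf evaluates to itself
        }
        List<?> list = (List<?>) node;
        String op = (String) list.get(0);   // car: the operator
        int acc = eval(list.get(1));        // first operand
        for (Object operand : list.subList(2, list.size())) { // rest of the cdr
            acc = apply(op, acc, eval(operand));
        }
        return acc;
    }

    private static int apply(String op, int x, int y) {
        switch (op) {
            case "+": return x + y;
            case "-": return x - y;
            case "*": return x * y;
            case "/": return x / y; // integer division
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        List<Object> tree = List.of("+", 1, 2,
                List.of("-", 3, 4),
                5,
                List.of("+", 6, 7, List.of("+", 8, 9)));
        System.out.println(eval(tree)); // prints 37
    }
}
```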

Although this simple interpreter only supports the four arithmetic operations, it can already evaluate nested lists, which gives us a solid foundation for what follows (such as scopes). Later we will add define and lambda on top of this interpreter. Support for these keywords depends on two things, scopes and variables, both important features, variables especially. Take a break, drink a glass of water, and then let's continue our journey.

Support for variables and method calls

First, let's add variables to the interpreter we just wrote. This is very simple: the key is to bind variable names to their values, and in Java we can do that with a Map.
We define a Map and name it env:

static Map<String, Object> env = new HashMap<>();

On top of env we define two methods: define(k, v), which binds a value, and env(k), which looks one up:

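    public static void define(String k, Object v) {
        env.put(k, v);
    }

    // ofNullable: an unbound key yields an empty Optional instead of a NullPointerException
    public static Optional<Object> env(String key) {
        return Optional.ofNullable(env.get(key));
    }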

Then we add some features to JLispInter:

public class JLispInter {

     public static Object inter(ExpNode exp){
        Object car = exp.car();
        ExpNode cdr = exp.cdr();
        if (car instanceof String){
            switch ((String)car){
                ...
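                case "define":
                    // bind the name to the value of the remaining expression
                    define((String) cdr.car(), inter(cdr.cdr()));
                    return null;
                default:
                    // a bound name evaluates to its value, anything else to itself
                    return env((String) car).orElse(car);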
            }
        }else if (car instanceof ExpNode){
            Object o = inter((ExpNode) car);
            return cdr.isEmpty()?o:inter(cdr);
        }else {
            return cdr.isEmpty()?car:inter(cdr);
        }
    }

    private static Object getAtom(Object o){
         if (o instanceof ExpNode){
            return inter((ExpNode) o);
        }else if (o instanceof String){
            return env((String)o).orElse(null);
        }else {
            return o;
        }
    }
}

Input

System.out.println(inter(Parse.parseTree("((define a 5) (+ a 6))")));
System.out.println(inter(Parse.parseTree("((define a 5) (define b 8) (+ a b))")));

Output

11
13

So far we support:

Variables: a
Binding: (define a 5)
The four arithmetic operations: + - * /

Next, we will support functions and function calls.
For functions we choose to support lambda. A lambda expression has the following form:
(lambda (x) e)
Before we start, let's modify the parser slightly and add a symbol type, so that symbols can later be told apart from String values.
Parse

public class Parse {
...
   private static Object parseObj(String val) {
        try {
            return Integer.valueOf(val);
        } catch (NumberFormatException e) {
            if (val.equals("true") || val.equals("false")) {
                return Boolean.valueOf(val);
            } else if (val.indexOf("'") == 0 && val.lastIndexOf("'") == val.length() - 1) {
                return val.replaceAll("\"", "\"");
            } else {
                return Symbols.of(val);
            }
        }
    }
...
}

Symbols

public interface Symbols {
    static Symbols of(String name) {
        return new Symbols.SimpleVar(name);
    }

    String getName();

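    // @Value is presumably Lombok's lombok.Value, which makes the field
    // private final and generates the constructor and getName().
    @Value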
    class SimpleVar implements Symbols {
        String name;

        @Override
        public String toString() {
            return "`" + name;
        }
    }
}

Then we rework the environment so that it can tell variables bound inside a lambda apart from those created by define, and so that a variable missing from the current scope is looked up in the enclosing scopes. If it is not found anywhere, an exception is thrown.
Env

    public static class Env{

        private  final Map<String,Object> env = new HashMap<>();
        private Env parent;

        public static Env newInstance(Env parent){
            Env env1 = new Env();
            env1.parent = parent;
            return env1;
        }
        
        public  void define(String key,Object val){
            env.put(key,val);
        }

        // look in this scope first, then walk up through the parents
        public Optional<Object> env(Symbols symbols) {
            String symbolsName = symbols.getName();
            if (env.containsKey(symbolsName)) {
                return Optional.ofNullable(env.get(symbolsName));
            }
            return parent != null ? parent.env(symbols) : Optional.empty();
        }
    }
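
The lookup behaviour described above can be tried out in isolation with a small sketch. Scope and lookup are names chosen for this illustration only, and plain String keys stand in for Symbols to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class Scope {
    private final Map<String, Object> bindings = new HashMap<>();
    private final Scope parent; // null for the global scope

    Scope(Scope parent) {
        this.parent = parent;
    }

    void define(String name, Object value) {
        bindings.put(name, value);
    }

    // Search this scope first, then each enclosing scope in turn.
    Optional<Object> lookup(String name) {
        if (bindings.containsKey(name)) {
            return Optional.ofNullable(bindings.get(name));
        }
        return parent == null ? Optional.empty() : parent.lookup(name);
    }

    public static void main(String[] args) {
        Scope global = new Scope(null);
        global.define("a", 5);

        Scope inner = new Scope(global); // e.g. the scope of one lambda call
        inner.define("x", 9);

        System.out.println(inner.lookup("a"));  // found in the parent: Optional[5]
        System.out.println(global.lookup("x")); // inner bindings do not leak out: Optional.empty

        inner.define("a", 7);                   // shadows the outer a
        System.out.println(inner.lookup("a"));  // Optional[7]
        System.out.println(global.lookup("a")); // still Optional[5]
    }
}
```

Returning an empty Optional rather than throwing leaves the decision to the caller, which is how the interpreter can treat an unbound symbol as an error in one place and as a plain value in another.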

JLispInter

public class JLispInter {
   public static Object inter(String exp){
      return inter(Parse.parseTree(exp),Env.newInstance(null));
    }

    // 1
    public static Object inter(ExpNode exp,Env env){
        Object car = exp.car();
        ExpNode cdr = exp.cdr();
        if (car instanceof Symbols){
            switch (((Symbols) car).getName()){
                case "+":
                    //2
                    return cdr.data().stream().map(o->getAtom(o,env)).mapToInt(o->(Integer)o).sum();
                case "-":
                    return cdr.data().stream().map(o->getAtom(o,env)).mapToInt(o->(Integer)o).reduce((x,y)->x-y).orElseThrow(RuntimeException::new);
                case "*":
                    return cdr.data().stream().map(o->getAtom(o,env)).mapToInt(o->(Integer)o).reduce((x,y)->x*y).orElseThrow(RuntimeException::new);
                case "/":
                    return cdr.data().stream().map(o->getAtom(o,env)).mapToInt(o->(Integer)o).reduce((x,y)->x/y).orElseThrow(RuntimeException::new);
                case "define":
                    // carSymbols() is not shown in this article; presumably it returns car() cast to Symbols
                    env.define(cdr.carSymbols().getName(), inter(cdr.cdr(),env));
                    return null;
                case "lambda":
                    return (Function<Object[],Object>) (x)->{
                        ExpNode args = (ExpNode)cdr.car();
                        ExpNode body = cdr.cdr();
                        validateTrue(args.data().size()==x.length,"argument count mismatch");
                        //3
                        Env env0 = Env.newInstance(env);
                        int i = 0;
                        for (Object argName : args) {
                            env0.define(((Symbols)argName).getName(), x[i]);
                            i++;
                        }
                        return inter(body,env0);
                    };
                default:
                    Optional<Object> env1 = env.env((Symbols)car);
                    // 4 (isExp() is another ExpNode method that is not shown in this article)
                    boolean isApply = env1.isPresent()&&env1.get() instanceof Function&&exp.isExp();
                    if(isApply){
                        Object v = env1.get();
                        Function<Object[],Object> f = (Function<Object[], Object>) v;
                       return f.apply(cdr.data().stream().map(o -> getAtom(o, env)).toArray());
                    }
                    return env1.orElse(car);
            }
        }else if (car instanceof ExpNode){
            Object o = inter((ExpNode) car,env);
            return cdr.isEmpty()?o:inter(cdr,env);
        }else {
            return cdr.isEmpty()?car:inter(cdr,env);
        }
    }

    private static Object getAtom(Object o,Env env){
        if (o instanceof ExpNode){
            return inter((ExpNode) o,env);
        }else if (o instanceof Symbols){
            //5
            return env.env((Symbols) o).orElseThrow(()->new IllegalArgumentException(o+" is not defined"));
        }else {
            return o;
        }
    }

    private static void validateTrue(boolean flag, String err){
        if (!flag){
            throw new IllegalArgumentException(err);
        }
    }
}

When we run

System.out.println(inter("((define f (lambda (x y) (+ x y))) (f 9 8))"));
System.out.println(inter("((define add (lambda (x y) (+ x y)))(define f (lambda (o x y) (o x y))) (f add 9 8))"));

the output is

17
17

At mark 1, we add an env parameter, so that an expression, which used to be tied to the global environment, can be evaluated in different environments. This decouples evaluation from the global environment and lays the groundwork for scopes.
At mark 2, values are now looked up in the environment of the current context instead of the global variable environment.
At mark 3, each call of a lambda gets its own variable environment, which holds a reference to the enclosing environment, so that variables can be looked up in outer environments.
At mark 4, we check whether the first element of the current expression is a variable bound to a function. If so, isExp tells us whether we are looking at a complete expression (as opposed to a fragment cut off by exp.cdr), and only then do we call it with apply.
At mark 5, a variable with no bound value throws an exception.

Summary

At this point, the JLispInter we have written supports:

Variables: a
Binding: (define a 5)
The four arithmetic operations: + - * /
Functions: (lambda (x) (exp))
Calls: (g exp)

Of course, there is still plenty missing: the parser does not support strings yet; the four arithmetic operators and define cannot be passed as arguments to higher-order functions; anonymous functions, conditional branches and so on are not supported. I will cover these in a later article if I have time. Spoiler: an interpreter that supports these features was already written at the end of last year, so if it is just a matter of posting code, that article should come soon.
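
One possible direction for the operator limitation, sketched here as an assumption rather than as what the follow-up article does, is to stop special-casing + - * / in the switch and instead bind them in the global environment as ordinary Function<Object[], Object> values, the same shape the lambda case already produces. Builtins, fold, globals and callWith are hypothetical names for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.IntBinaryOperator;

class Builtins {
    // Wrap a binary integer operation as a variadic function folding left to right.
    static Function<Object[], Object> fold(IntBinaryOperator op) {
        return args -> Arrays.stream(args)
                .mapToInt(o -> (Integer) o)
                .reduce(op)
                .orElseThrow(() -> new IllegalArgumentException("no operands"));
    }

    // A global environment in which the operators are ordinary bound values.
    static Map<String, Function<Object[], Object>> globals() {
        Map<String, Function<Object[], Object>> env = new HashMap<>();
        env.put("+", fold((x, y) -> x + y));
        env.put("-", fold((x, y) -> x - y));
        env.put("*", fold((x, y) -> x * y));
        env.put("/", fold((x, y) -> x / y));
        return env;
    }

    // Java counterpart of (lambda (o x y) (o x y)).
    static Object callWith(Function<Object[], Object> o, Object x, Object y) {
        return o.apply(new Object[]{x, y});
    }

    public static void main(String[] args) {
        Map<String, Function<Object[], Object>> env = globals();
        // Roughly what (f + 9 8) would do once + is just a value in the environment.
        System.out.println(callWith(env.get("+"), 9, 8)); // prints 17
        System.out.println(callWith(env.get("*"), 9, 8)); // prints 72
    }
}
```

With the operators stored like this, looking up + yields a function just as looking up add does, so it could be passed as an argument without any extra cases.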

yangrd
