Writing a Parser in Java: The Expression Tree

The expression tree for the mathematical expression 3*2^4 + sqrt(1+3)

The last post in this series showed you how a mathematical expression parser was implemented from the grammar that we designed earlier. This was all fine and good but the parser did not do much except confirm if a mathematical expression was valid or not. In the end there was no way of evaluating that expression because we never stored any of the information that the parser was able to retrieve from the input. We will have to do this by adding side effects in the parser’s methods, but before we do we have to decide how our mathematical expressions should be stored. That is the topic of this post.

The expression tree after reducing the 2^3 nodes to the result 16.

The typical way of storing expressions is in a tree structure. Each node in the tree represents either an operation or a value. The outermost operation (the one that is performed last in the evaluation of the expression) is at the root of the tree and the values are stored in the leaf node. The figure on the right shows the tree for the expression 3*2^4 + sqrt(1+3). The outermost operation is the + which adds  3*2^4 and sqrt(1+3). Evaluating the expression is done from the leaves upward. In our example we can start on the left side and evaluate 2^4 and replace the exponentiation with the result, 16. The resulting tree is shown below.

The expression tree after all but the last reductions.

We continue in the same way with the 3*16 nodes and replace the multiplication with the result 48. On the right hand side of the tree we can evaluate the addition 1+3 and replace it with the sum 4. Then we evaluate sqrt(4) which results 2. The sqrt node is replaced and the tree, after all these operations is shown in the figure below. The final result of the expression is obtained from adding the two numbers which then gives 50 as the answer.

A class representation of the tree

To store the tree in some kind of data structure we need to cast it into some sort of object hierarchy. We will create a base interface for the nodes of the tree and a number of classes that implement the interface for different types of nodes, that is one class for additions, one class for multiplications and so on. The interface is called ExpressionNode.

public interface ExpressionNode {
  public static final int VARIABLE_NODE = 1;
  public static final int CONSTANT_NODE = 2;
  public static final int ADDITION_NODE = 3;
  public static final int MULTIPLICATION_NODE = 4;
  public static final int EXPONENTIATION_NODE = 5;
  public static final int FUNCTION_NODE = 6;
  public int getType();
  public double getValue();
}

ExpressionNode defines a couple of static integer constants that can be used to identify different node types. The abstract method getType() is intended to return the type of the node, that is every derived class should return one of the above constants. The method getValue() should return the evaluated result of the sub-tree which is rooted in the current node. In the example above, the node corresponding to the square root should return 2.

We now implement this interface and create a class called ConstantExpressionNode for holding numeric constants, that is numbers like 2 or 3.14159.

public class ConstantExpressionNode implements ExpressionNode {
  private double value;

  public ConstantExpressionNode(double value) {
    this.value = value;
  }

  public ConstantExpressionNode(String value) {
    this.value = Double.valueOf(value);
  }

  public double getValue() {
    return value;
  }

  public int getType() {
    return ExpressionNode.CONSTANT_NODE;
  }
}

The method getType() returns ExpressionNode.CONSTANT_NODE and getValue() returns the value stored in a private member of type double. There are two constructors, one will take a double value of the constant and the other will take a which is converted to the double value.

The expression node that holds variables or named constant is called VariableExpressionNode.

public class VariableExpressionNode implements ExpressionNode {
  private String name;
  private double value;
  private boolean valueSet;

  public VariableExpressionNode(String name) {
    this.name = name;
    valueSet = false;
  }

  public int getType() {
    return ExpressionNode.VARIABLE_NODE;
  }

  public void setValue(double value) {
    this.value = value;
    this.valueSet = true;
  }

  public double getValue() {
    if (valueSet)
      return value;
    else
      throw new EvaluationException("Variable '" 
        + name + "' was not initialized.");
  }
}

VariableExpressionNode has an internal parameter, name, that stores the name of the variable along with a double, value,  that stores the value of the variable if it is set. A boolean, valueSet, indicates if the value has been set. The getValue() method will return the value or throw an exception, depending on the state of valueSet.

Next we want to create a class that captures additions. Instead of holding just two terms which are added, we decide that an addition should hold an arbitrary number of terms which are either added or subtracted. We will have a similar requirement for multiplications and we will separate out the commonalities in a class called SequenceExpressionNode. First we define an inner class called Term which holds a single term of the sum together with a boolean indicating the sign.

public class Term {
  public boolean positive;
  public ExpressionNode expression;

  public Term(boolean positive, ExpressionNode expression) {
    super();
    this.positive = positive;
    this.expression = expression;
  }
}

This class is used by the SequenceExpressionNode. Note how the class is declared abstract because we are not yet implementing the methods of the ExpressionNode interface.

public abstract class SequenceExpressionNode
    implements ExpressionNode {

  protected LinkedList<Term> terms;

  public SequenceExpressionNode() {
    this.terms = new LinkedList<Term>();
  }

  public SequenceExpressionNode(ExpressionNode a, boolean positive) {
    this.terms = new LinkedList<Term>();
    this.terms.add(new Term(positive, a));
  }

  public void add(ExpressionNode a, boolean positive) {
    this.terms.add(new Term(positive, a));
  }
}

The linked list terms contains all the terms of the sum or product. Two constructors are available, one creates an empty sequence, the other immediately adds the first term into the sequence. The second constructor is a there for convenience so that sequences can be created easily. The add() method allows terms to be added to the sum or product. With this class defined, we can now go ahead and write the AdditionExpressionNode class.

public class AdditionExpressionNode
      extends SequenceExpressionNode {

  public AdditionExpressionNode() {
    super();
  }

  public AdditionExpressionNode(ExpressionNode a,
                                boolean positive) {
    super(a, positive);
  }

  public int getType() {
    return ExpressionNode.ADDITION_NODE;
  }

  public double getValue() {
    double sum = 0.0;
    for (Term t : terms) {
      if (t.positive)
        sum += t.expression.getValue();
      else
        sum -= t.expression.getValue();
    }
    return sum;
  }
}

The constructors and the getType() method are implemented as expected. getValue() iterates over the terms and adds or subtracts them from the sum, depending of the positive flag inside the terms. In order to get the values of the sub-expressions that are added and subtracted, we can recursively call the getValue() methods of all the terms.

The MultiplicationExpressionNode is very similar.

public class MultiplicationExpressionNode
      extends SequenceExpressionNode {

  public MultiplicationExpressionNode() {
    super();
  }

  public MultiplicationExpressionNode(ExpressionNode a,
                                      boolean positive) {
    super(a, positive);
  }

  public int getType() {
    return ExpressionNode.MULTIPLICATION_NODE;
  }

  public double getValue() {
    double prod = 1.0;
    for (Term t : terms) {
      if (t.positive)
        prod *= t.expression.getValue();
      else
        prod /= t.expression.getValue();
    }
    return prod;
  }
}

The only new thing here is the getValue() method. Instead of adding or subtracting terms from a sum that is initially zero we now multiply or divide from a product. The product has to be initialised to 1.0 instead of 0.0 like the sum in the AdditionExpressionNode. In contrast to the previous two operations, exponentiation only takes two arguments, the base and the exponent. Accordingly, the ExponentiationExpressionNode looks slightly simpler.

public class ExponentiationExpressionNode
      implements ExpressionNode {

  private ExpressionNode base;
  private ExpressionNode exponent;

  public ExponentiationExpressionNode(ExpressionNode base, 
                                      ExpressionNode exponent) {
    this.base = base;
    this.exponent = exponent;
  }

  public int getType() {
    return ExpressionNode.EXPONENTIATION_NODE;
  }

  public double getValue() {
    return Math.pow(base.getValue(), exponent.getValue());
  }
}

The constructor assigns the private base and exponent expressions and getValue() uses Math.pow() to calculate the result of the exponentiation. Again, because the base and the exponent are both expression nodes themselves, we call their getValue() methods to obtain a value.

The last type of expression node that needs implementing will handle mathematical functions. This is the only class which will be slightly more laborious because we have to use control statements to tell the different functions apart.

public class FunctionExpressionNode
    implements ExpressionNode {
  public static final int SIN = 1;
  public static final int COS = 2;
  public static final int TAN = 3;

  public static final int ASIN = 4;
  public static final int ACOS = 5;
  public static final int ATAN = 6;

  public static final int SQRT = 7;
  public static final int EXP = 8;

  public static final int LN = 9;
  public static final int LOG = 10;
  public static final int LOG2 = 11;

  private int function;
  private ExpressionNode argument;

  public FunctionExpressionNode(int function,
                                ExpressionNode argument) {
    super();
    this.function = function;
    this.argument = argument;
  }

  public int getType() {
    return ExpressionNode.FUNCTION_NODE;
  }

  public double getValue() {
    switch (function) {
      case SIN:  return Math.sin(argument.getValue());
      case COS:  return Math.cos(argument.getValue());
      case TAN:  return Math.tan(argument.getValue());
      case ASIN: return Math.asin(argument.getValue());
      case ACOS: return Math.acos(argument.getValue());
      case ATAN: return Math.atan(argument.getValue());
      case SQRT: return Math.sqrt(argument.getValue());
      case EXP:  return Math.exp(argument.getValue());
      case LN:   return Math.log(argument.getValue());
      case LOG:  return Math.log(argument.getValue())
        * 0.43429448190325182765;
      case LOG2: return Math.log(argument.getValue())
        * 1.442695040888963407360;
    }
    throw new EvaluationException("Invalid function id "+function+"!");
  }

  public static int stringToFunction(String str) {
    if (str.equals("sin")) return FunctionExpressionNode.SIN;
    if (str.equals("cos")) return FunctionExpressionNode.COS;
    if (str.equals("tan")) return FunctionExpressionNode.TAN;

    if (str.equals("asin")) return FunctionExpressionNode.ASIN;
    if (str.equals("acos")) return FunctionExpressionNode.ACOS;
    if (str.equals("atan")) return FunctionExpressionNode.ATAN;

    if (str.equals("sqrt")) return FunctionExpressionNode.SQRT;
    if (str.equals("exp")) return FunctionExpressionNode.EXP;

    if (str.equals("ln")) return FunctionExpressionNode.LN;
    if (str.equals("log"))return FunctionExpressionNode.LOG;
    if (str.equals("log2")) return FunctionExpressionNode.LOG2;

    throw ParseException("Unexpected Function "+str+" found!");
  }

  public static String getAllFunctions() {
    return "sin|cos|tan|asin|acos|atan|sqrt|exp|ln|log|log2";
  }

}

We first define a series of integer constants for the different mathematical functions available. The private member function will be set to one of these pre-specified values. The expression node argument holds the argument of the function. The method getValue() decides, in a switch statement, which function to apply to the argument.

Two static methods are provided to help tokenizing and parsing an expression from a String. The method stringToFunction() will take a string of a function name and return the corresponding integer function identifier. If the method is passed a string that is not a recognised function then an exception is thrown. This method will be used by the parser to create new FunctionExpressionNode objects. The method getAllFunctions() returns a string containing all supported functions, separated by a |. This string may be used by the tokenizer to create a regular expression that identifies functions inside a string.

Creating an expression by hand

These are all the classes needed to represent mathematical expressions. But for now we have not created the code necessary to create the expression tree when parsing an expression from a string. This means that, for now, if we want to test our code we will have to create the tree by hand. The code for creating the expression 3*2^4 + sqrt(1+3) is given below.

AdditionExpressionNode innerSum =
  new AdditionExpressionNode();
innerSum.add(new ConstantExpressionNode(1), true);
innerSum.add(new ConstantExpressionNode(3), true);

ExpressionNode sqrt =
  new FunctionExpressionNode(
    FunctionExpressionNode.SQRT,
    innerSum);

ExpressionNode expo =
  new ExponentiationExpressionNode(
    new ConstantExpressionNode(2),
    new ConstantExpressionNode(4));

MultiplicationExpressionNode prod =
  new MultiplicationExpressionNode();
prod.add(new ConstantExpressionNode(3), true);
prod.add(expo, true);

AdditionExpressionNode expression =
  new AdditionExpressionNode();
expression.add(prod, true);
expression.add(sqrt, true);

System.out.println("The result is "
  + expression.getValue());

If we have done everything right then this code will print out

The result is 50.0

In the next contribution to this series I will modify the parser to generate the expression tree while parsing the input.

8 Comments

  1. Khalid

    First of all, thank you for sharing us your knowledge in this topic, I’ve been trying for a while now to follow your foot steps in order to create a parser for a programming language in Java, even though I’ve read some where (correct me if I’m wring) that LR grammar is more appropriate for parsing instructions of a programming languages, which lead me to the my question\request. What are the recommended references\textbooks to dig more on this issue? I’ve found some, yet it always drive may away.
    Thanks again.

    Reply
  2. mikail

    If you are going to write a parser for a programming language then you should definitely go for an LL or LALR grammar. You should also not write the parser yourself but use a parser generator. ANTLR is probably the most popular parser generator. They also have quite good documentation available.

    Reply
  3. Hồ Thúc Đồng

    Thanks for your post, very useful to me.

    Reply
  4. Nick Anderson

    Thanks for your posts so much! I’m a freshman in Uni and these are very helpful. Usually with programming tutorials I end up writing code that I have no idea how it works, but I learn so much from these tutorials.

    Reply
  5. Niyas C

    Thanks for sharing your knowledge.
    Let me to ask one simple doubt.
    Don’t you think that, it is better to create a class ‘Binary’ for all binary operations instead of creating seperate class for additon/substraction/mul/div/mod ..etc?

    Reply
  6. Operation

    Cogito – Home

    Reply
  7. 2017

    Brian Eno | Biography, Albums, Streaming Links | AllMusic

    Reply
  8. Exemplos

    Survivalist Forum Survival Gear SHTF and TEOTWAWKI …

    Reply

Leave a Comment

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>