Remember from the post in which we implemented the parser that every non-terminal symbol in the grammar was given a method inside the **Parser** class. When we defined these methods they were plain void methods that did not take any arguments. This is about to change. We will now modify all these methods to return an **ExpressionNode**.

private ExpressionNode expression() ... private ExpressionNode sumOp(ExpressionNode expr) ... private ExpressionNode signedTerm() ... private ExpressionNode term() ... private ExpressionNode termOp(ExpressionNode expr) ... private ExpressionNode signedFactor() ... private ExpressionNode factor() ... private ExpressionNode factorOp(ExpressionNode expr) ... private ExpressionNode argument() ... private ExpressionNode value() ...

Inside each of these methods we now have to create the expression nodes that correspond to the parsed expression and return the result. Let’s start with the leaf nodes which are parsed inside **value()**.

private ExpressionNode value() { // argument -> NUMBER if (lookahead.token == Token.NUMBER) { ExpressionNode expr = new ConstantExpressionNode(lookahead.sequence); nextToken(); return expr; } // argument -> VARIABLE if (lookahead.token == Token.VARIABLE) { ExpressionNode expr = new VariableExpressionNode(lookahead.sequence); nextToken(); return expr; } if (lookahead.token == Token.EPSILON) throw new ParserException("Unexpected end of input"); else throw new ParserException("Unexpected symbol %s found", lookahead); }

If we encounter a number then we create a new **ConstantExpressionNode**. We can use **lookahead.sequence** to initialise the constant. **lookahead.sequence** contains the string that corresponds to the current token. In case of a number, it will contain the string representation of the numerical value. **ConstantExpressionNode** had a convenience constructor that takes a string, converts it into a number and stores it internally as the value of the constant.

The code added for variables is almost identical except that we create a new **VariableExpressionNode**. Note how, in both cases, we create the node before calling **nextToken()**. We do this because **nextToken()** modifies the lookahead which would mean we loose the information about the current token.

Next up is the argument() method.

private ExpressionNode argument() { // argument -> FUNCTION argument if (lookahead.token == Token.FUNCTION) { int function = FunctionExpressionNode.stringToFunction(lookahead.sequence); nextToken(); ExpressionNode expr = argument(); return new FunctionExpressionNode(function, expr); } // argument -> OPEN_BRACKET sum CLOSE_BRACKET else if (lookahead.token == Token.OPEN_BRACKET) { nextToken(); ExpressionNode expr = expression(); if (lookahead.token != Token.CLOSE_BRACKET) throw new ParserException("Closing brackets expected", lookahead); nextToken(); return expr; } // argument -> value return value(); }

The argument non-terminal does not actually correspond to any type of node in the expression tree. All we need to do here is pass on the nodes we obtain from the sub-expressions. So, depending on the rule, we either return a terminal node via the **value()** method or a more complex sub-tree via the **expression()** method or we create a **FunctionExpressionNode** if we encounter a function.

If we find a function then we have to create a **FunctionExpressionNode**. To do this we first have to convert the string in** lookahead.sequence** to a function identifier. In the last post we defined a static helper method called **stringToFunction()** that can do this for us. This function will throw an exception if the string does not correspond to a supported function.

The factor() method produces argument non-terminal optionally wrapped by **functionOp**.

private ExpressionNode factor() { // factor -> argument factor_op ExpressionNode a = argument(); return factorOp(a); }

The rule for **factor** produces and **argument** and a **factor_op** non-terminal. The **factor_op** non-terminal can include an exponentiation, or it might not. In order for **factor_op** to create the exponentiation it needs to know the base which is parsed by **argument()**. For this reason we decide to pass the node produced by **argument()** to the **factorOp()** method.

private ExpressionNode factorOp(ExpressionNode expression) { // factor_op -> RAISED factor if (lookahead.token == Token.RAISED) { nextToken(); ExpressionNode exponent = signedFactor(); return new ExponentiationExpressionNode(expression, exponent); } // factor_op -> EPSILON return expression; }

If we find a **RAISED** token then **factorOp()** creates an **ExponentiationExpressionNode** with the base given by the expression passed to the the method and the exponent which is parsed from a recursive call to **factor()**. If the **EPSILON** rule is applied then we simply return expression.

We have called the **signedFactor()** method for the exponent which corresponds to the **signed_factor** non-terminal.

private ExpressionNode signedFactor() { // signed_factor -> PLUSMINUS factor if (lookahead.token == Token.PLUSMINUS) { boolean positive = lookahead.sequence.equals("+"); nextToken(); ExpressionNode t = factor(); if (positive) return t; else return new AdditionExpressionNode(t, false); } // signed_factor -> factor return factor(); }

A signed factor can start with a plus or minus sign. In the case of a minus sign, an **AdditionExpressionNode** is generated to represent the negative of the factor. In all other cases the expression from **factor()** is simply passed on.

The term non-terminal again does not correspond to any node in the expression tree. The implementation of **term()**, therefore, just passes on the expression nodes.

private ExpressionNode term() { // term -> factor term_op ExpressionNode f = factor(); return termOp(f); }

Again, **termOp()** might or might not produce a product, depending on which symbols are found next. In order for **termOp()** to be able to create a product including the first factor, we pass the factor to **termOp()**.

private ExpressionNode termOp(ExpressionNode expression) { // term_op -> MULTDIV factor term_op if (lookahead.token == Token.MULTDIV) { MultiplicationExpressionNode prod; if (expression.getType() == ExpressionNode.MULTIPLICATION_NODE) prod = (MultiplicationExpressionNode)expression; else prod = new MultiplicationExpressionNode(expression, true); boolean positive = lookahead.sequence.equals("*"); nextToken(); ExpressionNode f = signedFactor(); prod.add(f, positive); return termOp(prod); } // term_op -> EPSILON return expression; }

The implementation of **termOp()** is slightly more complicated. Remember from the last post that we can add an arbitrary number of factors to a **MultiplicationExpressionNode**. This means, as we continue calling **termOp()** recursively we want to keep adding factors to the multiplication. When we discover a **MULTDIV** token we know that we are dealing with a product or division. We create a new **MultiplicationExpressionNode** only if the node that was passed to us is not already a **MultiplicationExpressionNode**. Then we add the next factor to the multiplication.

The **EPSILON** rule, on the other hand, simply returns the expression that was passed to **termOp()**.

A generalisation of **term** is **signed_term**. This can include a leading plus or minus symbol.

private ExpressionNode signedTerm() { // signed_term -> PLUSMINUS term if (lookahead.token == Token.PLUSMINUS) { boolean positive = lookahead.sequence.equals("+"); nextToken(); ExpressionNode t = term(); if (positive) return t; else return new AdditionExpressionNode(t, false); } // signed_term -> term return term(); }

Only in the case that the leading symbol is a minus, we have to create an **AdditionExpressionNode** and add the term with a negative sign. In all other cases we simply pass on the node obtained from **term()**.

The methods **expression()** and **sumOp()** follow pretty much the same pattern as **term()** and **termOp()**.

private ExpressionNode expression() { // expression -> signed_term sum_op ExpressionNode expr = signedTerm(); return sumOp(expr); }

The method **expression()** does not produce any nodes itself but passes the result of **signedTerm()** to **sumOp()** which can then assemble and **AdditionExpressionNode** if it has to.

private ExpressionNode sumOp(ExpressionNode expr) { // sum_op -> PLUSMINUS term sum_op if (lookahead.token == Token.PLUSMINUS) { AdditionExpressionNode sum; if (expr.getType() == ExpressionNode.ADDITION_NODE) sum = (AdditionExpressionNode)expr; else sum = new AdditionExpressionNode(expr, true); boolean positive = lookahead.sequence.equals("+"); nextToken(); ExpressionNode t = term(); sum.add(t, positive); return sumOp(sum); } // sum_op -> EPSILON return expr; }

As with **termOp()** we continue calling **sumOp()** recursively we want to keep adding terms to the sum. When we discover a **PLUSMINUS** token we know that we are dealing with an addition or subtraction. We create a new **AdditionExpressionNode** only if the node that was passed to us is not already a **AdditionExpressionNode**. Then we add the next term to the sum.

The **EPSILON** rule, on the other hand, simply returns the expression that was passed to **sumOp()**.

Finally we modify the parse() method to return the ExpressionNode created in the process of parsing the input.

public ExpressionNode parse(LinkedList<Token> tokens) { this.tokens = (LinkedList<Token>)tokens.clone(); lookahead = this.tokens.getFirst(); ExpressionNode expr = expression(); if (lookahead.token != Token.EPSILON) throw new ParserException("Unexpected symbol %s found", lookahead); return expr; }

With all the above we are ready to use the parser. In our main() method we can now test the following.

Parser parser = new Parser(); try { ExpressionNode expression = parser.parse("3*2^4 + sqrt(1+3)"); System.out.println("The value of the expression is "+expression.getValue()); } catch (ParserException e) { System.out.println(e.getMessage()); } catch (EvaluationException e) { System.out.println(e.getMessage()); }

This will produce the following output.

The value of the expression is 50.0

I hope you feel the same of exhilaration when seeing this result. It just feels like everything finally came together and the parser works exactly as expected.

Unfortunately we are not quite done. While we are able to parse variables and named constants, we have not yet got a mechanism for setting the values of those constants. This will be the topic of the next and final post in this series. To make thins a bit more interesting, I will introduce the visitor design pattern to achieve this.

]]>The last post in this series showed you how a mathematical expression parser was implemented from the grammar that we designed earlier. This was all fine and good but the parser did not do much except confirm if a mathematical expression was valid or not. In the end there was no way of evaluating that expression because we never stored any of the information that the parser was able to retrieve from the input. We will have to do this by adding side effects in the parser’s methods, but before we do we have to decide how our mathematical expressions should be stored. That is the topic of this post.

The typical way of storing expressions is in a tree structure. Each node in the tree represents either an operation or a value. The outermost operation (the one that is performed last in the evaluation of the expression) is at the root of the tree and the values are stored in the leaf node. The figure on the right shows the tree for the expression **3*2^4 + sqrt(1+3)**. The outermost operation is the **+** which adds **3*2^4** and **sqrt(1+3)**. Evaluating the expression is done from the leaves upward. In our example we can start on the left side and evaluate 2^4 and replace the exponentiation with the result, 16. The resulting tree is shown below.

We continue in the same way with the **3*16** nodes and replace the multiplication with the result **48**. On the right hand side of the tree we can evaluate the addition **1+3** and replace it with the sum **4**. Then we evaluate **sqrt(4)** which results **2**. The sqrt node is replaced and the tree, after all these operations is shown in the figure below. The final result of the expression is obtained from adding the two numbers which then gives **50** as the answer.

To store the tree in some kind of data structure we need to cast it into some sort of object hierarchy. We will create a base interface for the nodes of the tree and a number of classes that implement the interface for different types of nodes, that is one class for additions, one class for multiplications and so on. The interface is called **ExpressionNode**.

public interface ExpressionNode { public static final int VARIABLE_NODE = 1; public static final int CONSTANT_NODE = 2; public static final int ADDITION_NODE = 3; public static final int MULTIPLICATION_NODE = 4; public static final int EXPONENTIATION_NODE = 5; public static final int FUNCTION_NODE = 6; public int getType(); public double getValue(); }

ExpressionNode defines a couple of static integer constants that can be used to identify different node types. The abstract method getType() is intended to return the type of the node, that is every derived class should return one of the above constants. The method getValue() should return the evaluated result of the sub-tree which is rooted in the current node. In the example above, the node corresponding to the square root should return 2.

We now implement this interface and create a class called ConstantExpressionNode for holding numeric constants, that is numbers like 2 or 3.14159.

public class ConstantExpressionNode implements ExpressionNode { private double value; public ConstantExpressionNode(double value) { this.value = value; } public ConstantExpressionNode(String value) { this.value = Double.valueOf(value); } public double getValue() { return value; } public int getType() { return ExpressionNode.CONSTANT_NODE; } }

The method **getType()** returns **ExpressionNode.CONSTANT_NODE** and **getValue()** returns the value stored in a private member of type double. There are two constructors, one will take a double value of the constant and the other will take a which is converted to the double value.

The expression node that holds variables or named constant is called VariableExpressionNode.

public class VariableExpressionNode implements ExpressionNode { private String name; private double value; private boolean valueSet; public VariableExpressionNode(String name) { this.name = name; valueSet = false; } public int getType() { return ExpressionNode.VARIABLE_NODE; } public void setValue(double value) { this.value = value; this.valueSet = true; } public double getValue() { if (valueSet) return value; else throw new EvaluationException("Variable '" + name + "' was not initialized."); } }

**VariableExpressionNode** has an internal parameter, **name**, that stores the name of the variable along with a double, **value**, that stores the value of the variable if it is set. A boolean, **valueSet**, indicates if the value has been set. The **getValue()** method will return the value or throw an exception, depending on the state of **valueSet**.

Next we want to create a class that captures additions. Instead of holding just two terms which are added, we decide that an addition should hold an arbitrary number of terms which are either added or subtracted. We will have a similar requirement for multiplications and we will separate out the commonalities in a class called **SequenceExpressionNode**. First we define an inner class called **Term** which holds a single term of the sum together with a boolean indicating the sign.

public class Term { public boolean positive; public ExpressionNode expression; public Term(boolean positive, ExpressionNode expression) { super(); this.positive = positive; this.expression = expression; } }

This class is used by the **SequenceExpressionNode**. Note how the class is declared abstract because we are not yet implementing the methods of the **ExpressionNode** interface.

public abstract class SequenceExpressionNode implements ExpressionNode { protected LinkedList<Term> terms; public SequenceExpressionNode() { this.terms = new LinkedList<Term>(); } public SequenceExpressionNode(ExpressionNode a, boolean positive) { this.terms = new LinkedList<Term>(); this.terms.add(new Term(positive, a)); } public void add(ExpressionNode a, boolean positive) { this.terms.add(new Term(positive, a)); } }

The linked list **terms** contains all the terms of the sum or product. Two constructors are available, one creates an empty sequence, the other immediately adds the first term into the sequence. The second constructor is a there for convenience so that sequences can be created easily. The add() method allows terms to be added to the sum or product. With this class defined, we can now go ahead and write the **AdditionExpressionNode** class.

public class AdditionExpressionNode extends SequenceExpressionNode { public AdditionExpressionNode() { super(); } public AdditionExpressionNode(ExpressionNode a, boolean positive) { super(a, positive); } public int getType() { return ExpressionNode.ADDITION_NODE; } public double getValue() { double sum = 0.0; for (Term t : terms) { if (t.positive) sum += t.expression.getValue(); else sum -= t.expression.getValue(); } return sum; } }

The constructors and the **getType()** method are implemented as expected. **getValue()** iterates over the terms and adds or subtracts them from the sum, depending of the **positive** flag inside the terms. In order to get the values of the sub-expressions that are added and subtracted, we can recursively call the **getValue()** methods of all the terms.

The **MultiplicationExpressionNode** is very similar.

public class MultiplicationExpressionNode extends SequenceExpressionNode { public MultiplicationExpressionNode() { super(); } public MultiplicationExpressionNode(ExpressionNode a, boolean positive) { super(a, positive); } public int getType() { return ExpressionNode.MULTIPLICATION_NODE; } public double getValue() { double prod = 1.0; for (Term t : terms) { if (t.positive) prod *= t.expression.getValue(); else prod /= t.expression.getValue(); } return prod; } }

The only new thing here is the **getValue()** method. Instead of adding or subtracting terms from a sum that is initially zero we now multiply or divide from a product. The product has to be initialised to 1.0 instead of 0.0 like the sum in the **AdditionExpressionNode**. In contrast to the previous two operations, exponentiation only takes two arguments, the base and the exponent. Accordingly, the ExponentiationExpressionNode looks slightly simpler.

public class ExponentiationExpressionNode implements ExpressionNode { private ExpressionNode base; private ExpressionNode exponent; public ExponentiationExpressionNode(ExpressionNode base, ExpressionNode exponent) { this.base = base; this.exponent = exponent; } public int getType() { return ExpressionNode.EXPONENTIATION_NODE; } public double getValue() { return Math.pow(base.getValue(), exponent.getValue()); } }

The constructor assigns the private base and exponent expressions and **getValue()** uses Math.pow() to calculate the result of the exponentiation. Again, because the base and the exponent are both expression nodes themselves, we call their **getValue()** methods to obtain a value.

The last type of expression node that needs implementing will handle mathematical functions. This is the only class which will be slightly more laborious because we have to use control statements to tell the different functions apart.

public class FunctionExpressionNode implements ExpressionNode { public static final int SIN = 1; public static final int COS = 2; public static final int TAN = 3; public static final int ASIN = 4; public static final int ACOS = 5; public static final int ATAN = 6; public static final int SQRT = 7; public static final int EXP = 8; public static final int LN = 9; public static final int LOG = 10; public static final int LOG2 = 11; private int function; private ExpressionNode argument; public FunctionExpressionNode(int function, ExpressionNode argument) { super(); this.function = function; this.argument = argument; } public int getType() { return ExpressionNode.FUNCTION_NODE; } public double getValue() { switch (function) { case SIN: return Math.sin(argument.getValue()); case COS: return Math.cos(argument.getValue()); case TAN: return Math.tan(argument.getValue()); case ASIN: return Math.asin(argument.getValue()); case ACOS: return Math.acos(argument.getValue()); case ATAN: return Math.atan(argument.getValue()); case SQRT: return Math.sqrt(argument.getValue()); case EXP: return Math.exp(argument.getValue()); case LN: return Math.log(argument.getValue()); case LOG: return Math.log(argument.getValue()) * 0.43429448190325182765; case LOG2: return Math.log(argument.getValue()) * 1.442695040888963407360; } throw new EvaluationException("Invalid function id "+function+"!"); } public static int stringToFunction(String str) { if (str.equals("sin")) return FunctionExpressionNode.SIN; if (str.equals("cos")) return FunctionExpressionNode.COS; if (str.equals("tan")) return FunctionExpressionNode.TAN; if (str.equals("asin")) return FunctionExpressionNode.ASIN; if (str.equals("acos")) return FunctionExpressionNode.ACOS; if (str.equals("atan")) return FunctionExpressionNode.ATAN; if (str.equals("sqrt")) return FunctionExpressionNode.SQRT; if (str.equals("exp")) return FunctionExpressionNode.EXP; if (str.equals("ln")) return FunctionExpressionNode.LN; if (str.equals("log"))return FunctionExpressionNode.LOG; if (str.equals("log2")) return FunctionExpressionNode.LOG2; throw ParseException("Unexpected Function "+str+" found!"); } public static String getAllFunctions() { return "sin|cos|tan|asin|acos|atan|sqrt|exp|ln|log|log2"; } }

We first define a series of integer constants for the different mathematical functions available. The private member **function** will be set to one of these pre-specified values. The expression node **argument** holds the argument of the function. The method **getValue()** decides, in a switch statement, which function to apply to the argument.

Two static methods are provided to help tokenizing and parsing an expression from a String. The method **stringToFunction()** will take a string of a function name and return the corresponding integer function identifier. If the method is passed a string that is not a recognised function then an exception is thrown. This method will be used by the parser to create new **FunctionExpressionNode** objects. The method **getAllFunctions()** returns a string containing all supported functions, separated by a |. This string may be used by the tokenizer to create a regular expression that identifies functions inside a string.

These are all the classes needed to represent mathematical expressions. But for now we have not created the code necessary to create the expression tree when parsing an expression from a string. This means that, for now, if we want to test our code we will have to create the tree by hand. The code for creating the expression **3*2^4 + sqrt(1+3)** is given below.

AdditionExpressionNode innerSum = new AdditionExpressionNode(); innerSum.add(new ConstantExpressionNode(1), true); innerSum.add(new ConstantExpressionNode(3), true); ExpressionNode sqrt = new FunctionExpressionNode( FunctionExpressionNode.SQRT, innerSum); ExpressionNode expo = new ExponentiationExpressionNode( new ConstantExpressionNode(2), new ConstantExpressionNode(4)); MultiplicationExpressionNode prod = new MultiplicationExpressionNode(); prod.add(new ConstantExpressionNode(3), true); prod.add(expo, true); AdditionExpressionNode expression = new AdditionExpressionNode(); expression.add(prod, true); expression.add(sqrt, true); System.out.println("The result is " + expression.getValue());

If we have done everything right then this code will print out

The result is 50.0

In the next contribution to this series I will modify the parser to generate the expression tree while parsing the input.

]]>