In the last post I created classes that can represent a mathematical expression in the form of a tree data structure. We saw how expressions could be stored and evaluated but the code was not linked to the parser yet. The only way up to now to create an expression tree structure is to assemble it by hand. This is tedious and, of course, is not what we want from a parser. In this post I will extend the parser presented in a previous post to include code that will generate an expression tree on the fly.

Remember from the post in which we implemented the parser that every non-terminal symbol in the grammar was given a method inside the **Parser** class. When we defined these methods they were plain void methods that did not take any arguments. This is about to change. We will now modify all these methods to return an **ExpressionNode**.

private ExpressionNode expression() ... private ExpressionNode sumOp(ExpressionNode expr) ... private ExpressionNode signedTerm() ... private ExpressionNode term() ... private ExpressionNode termOp(ExpressionNode expr) ... private ExpressionNode signedFactor() ... private ExpressionNode factor() ... private ExpressionNode factorOp(ExpressionNode expr) ... private ExpressionNode argument() ... private ExpressionNode value() ...

Inside each of these methods we now have to create the expression nodes that correspond to the parsed expression and return the result. Let’s start with the leaf nodes which are parsed inside **value()**.

private ExpressionNode value() { // argument -> NUMBER if (lookahead.token == Token.NUMBER) { ExpressionNode expr = new ConstantExpressionNode(lookahead.sequence); nextToken(); return expr; } // argument -> VARIABLE if (lookahead.token == Token.VARIABLE) { ExpressionNode expr = new VariableExpressionNode(lookahead.sequence); nextToken(); return expr; } if (lookahead.token == Token.EPSILON) throw new ParserException("Unexpected end of input"); else throw new ParserException("Unexpected symbol %s found", lookahead); }

If we encounter a number then we create a new **ConstantExpressionNode**. We can use **lookahead.sequence** to initialise the constant. **lookahead.sequence** contains the string that corresponds to the current token. In case of a number, it will contain the string representation of the numerical value. **ConstantExpressionNode** had a convenience constructor that takes a string, converts it into a number and stores it internally as the value of the constant.

The code added for variables is almost identical except that we create a new **VariableExpressionNode**. Note how, in both cases, we create the node before calling **nextToken()**. We do this because **nextToken()** modifies the lookahead which would mean we loose the information about the current token.

Next up is the argument() method.

private ExpressionNode argument() { // argument -> FUNCTION argument if (lookahead.token == Token.FUNCTION) { int function = FunctionExpressionNode.stringToFunction(lookahead.sequence); nextToken(); ExpressionNode expr = argument(); return new FunctionExpressionNode(function, expr); } // argument -> OPEN_BRACKET sum CLOSE_BRACKET else if (lookahead.token == Token.OPEN_BRACKET) { nextToken(); ExpressionNode expr = expression(); if (lookahead.token != Token.CLOSE_BRACKET) throw new ParserException("Closing brackets expected", lookahead); nextToken(); return expr; } // argument -> value return value(); }

The argument non-terminal does not actually correspond to any type of node in the expression tree. All we need to do here is pass on the nodes we obtain from the sub-expressions. So, depending on the rule, we either return a terminal node via the **value()** method or a more complex sub-tree via the **expression()** method or we create a **FunctionExpressionNode** if we encounter a function.

If we find a function then we have to create a **FunctionExpressionNode**. To do this we first have to convert the string in** lookahead.sequence** to a function identifier. In the last post we defined a static helper method called **stringToFunction()** that can do this for us. This function will throw an exception if the string does not correspond to a supported function.

The factor() method produces argument non-terminal optionally wrapped by **functionOp**.

private ExpressionNode factor() { // factor -> argument factor_op ExpressionNode a = argument(); return factorOp(a); }

The rule for **factor** produces and **argument** and a **factor_op** non-terminal. The **factor_op** non-terminal can include an exponentiation, or it might not. In order for **factor_op** to create the exponentiation it needs to know the base which is parsed by **argument()**. For this reason we decide to pass the node produced by **argument()** to the **factorOp()** method.

private ExpressionNode factorOp(ExpressionNode expression) { // factor_op -> RAISED factor if (lookahead.token == Token.RAISED) { nextToken(); ExpressionNode exponent = signedFactor(); return new ExponentiationExpressionNode(expression, exponent); } // factor_op -> EPSILON return expression; }

If we find a **RAISED** token then **factorOp()** creates an **ExponentiationExpressionNode** with the base given by the expression passed to the the method and the exponent which is parsed from a recursive call to **factor()**. If the **EPSILON** rule is applied then we simply return expression.

We have called the **signedFactor()** method for the exponent which corresponds to the **signed_factor** non-terminal.

private ExpressionNode signedFactor() { // signed_factor -> PLUSMINUS factor if (lookahead.token == Token.PLUSMINUS) { boolean positive = lookahead.sequence.equals("+"); nextToken(); ExpressionNode t = factor(); if (positive) return t; else return new AdditionExpressionNode(t, false); } // signed_factor -> factor return factor(); }

A signed factor can start with a plus or minus sign. In the case of a minus sign, an **AdditionExpressionNode** is generated to represent the negative of the factor. In all other cases the expression from **factor()** is simply passed on.

The term non-terminal again does not correspond to any node in the expression tree. The implementation of **term()**, therefore, just passes on the expression nodes.

private ExpressionNode term() { // term -> factor term_op ExpressionNode f = factor(); return termOp(f); }

Again, **termOp()** might or might not produce a product, depending on which symbols are found next. In order for **termOp()** to be able to create a product including the first factor, we pass the factor to **termOp()**.

private ExpressionNode termOp(ExpressionNode expression) { // term_op -> MULTDIV factor term_op if (lookahead.token == Token.MULTDIV) { MultiplicationExpressionNode prod; if (expression.getType() == ExpressionNode.MULTIPLICATION_NODE) prod = (MultiplicationExpressionNode)expression; else prod = new MultiplicationExpressionNode(expression, true); boolean positive = lookahead.sequence.equals("*"); nextToken(); ExpressionNode f = signedFactor(); prod.add(f, positive); return termOp(prod); } // term_op -> EPSILON return expression; }

The implementation of **termOp()** is slightly more complicated. Remember from the last post that we can add an arbitrary number of factors to a **MultiplicationExpressionNode**. This means, as we continue calling **termOp()** recursively we want to keep adding factors to the multiplication. When we discover a **MULTDIV** token we know that we are dealing with a product or division. We create a new **MultiplicationExpressionNode** only if the node that was passed to us is not already a **MultiplicationExpressionNode**. Then we add the next factor to the multiplication.

The **EPSILON** rule, on the other hand, simply returns the expression that was passed to **termOp()**.

A generalisation of **term** is **signed_term**. This can include a leading plus or minus symbol.

private ExpressionNode signedTerm() { // signed_term -> PLUSMINUS term if (lookahead.token == Token.PLUSMINUS) { boolean positive = lookahead.sequence.equals("+"); nextToken(); ExpressionNode t = term(); if (positive) return t; else return new AdditionExpressionNode(t, false); } // signed_term -> term return term(); }

Only in the case that the leading symbol is a minus, we have to create an **AdditionExpressionNode** and add the term with a negative sign. In all other cases we simply pass on the node obtained from **term()**.

The methods **expression()** and **sumOp()** follow pretty much the same pattern as **term()** and **termOp()**.

private ExpressionNode expression() { // expression -> signed_term sum_op ExpressionNode expr = signedTerm(); return sumOp(expr); }

The method **expression()** does not produce any nodes itself but passes the result of **signedTerm()** to **sumOp()** which can then assemble and **AdditionExpressionNode** if it has to.

private ExpressionNode sumOp(ExpressionNode expr) { // sum_op -> PLUSMINUS term sum_op if (lookahead.token == Token.PLUSMINUS) { AdditionExpressionNode sum; if (expr.getType() == ExpressionNode.ADDITION_NODE) sum = (AdditionExpressionNode)expr; else sum = new AdditionExpressionNode(expr, true); boolean positive = lookahead.sequence.equals("+"); nextToken(); ExpressionNode t = term(); sum.add(t, positive); return sumOp(sum); } // sum_op -> EPSILON return expr; }

As with **termOp()** we continue calling **sumOp()** recursively we want to keep adding terms to the sum. When we discover a **PLUSMINUS** token we know that we are dealing with an addition or subtraction. We create a new **AdditionExpressionNode** only if the node that was passed to us is not already a **AdditionExpressionNode**. Then we add the next term to the sum.

The **EPSILON** rule, on the other hand, simply returns the expression that was passed to **sumOp()**.

Finally we modify the parse() method to return the ExpressionNode created in the process of parsing the input.

public ExpressionNode parse(LinkedList<Token> tokens) { this.tokens = (LinkedList<Token>)tokens.clone(); lookahead = this.tokens.getFirst(); ExpressionNode expr = expression(); if (lookahead.token != Token.EPSILON) throw new ParserException("Unexpected symbol %s found", lookahead); return expr; }

### Using the parser

With all the above we are ready to use the parser. In our main() method we can now test the following.

Parser parser = new Parser(); try { ExpressionNode expression = parser.parse("3*2^4 + sqrt(1+3)"); System.out.println("The value of the expression is "+expression.getValue()); } catch (ParserException e) { System.out.println(e.getMessage()); } catch (EvaluationException e) { System.out.println(e.getMessage()); }

This will produce the following output.

The value of the expression is 50.0

I hope you feel the same of exhilaration when seeing this result. It just feels like everything finally came together and the parser works exactly as expected.

Unfortunately we are not quite done. While we are able to parse variables and named constants, we have not yet got a mechanism for setting the values of those constants. This will be the topic of the next and final post in this series. To make thins a bit more interesting, I will introduce the visitor design pattern to achieve this.

NicoI hate to point out little typos, especially in an article as well written and informative as this series, but you do have some inconsistency in naming, even in your code samples, such as the method sum_op() which calls sumOp() recursively. But apart from that, thanks for this amazing amount of information, its been really interesting to follow along (in haxe, for me, but same thing).

lzshadsNico, try looking the article where the parser is implemented, now there’s suport to raised operation.

NicoAlso, you call: ExpressionNode expression = parser.parse(“3*2^4 + sqrt(1+3)”);

while parse() takes a list of tokens, and the tokenizer hasnt been updated for the new things such as RAISED, so even passing tokenizer.getTokens will still fail. I’ll check your next article to see if you talk about that, otherwise I’ll just read your source, as it seems some parts are missing from this article

RobinIt’s because when you add the tokens in the tokenizer, the token numbers are not the same as in the Token class.