`TermStructure`

that defines methods common to any kind of term structure. It was pointed out that many classes inherit from the `TermStructure`

class, including a class called `YieldTermStructure`

. This class is, again, an abstract class and it defines the base for any yield curves. A yield curve is defined by a list of quotes together with a list of jump dates. These are passed to `YieldTermStructure`

in the constructor and are then stored internally. There are three different constructors which correspond to the three constructors defined in `TermStructure`

.
YieldTermStructure( const DayCounter& dc = DayCounter(), const std::vector<Handle<Quote> >& jumps = std::vector<Handle<Quote> >(), const std::vector<Date>& jumpDates = std::vector<Date>()); YieldTermStructure( const Date& referenceDate, const Calendar& cal = Calendar(), const DayCounter& dc = DayCounter(), const std::vector<Handle<Quote> >& jumps = std::vector<Handle<Quote> >(), const std::vector<Date>& jumpDates = std::vector<Date>()); YieldTermStructure(Natural settlementDays, const Calendar& cal, const DayCounter& dc = DayCounter(), const std::vector<Handle<Quote> >& jumps = std::vector<Handle<Quote> >(), const std::vector<Date>& jumpDates = std::vector<Date>());

All three constructors take two additional arguments, a `std::vector`

of `Quote`

handles and a `std::vector`

of jump dates. These two vectors define the data for the interest rate term structure. If the `jumpDates`

array is left empty, the jumps are assumed to occur annually at 31 Dec every year, starting in the year of the reference date. For the meaning of the remaining arguments and the difference between the constructors, see the documentation of the TermStructure class.

Two inspector methods give access to the jump dates.

const std::vector<Date>& jumpDates() const; const std::vector<Time>& jumpTimes() const;

The method `jumpDates`

returns vector of Date objects while `jumpTimes`

returns a vector of times. Recall that `Time`

is just a `Real`

number. Times are expressed in fractions of a year from the reference date and are evaluated using the day counter. They are calculated using the `timeFromReference`

method of `TermStructure`

.

`YieldTermStructure`

also overrides the `update`

method that it inherited from the Observer interface. Whenever the reference date changes, the jump times have to be recalculated.

Two methods return the discount factor from an arbitrary date to the reference date.

DiscountFactor discount(const Date& d, bool extrapolate = false) const; DiscountFactor discount(Time t, bool extrapolate = false) const;

The two methods differ in the way that the date is passed to them. The first version takes a date, while the second version takes a time measured in fractions of a year. This time should be calculated using the same day counting convention that is used in the `YieldTermStructure`

. In fact, the first method is just a convenience method which converts the `Date`

into `Time`

using `TermStrucure::timeFromReference`

and then calls the second version of the `discount`

method.

The discount is calculated by multiplying all the jumps that fall in the period from the reference date to the date passed to the discount() method and then multiplying this with the result of `discountImpl`

, discussed later. The `extrapolate`

flag indicates whether the discount factor should be extrapolated if it falls outside of the date range of the term structure.

The `discount`

method is used internally in a number of convenience methods. `zeroRate`

calculates the implied zero-coupon interest rate for a given time interval.

InterestRate zeroRate(const Date& d, const DayCounter& resultDayCounter, Compounding comp, Frequency freq = Annual, bool extrapolate = false) const; InterestRate zeroRate(Time t, Compounding comp, Frequency freq = Annual, bool extrapolate = false) const;

As before, the two methods differ in the way that the date is passed to them. The first version takes a specific date. This version allows passing a different day counter than the one used in the term structure. The `InterestRate`

that is returned uses the new day counter, compounding rule and frequency. The second version takes a time interval which has to be calculated using the same day counter as the one used in `YieldTermStructure`

. Internally, both methods first calculate the compound rate as the reciprocal of the discount rate, *a=1/d*. Then an `InterestRate`

is generated by calling `InterestRate::impliedRate()`

with the correct arguments. The extrapolate flag indicates whether the discount factor should be extrapolated for dates outside of the term structure’s range.

The implied forward rates can be obtained in a similar way.

InterestRate forwardRate(const Date& d1, const Date& d2, const DayCounter& resultDayCounter, Compounding comp, Frequency freq = Annual, bool extrapolate = false) const; InterestRate forwardRate(const Date& d, const Period& p, const DayCounter& resultDayCounter, Compounding comp, Frequency freq = Annual, bool extrapolate = false) const; InterestRate forwardRate(Time t1, Time t2, Compounding comp, Frequency freq = Annual, bool extrapolate = false) const;

For the forward rate there are three methods available, each taking the time interval in a different form. It can be given by two dates, by a date and a time period, or by two floating point numbers representing the time intervals. The behaviour is equivalent to the `zeroRate`

methods. For the first two methods, a separate day counter can be specified for the resulting `InterestRate`

. For the last version, the day counter of the term structure is used.

The forward rate is evaluated by calculating a compound rate using the ratio of the two discount rates, *a=d _{1}/d_{2}*. The interest rate is then generated by calling

`InterestRate::impliedRate()`

with the compound rate.As promised above, I have to now talk about the actual implementation of the discount factor calculation. This should be handled in the method `discountImpl`

which is declared abstract.

virtual DiscountFactor discountImpl(Time) const = 0;

Any concrete implementation of `YieldTermStructure`

must implement this method and return the discount factor for a specific time. When discountImpl() is called, the conversions from dates to units of time has already been done. The `enabled`

flag that was passed to the `discount`

method has been checked. This means that `discountImpl`

must assume that the discount factor has to be extrapolated if the time does not fall within the time for which the term structure can return values, i.e. if it is larger than the value returned by `maxTime`

.

Follow the author

]]>InterestRate(); InterestRate(Rate r, const DayCounter& dc, Compounding comp, Frequency freq);

The first constructor is the default constructor that doesn’t take any arguments and creates a null **InterestRate** object. Note that **InterestRate** is immutable. This means that after calling the default constructor there is no way to change the interest rate or any other information within the object. For this reason you will almost always want to call the second constructor which takes the rate, a day counter, a compounding method and a frequency. The rate is the annual interest rate. The day counting convention is needed to convert dates to fractions of a year when calculating discount rates. **Compounding** is an enum which can take the following values

SimpleCompoundedContinuousSimpleThenCompounded

The compounding parameter has an influence on how the compounding factor is calculated. The fourth parameter specifies the compounding frequency using the **Frequency** enum. This enum is further described in the section Periods, Time Units and Frequencies.

A number of self explanatory inspection methods give allow reading of the internal variables.

Rate rate() const; const DayCounter& dayCounter() const; Compounding compounding() const; Frequency frequency() const;

An **InterestRate** object can also be cast directly into the floating point type **Rate** which is the equivalent of calling the **rate()** method.

operator Rate() const;

The method that contains the actual calculations are is **compoundFactor()**. This method calculates the compound factor for a specific time interval

Real compoundFactor(Time t) const; Real compoundFactor(const Date& d1, const Date& d2, const Date& refStart = Date(), const Date& refEnd = Date());

The first version of **compoundFactor()** takes the time interval measured in years, or fractions thereof. The time interval should be calculated using the same day counting conventions as the **InterestRate**. The second method is a convenience method. It takes two dates and, if necessary, two reference dates. It uses the day counter’s **yearFraction()** method to calculate the time interval and then call the first version of **compoundFactor()** with the resulting time interval. See the discussion of the day counter for the meaning of the date and reference date arguments.

Depending on the compounding method, the compound factor ** c** is calculated in the following way. Here

`Simple`

*c* = 1 + *rt*

`Compounded`

*c* = (1+*r/f*)^(*tf*)

`Continuous`

*c* = e^(*rt*)

`SimpleThenCompounded`

for *t*<=1/*f*: *c* = 1 + *rt*

for *t* > 1/*f*: c = (1+*r/f*)^(*tf*)

The discount factor can be calculated using the **discountFactor()** methods.

DiscountFactor discountFactor(Time t) const; DiscountFactor discountFactor(const Date& d1, const Date& d2, const Date& refStart = Date(), const Date& refEnd = Date());

Again, the first version of **discountFactor()** takes a time interval in units of years. The second version takes two dates and, if necessary, two reference dates. The discount factor ** d** is simply the reciprocal of the compound factor,

It is possible to convert an **InterestRate** object to another one with a different frequency, compounding rule and/or day counter. This is done by calculating the interest rate in such a way that the converted **InterestRate** object gives the same compound rate. The result is not unique and depends on the time interval for which the compound rates are chosen. Such a conversion can be made using the** equivalentRate()** method.

InterestRate equivalentRate(Compounding comp, Frequency freq, Time t) const; InterestRate equivalentRate(const DayCounter& resultDC, Compounding comp, Frequency freq, Date d1, Date d2, const Date& refStart = Date(), const Date& refEnd = Date());

The first version uses a specific time interval and returns an equivalent **InterestRate** with a different compounding rule and a different frequency. Note that this version does not allow to change the day counting convention. The second version takes a new day counter as well as compounding rule and a frequency. The equivalence is evaluated for a date interval using the respective day counters to convert the dates into time intervals. In both cases the equivalent rate will result in the identical compounding factor or discount factor when evaluated for the specific time interval. For any other time interval the compounding factors will, most likely, be different.

In many cases, such as the above conversion, one has a compound rate and is looking for an interest rate with a specific day counter, compounding rule and frequency that will produce the known compound rate. This is taken care of by the static method **impliedRate()** provided by the **InterestRate** class.

static InterestRate impliedRate(Real compound, const DayCounter& resultDC, Compounding comp, Frequency freq, Time t); static InterestRate impliedRate(Real compound, const DayCounter& resultDC, Compounding comp, Frequency freq, const Date& d1, const Date& d2, const Date& refStart = Date(), const Date& refEnd = Date());

As with the other methods, there are two versions. The first taking a time interval and the second taking a number of dates. The two static methods will return an object of type **InterestRate**, with a given day counting convention, frequency and compounding rule. The ineterst rate will reproduce the value **compound** when the **compoundRate()** method is called with the same time or date arguments for which the object was created.

So let’s think about what exactly we want to to. Once the expression is parsed and we have an internal representation in the form of an ExpressionNode, we want to iterate through the whole tree and find VariableExpressionNode objects. If the nodes found in this way contain a variable with the name that we want to set, then we will assign the value to it.

The most straightforward way of doing this would be to add a new method to the ExpressionNode interface that each concrete implementation will have to define. This method might have the following signature.

public void setVariable(String name, double value);

The implementation in most of the concrete node types would simply pass the information on to the child nodes by calling their setVariable() method. Only the VariableExpressionNode would compare the name passed in the argument with its own name and set its value accordingly. This is indeed not a bad programming style but it has one disadvantage. Let’s think a bit into the future. There might be other operations that we want to perform on the expression tree. Maybe we want to print out the contents in a particular format or we want to find sub-expressions that match certain criteria. In each case we would have to iterate through the tree and in each case we would have to add a new method to the ExpressionNode interface and all its implementations. As your software grows you might have a multitude of different methods for different tasks defined in this way. Not only is this difficult to maintain because the code for a specific task is always spread over many classes but it also makes is necessary to keep modifying existing code for every new operation on the tree.

Ideally we would like to separate all the code for a specific task into a single class with the ability of adding new functionality without modifying existing code. This can be achieved with the visitor design pattern. In this software design pattern, we create an interface that acts as a visitor to the class hierarchy. It is useful when the class hierarchy can be expected to be relatively stable while the number of operations on the classes can increase over time. The visitor will have one method for each class that is part of the hierarchy.

public interface ExpressionNodeVisitor { public void visit(VariableExpressionNode node); public void visit(ConstantExpressionNode node); public void visit(AdditionExpressionNode node); public void visit(MultiplicationExpressionNode node); public void visit(ExponentiationExpressionNode node); public void visit(FunctionExpressionNode node); }

Next we add a method called accept() the ExpressionNode interface which takes an ExpressionNodeVisitor.

public void accept(ExpressionNodeVisitor visitor);

Now that we have defined the method in the interface we have to define it in every class that implements the interface. For the VariableExpressionNode and the ConstantExpressionNode the implementation is a one-liner

public class VariableExpressionNode implements ExpressionNode { ... public void accept(ExpressionNodeVisitor visitor) { visitor.visit(this); } } public class ConstantExpressionNode implements ExpressionNode { ... public void accept(ExpressionNodeVisitor visitor) { visitor.visit(this); } }

When passing an **ExpressionNodeVisitor** to the **accept()** method, the method will call the appropriate visit method of the visitor class. This is achieved by a combination of overriding and overloading of methods. Don’t be fooled. Even though both of the implementations look the same, they call different **visit()** methods in the **ExpressionNodeVisitor** because the **this** member has a different type in each case.

In order for the visitor to be visited by all the nodes in the tree, the **accept()** methods of the other class implementations will also have to call the **accept()** methods of their children.

public class AdditionExpressionNode extends SequenceExpressionNode { ... public void accept(ExpressionNodeVisitor visitor) { visitor.visit(this); for (Term t: terms) t.expression.accept(visitor); } } public class MultiplicationExpressionNode extends SequenceExpressionNode { ... public void accept(ExpressionNodeVisitor visitor) { visitor.visit(this); for (Term t: terms) t.expression.accept(visitor); } }

The implementations again look identical but have slightly different effects because of the different types of **this**.

Finally the FunctionExpressionNode and the ExponentiationExpressionNode follow a similar pattern

public class FunctionExpressionNode implements ExpressionNode { ... public void accept(ExpressionNodeVisitor visitor) { visitor.visit(this); argument.accept(visitor); } } public class ExponentiationExpressionNode implements ExpressionNode { ... public void accept(ExpressionNodeVisitor visitor) { visitor.visit(this); base.accept(visitor); exponent.accept(visitor); } }

With these few lines of code we have added a very powerful mechanism to or class design. We can now freely create new functionality by implementing the ExpressionNodeVisitor interface.

Let’s get back to the task at hand. We would like to set the value of variables with a given name. To do this we create a new class, SetVariable, that implements ExpressionNodeVisitor.

public class SetVariable implements ExpressionNodeVisitor { private String name; private double value; public SetVariable(String name, double value) { super(); this.name = name; this.value = value; } public void visit(VariableExpressionNode node) { if (node.getName().equals(name)) node.setValue(value); } public void visit(ConstantExpressionNode node) {} public void visit(AdditionExpressionNode node) {} public void visit(MultiplicationExpressionNode node) {} public void visit(ExponentiationExpressionNode node) {} public void visit(FunctionExpressionNode node) {} }

The class contains the name and the value of the variable to be set and these are initialised in the standard way in the constructor. Because we are implementing the ExpressionNodeVisitor we need to define all the corresponding methods. However, as you can see most of these methods are empty and do nothing because we are really only interested in the VariableExpressionNode elements in the tree. Only two lines of code are needed to compare and set the variable. All the iteration through the tree is handled by the accept() methods of the individual classes and in the visitor we can concentrate on writing the code that actually does something.

Using the visitor is also a one-liner. Let’s say we have an ExpressionNode called expression, then the following code will set the value of pi.

expression.accept(new SetVariable("pi", Math.PI));

To test this, let’s update the main method from the last post.

Parser parser = new Parser(); try { ExpressionNode expression = parser.parse("sin(pi/2)"); expression.accept(new SetVariable("pi", Math.PI)); System.out.println("The value of the expression is " + expression.getValue()); } catch (ParserException e) { System.out.println(e.getMessage()); } catch (EvaluationException e) { System.out.println(e.getMessage()); }

As expected, the program prints out

The value of the expression is 1.0

This is concludes our series of the Java expression parser. You can download the complete parser here. It is released under the MIT license, so feel free to use it in your own projects but please keep the copyright notice.

The code reference can be found here.

We have used the parser in our Android app DiffIt! which can calculate derivatives of mathematical expressions.

]]>Remember from the post in which we implemented the parser that every non-terminal symbol in the grammar was given a method inside the **Parser** class. When we defined these methods they were plain void methods that did not take any arguments. This is about to change. We will now modify all these methods to return an **ExpressionNode**.

private ExpressionNode expression() ... private ExpressionNode sumOp(ExpressionNode expr) ... private ExpressionNode signedTerm() ... private ExpressionNode term() ... private ExpressionNode termOp(ExpressionNode expr) ... private ExpressionNode signedFactor() ... private ExpressionNode factor() ... private ExpressionNode factorOp(ExpressionNode expr) ... private ExpressionNode argument() ... private ExpressionNode value() ...

Inside each of these methods we now have to create the expression nodes that correspond to the parsed expression and return the result. Let’s start with the leaf nodes which are parsed inside **value()**.

private ExpressionNode value() { // argument -> NUMBER if (lookahead.token == Token.NUMBER) { ExpressionNode expr = new ConstantExpressionNode(lookahead.sequence); nextToken(); return expr; } // argument -> VARIABLE if (lookahead.token == Token.VARIABLE) { ExpressionNode expr = new VariableExpressionNode(lookahead.sequence); nextToken(); return expr; } if (lookahead.token == Token.EPSILON) throw new ParserException("Unexpected end of input"); else throw new ParserException("Unexpected symbol %s found", lookahead); }

If we encounter a number then we create a new **ConstantExpressionNode**. We can use **lookahead.sequence** to initialise the constant. **lookahead.sequence** contains the string that corresponds to the current token. In case of a number, it will contain the string representation of the numerical value. **ConstantExpressionNode** had a convenience constructor that takes a string, converts it into a number and stores it internally as the value of the constant.

The code added for variables is almost identical except that we create a new **VariableExpressionNode**. Note how, in both cases, we create the node before calling **nextToken()**. We do this because **nextToken()** modifies the lookahead which would mean we loose the information about the current token.

Next up is the argument() method.

private ExpressionNode argument() { // argument -> FUNCTION argument if (lookahead.token == Token.FUNCTION) { int function = FunctionExpressionNode.stringToFunction(lookahead.sequence); nextToken(); ExpressionNode expr = argument(); return new FunctionExpressionNode(function, expr); } // argument -> OPEN_BRACKET sum CLOSE_BRACKET else if (lookahead.token == Token.OPEN_BRACKET) { nextToken(); ExpressionNode expr = expression(); if (lookahead.token != Token.CLOSE_BRACKET) throw new ParserException("Closing brackets expected", lookahead); nextToken(); return expr; } // argument -> value return value(); }

The argument non-terminal does not actually correspond to any type of node in the expression tree. All we need to do here is pass on the nodes we obtain from the sub-expressions. So, depending on the rule, we either return a terminal node via the **value()** method or a more complex sub-tree via the **expression()** method or we create a **FunctionExpressionNode** if we encounter a function.

If we find a function then we have to create a **FunctionExpressionNode**. To do this we first have to convert the string in** lookahead.sequence** to a function identifier. In the last post we defined a static helper method called **stringToFunction()** that can do this for us. This function will throw an exception if the string does not correspond to a supported function.

The factor() method produces argument non-terminal optionally wrapped by **functionOp**.

private ExpressionNode factor() { // factor -> argument factor_op ExpressionNode a = argument(); return factorOp(a); }

The rule for **factor** produces and **argument** and a **factor_op** non-terminal. The **factor_op** non-terminal can include an exponentiation, or it might not. In order for **factor_op** to create the exponentiation it needs to know the base which is parsed by **argument()**. For this reason we decide to pass the node produced by **argument()** to the **factorOp()** method.

private ExpressionNode factorOp(ExpressionNode expression) { // factor_op -> RAISED factor if (lookahead.token == Token.RAISED) { nextToken(); ExpressionNode exponent = signedFactor(); return new ExponentiationExpressionNode(expression, exponent); } // factor_op -> EPSILON return expression; }

If we find a **RAISED** token then **factorOp()** creates an **ExponentiationExpressionNode** with the base given by the expression passed to the the method and the exponent which is parsed from a recursive call to **factor()**. If the **EPSILON** rule is applied then we simply return expression.

We have called the **signedFactor()** method for the exponent which corresponds to the **signed_factor** non-terminal.

private ExpressionNode signedFactor() { // signed_factor -> PLUSMINUS factor if (lookahead.token == Token.PLUSMINUS) { boolean positive = lookahead.sequence.equals("+"); nextToken(); ExpressionNode t = factor(); if (positive) return t; else return new AdditionExpressionNode(t, false); } // signed_factor -> factor return factor(); }

A signed factor can start with a plus or minus sign. In the case of a minus sign, an **AdditionExpressionNode** is generated to represent the negative of the factor. In all other cases the expression from **factor()** is simply passed on.

The term non-terminal again does not correspond to any node in the expression tree. The implementation of **term()**, therefore, just passes on the expression nodes.

private ExpressionNode term() { // term -> factor term_op ExpressionNode f = factor(); return termOp(f); }

Again, **termOp()** might or might not produce a product, depending on which symbols are found next. In order for **termOp()** to be able to create a product including the first factor, we pass the factor to **termOp()**.

private ExpressionNode termOp(ExpressionNode expression) { // term_op -> MULTDIV factor term_op if (lookahead.token == Token.MULTDIV) { MultiplicationExpressionNode prod; if (expression.getType() == ExpressionNode.MULTIPLICATION_NODE) prod = (MultiplicationExpressionNode)expression; else prod = new MultiplicationExpressionNode(expression, true); boolean positive = lookahead.sequence.equals("*"); nextToken(); ExpressionNode f = signedFactor(); prod.add(f, positive); return termOp(prod); } // term_op -> EPSILON return expression; }

The implementation of **termOp()** is slightly more complicated. Remember from the last post that we can add an arbitrary number of factors to a **MultiplicationExpressionNode**. This means, as we continue calling **termOp()** recursively we want to keep adding factors to the multiplication. When we discover a **MULTDIV** token we know that we are dealing with a product or division. We create a new **MultiplicationExpressionNode** only if the node that was passed to us is not already a **MultiplicationExpressionNode**. Then we add the next factor to the multiplication.

The **EPSILON** rule, on the other hand, simply returns the expression that was passed to **termOp()**.

A generalisation of **term** is **signed_term**. This can include a leading plus or minus symbol.

private ExpressionNode signedTerm() { // signed_term -> PLUSMINUS term if (lookahead.token == Token.PLUSMINUS) { boolean positive = lookahead.sequence.equals("+"); nextToken(); ExpressionNode t = term(); if (positive) return t; else return new AdditionExpressionNode(t, false); } // signed_term -> term return term(); }

Only in the case that the leading symbol is a minus, we have to create an **AdditionExpressionNode** and add the term with a negative sign. In all other cases we simply pass on the node obtained from **term()**.

The methods **expression()** and **sumOp()** follow pretty much the same pattern as **term()** and **termOp()**.

private ExpressionNode expression() { // expression -> signed_term sum_op ExpressionNode expr = signedTerm(); return sumOp(expr); }

The method **expression()** does not produce any nodes itself but passes the result of **signedTerm()** to **sumOp()** which can then assemble and **AdditionExpressionNode** if it has to.

private ExpressionNode sumOp(ExpressionNode expr) { // sum_op -> PLUSMINUS term sum_op if (lookahead.token == Token.PLUSMINUS) { AdditionExpressionNode sum; if (expr.getType() == ExpressionNode.ADDITION_NODE) sum = (AdditionExpressionNode)expr; else sum = new AdditionExpressionNode(expr, true); boolean positive = lookahead.sequence.equals("+"); nextToken(); ExpressionNode t = term(); sum.add(t, positive); return sumOp(sum); } // sum_op -> EPSILON return expr; }

As with **termOp()** we continue calling **sumOp()** recursively we want to keep adding terms to the sum. When we discover a **PLUSMINUS** token we know that we are dealing with an addition or subtraction. We create a new **AdditionExpressionNode** only if the node that was passed to us is not already a **AdditionExpressionNode**. Then we add the next term to the sum.

The **EPSILON** rule, on the other hand, simply returns the expression that was passed to **sumOp()**.

Finally we modify the parse() method to return the ExpressionNode created in the process of parsing the input.

public ExpressionNode parse(LinkedList<Token> tokens) { this.tokens = (LinkedList<Token>)tokens.clone(); lookahead = this.tokens.getFirst(); ExpressionNode expr = expression(); if (lookahead.token != Token.EPSILON) throw new ParserException("Unexpected symbol %s found", lookahead); return expr; }

With all the above we are ready to use the parser. In our main() method we can now test the following.

Parser parser = new Parser(); try { ExpressionNode expression = parser.parse("3*2^4 + sqrt(1+3)"); System.out.println("The value of the expression is "+expression.getValue()); } catch (ParserException e) { System.out.println(e.getMessage()); } catch (EvaluationException e) { System.out.println(e.getMessage()); }

This will produce the following output.

The value of the expression is 50.0

I hope you feel the same of exhilaration when seeing this result. It just feels like everything finally came together and the parser works exactly as expected.

Unfortunately we are not quite done. While we are able to parse variables and named constants, we have not yet got a mechanism for setting the values of those constants. This will be the topic of the next and final post in this series. To make thins a bit more interesting, I will introduce the visitor design pattern to achieve this.

]]>The last post in this series showed you how a mathematical expression parser was implemented from the grammar that we designed earlier. This was all fine and good but the parser did not do much except confirm if a mathematical expression was valid or not. In the end there was no way of evaluating that expression because we never stored any of the information that the parser was able to retrieve from the input. We will have to do this by adding side effects in the parser’s methods, but before we do we have to decide how our mathematical expressions should be stored. That is the topic of this post.

The typical way of storing expressions is in a tree structure. Each node in the tree represents either an operation or a value. The outermost operation (the one that is performed last in the evaluation of the expression) is at the root of the tree and the values are stored in the leaf node. The figure on the right shows the tree for the expression **3*2^4 + sqrt(1+3)**. The outermost operation is the **+** which adds **3*2^4** and **sqrt(1+3)**. Evaluating the expression is done from the leaves upward. In our example we can start on the left side and evaluate 2^4 and replace the exponentiation with the result, 16. The resulting tree is shown below.

We continue in the same way with the **3*16** nodes and replace the multiplication with the result **48**. On the right hand side of the tree we can evaluate the addition **1+3** and replace it with the sum **4**. Then we evaluate **sqrt(4)** which results **2**. The sqrt node is replaced and the tree, after all these operations is shown in the figure below. The final result of the expression is obtained from adding the two numbers which then gives **50** as the answer.

To store the tree in some kind of data structure we need to cast it into some sort of object hierarchy. We will create a base interface for the nodes of the tree and a number of classes that implement the interface for different types of nodes, that is one class for additions, one class for multiplications and so on. The interface is called **ExpressionNode**.

public interface ExpressionNode { public static final int VARIABLE_NODE = 1; public static final int CONSTANT_NODE = 2; public static final int ADDITION_NODE = 3; public static final int MULTIPLICATION_NODE = 4; public static final int EXPONENTIATION_NODE = 5; public static final int FUNCTION_NODE = 6; public int getType(); public double getValue(); }

ExpressionNode defines a couple of static integer constants that can be used to identify different node types. The abstract method getType() is intended to return the type of the node, that is every derived class should return one of the above constants. The method getValue() should return the evaluated result of the sub-tree which is rooted in the current node. In the example above, the node corresponding to the square root should return 2.

We now implement this interface and create a class called ConstantExpressionNode for holding numeric constants, that is numbers like 2 or 3.14159.

public class ConstantExpressionNode implements ExpressionNode { private double value; public ConstantExpressionNode(double value) { this.value = value; } public ConstantExpressionNode(String value) { this.value = Double.valueOf(value); } public double getValue() { return value; } public int getType() { return ExpressionNode.CONSTANT_NODE; } }

The method **getType()** returns **ExpressionNode.CONSTANT_NODE** and **getValue()** returns the value stored in a private member of type double. There are two constructors, one will take a double value of the constant and the other will take a which is converted to the double value.

The expression node that holds variables or named constant is called VariableExpressionNode.

public class VariableExpressionNode implements ExpressionNode { private String name; private double value; private boolean valueSet; public VariableExpressionNode(String name) { this.name = name; valueSet = false; } public int getType() { return ExpressionNode.VARIABLE_NODE; } public void setValue(double value) { this.value = value; this.valueSet = true; } public double getValue() { if (valueSet) return value; else throw new EvaluationException("Variable '" + name + "' was not initialized."); } }

**VariableExpressionNode** has an internal parameter, **name**, that stores the name of the variable along with a double, **value**, that stores the value of the variable if it is set. A boolean, **valueSet**, indicates if the value has been set. The **getValue()** method will return the value or throw an exception, depending on the state of **valueSet**.

Next we want to create a class that captures additions. Instead of holding just two terms which are added, we decide that an addition should hold an arbitrary number of terms which are either added or subtracted. We will have a similar requirement for multiplications and we will separate out the commonalities in a class called **SequenceExpressionNode**. First we define an inner class called **Term** which holds a single term of the sum together with a boolean indicating the sign.

public class Term { public boolean positive; public ExpressionNode expression; public Term(boolean positive, ExpressionNode expression) { super(); this.positive = positive; this.expression = expression; } }

This class is used by the **SequenceExpressionNode**. Note how the class is declared abstract because we are not yet implementing the methods of the **ExpressionNode** interface.

public abstract class SequenceExpressionNode implements ExpressionNode { protected LinkedList<Term> terms; public SequenceExpressionNode() { this.terms = new LinkedList<Term>(); } public SequenceExpressionNode(ExpressionNode a, boolean positive) { this.terms = new LinkedList<Term>(); this.terms.add(new Term(positive, a)); } public void add(ExpressionNode a, boolean positive) { this.terms.add(new Term(positive, a)); } }

The linked list **terms** contains all the terms of the sum or product. Two constructors are available, one creates an empty sequence, the other immediately adds the first term into the sequence. The second constructor is a there for convenience so that sequences can be created easily. The add() method allows terms to be added to the sum or product. With this class defined, we can now go ahead and write the **AdditionExpressionNode** class.

public class AdditionExpressionNode extends SequenceExpressionNode { public AdditionExpressionNode() { super(); } public AdditionExpressionNode(ExpressionNode a, boolean positive) { super(a, positive); } public int getType() { return ExpressionNode.ADDITION_NODE; } public double getValue() { double sum = 0.0; for (Term t : terms) { if (t.positive) sum += t.expression.getValue(); else sum -= t.expression.getValue(); } return sum; } }

The constructors and the **getType()** method are implemented as expected. **getValue()** iterates over the terms and adds or subtracts them from the sum, depending of the **positive** flag inside the terms. In order to get the values of the sub-expressions that are added and subtracted, we can recursively call the **getValue()** methods of all the terms.

The **MultiplicationExpressionNode** is very similar.

public class MultiplicationExpressionNode extends SequenceExpressionNode { public MultiplicationExpressionNode() { super(); } public MultiplicationExpressionNode(ExpressionNode a, boolean positive) { super(a, positive); } public int getType() { return ExpressionNode.MULTIPLICATION_NODE; } public double getValue() { double prod = 1.0; for (Term t : terms) { if (t.positive) prod *= t.expression.getValue(); else prod /= t.expression.getValue(); } return prod; } }

The only new thing here is the **getValue()** method. Instead of adding or subtracting terms from a sum that is initially zero we now multiply or divide from a product. The product has to be initialised to 1.0 instead of 0.0 like the sum in the **AdditionExpressionNode**. In contrast to the previous two operations, exponentiation only takes two arguments, the base and the exponent. Accordingly, the ExponentiationExpressionNode looks slightly simpler.

public class ExponentiationExpressionNode implements ExpressionNode { private ExpressionNode base; private ExpressionNode exponent; public ExponentiationExpressionNode(ExpressionNode base, ExpressionNode exponent) { this.base = base; this.exponent = exponent; } public int getType() { return ExpressionNode.EXPONENTIATION_NODE; } public double getValue() { return Math.pow(base.getValue(), exponent.getValue()); } }

The constructor assigns the private base and exponent expressions and **getValue()** uses Math.pow() to calculate the result of the exponentiation. Again, because the base and the exponent are both expression nodes themselves, we call their **getValue()** methods to obtain a value.

The last type of expression node that needs implementing will handle mathematical functions. This is the only class which will be slightly more laborious because we have to use control statements to tell the different functions apart.

public class FunctionExpressionNode implements ExpressionNode { public static final int SIN = 1; public static final int COS = 2; public static final int TAN = 3; public static final int ASIN = 4; public static final int ACOS = 5; public static final int ATAN = 6; public static final int SQRT = 7; public static final int EXP = 8; public static final int LN = 9; public static final int LOG = 10; public static final int LOG2 = 11; private int function; private ExpressionNode argument; public FunctionExpressionNode(int function, ExpressionNode argument) { super(); this.function = function; this.argument = argument; } public int getType() { return ExpressionNode.FUNCTION_NODE; } public double getValue() { switch (function) { case SIN: return Math.sin(argument.getValue()); case COS: return Math.cos(argument.getValue()); case TAN: return Math.tan(argument.getValue()); case ASIN: return Math.asin(argument.getValue()); case ACOS: return Math.acos(argument.getValue()); case ATAN: return Math.atan(argument.getValue()); case SQRT: return Math.sqrt(argument.getValue()); case EXP: return Math.exp(argument.getValue()); case LN: return Math.log(argument.getValue()); case LOG: return Math.log(argument.getValue()) * 0.43429448190325182765; case LOG2: return Math.log(argument.getValue()) * 1.442695040888963407360; } throw new EvaluationException("Invalid function id "+function+"!"); } public static int stringToFunction(String str) { if (str.equals("sin")) return FunctionExpressionNode.SIN; if (str.equals("cos")) return FunctionExpressionNode.COS; if (str.equals("tan")) return FunctionExpressionNode.TAN; if (str.equals("asin")) return FunctionExpressionNode.ASIN; if (str.equals("acos")) return FunctionExpressionNode.ACOS; if (str.equals("atan")) return FunctionExpressionNode.ATAN; if (str.equals("sqrt")) return FunctionExpressionNode.SQRT; if (str.equals("exp")) return FunctionExpressionNode.EXP; if (str.equals("ln")) return FunctionExpressionNode.LN; if (str.equals("log"))return FunctionExpressionNode.LOG; if (str.equals("log2")) return FunctionExpressionNode.LOG2; throw ParseException("Unexpected Function "+str+" found!"); } public static String getAllFunctions() { return "sin|cos|tan|asin|acos|atan|sqrt|exp|ln|log|log2"; } }

We first define a series of integer constants for the different mathematical functions available. The private member **function** will be set to one of these pre-specified values. The expression node **argument** holds the argument of the function. The method **getValue()** decides, in a switch statement, which function to apply to the argument.

Two static methods are provided to help tokenizing and parsing an expression from a String. The method **stringToFunction()** will take a string of a function name and return the corresponding integer function identifier. If the method is passed a string that is not a recognised function then an exception is thrown. This method will be used by the parser to create new **FunctionExpressionNode** objects. The method **getAllFunctions()** returns a string containing all supported functions, separated by a |. This string may be used by the tokenizer to create a regular expression that identifies functions inside a string.

These are all the classes needed to represent mathematical expressions. But for now we have not created the code necessary to create the expression tree when parsing an expression from a string. This means that, for now, if we want to test our code we will have to create the tree by hand. The code for creating the expression **3*2^4 + sqrt(1+3)** is given below.

AdditionExpressionNode innerSum = new AdditionExpressionNode(); innerSum.add(new ConstantExpressionNode(1), true); innerSum.add(new ConstantExpressionNode(3), true); ExpressionNode sqrt = new FunctionExpressionNode( FunctionExpressionNode.SQRT, innerSum); ExpressionNode expo = new ExponentiationExpressionNode( new ConstantExpressionNode(2), new ConstantExpressionNode(4)); MultiplicationExpressionNode prod = new MultiplicationExpressionNode(); prod.add(new ConstantExpressionNode(3), true); prod.add(expo, true); AdditionExpressionNode expression = new AdditionExpressionNode(); expression.add(prod, true); expression.add(sqrt, true); System.out.println("The result is " + expression.getValue());

If we have done everything right then this code will print out

The result is 50.0

In the next contribution to this series I will modify the parser to generate the expression tree while parsing the input.

]]>We first add to our class **Token** that holds the information about the tokens in the input.

public class Token { public static final int EPSILON = 0; public static final int PLUSMINUS = 1; public static final int MULTDIV = 2; public static final int RAISED = 3; public static final int FUNCTION = 4; public static final int OPEN_BRACKET = 5; public static final int CLOSE_BRACKET = 6; public static final int NUMBER = 7; public static final int VARIABLE = 8; public final int token; public final String sequence; public Token(int token, String sequence) { super(); this.token = token; this.sequence = sequence; } }

In short, this class defines a number of static constants for the different types of tokens and a couple of fields that hold the data for the individual token. Note how we made the fields public final. This makes it immediately clear that we are dealing with an immutable object and there is no need for a multitude of getters in the class.

Next we write class called **Parser** which does the actual parsing of the expression. The tokens are stored in a **List** of **Token** objects and one **Token** object is stored as the lookahead.

The parser we are about to write will not do much except throwing an excpetion if the expression is invalid. If the parser runs without an exception being thrown we know that the expression is valid. In one of the future posts we will add to this parser and construct an internal representation of the expression that can be used for calculations.

public class Parser { LinkedList<Token> tokens; Token lookahead;

The main method of the parser is called **parse** and takes the tokens as parameter.

public void parse(List<Token> tokens) { this.tokens = (LinkedList<Token>) tokens.clone(); lookahead = this.tokens.getFirst(); expression(); if (lookahead.token != Token.EPSILON) throw new ParserException("Unexpected symbol %s found", lookahead); }

In the **parse** method we first create a shallow copy of the token list because we will be taking elements out of the list and we don’t want to create side effects on the parameters. Then **lookahead** is assigned the first token in the list. After these initialisations we call a method called **expression()**. The parser will have one method for every non-terminal symbol of the grammar that we designed in the last post. This means the method **expression()** will parse the non-terminal symbol **expression**.

Once the expression has been parsed completely there should be no symbols left in the list. This means that the **lookahead** should be equal to **Token.EPSILON**. If there is still a symbol left in the lookahead it means that there is an error in the input. After parsing the expression we can therefore perform an error check. This takes care of balancing parentheses and making sure the input is a valid expression.

Before we write the **expression()** method we create a small utility method that reads the next token from the list.

private void nextToken() { tokens.pop(); // at the end of input we return an epsilon token if (tokens.isEmpty()) lookahead = new Token(Token.EPSILON, "", -1); else lookahead = tokens.getFirst(); }

This method will be used frequently for reducing the matched terminal symbols. We pop the first token off the list and set the lookahead to the new head of the list. In case the list is empty, we create the special **EPSILON** symbol which shows to the parser that the input has finished.

Now for the **expression()** method. The **expression** non-terminal only has one rule so the method becomes quite simple.

private void expression() { // expression -> signed_term sum_op signedTerm(); sumOp(); }

For each non-terminal on the right hand side of the rule we call the appropriate method.

We continue with writing a method for the **sum_op** non-terminal.

private void sumOp() { if (lookahead.token == Token.PLUSMINUS) { // sum_op -> PLUSMINUS term sum_op nextToken(); term(); sumOp(); } else { // sum_op -> EPSILON } }

If the next symbol is **PLUSMINUS**, that is a plus or a minus, we apply the rule

sum_op -> PLUSMINUS term sum_op

Because the right hand side starts with a terminal that is allowed by the grammar, we can eat it up by calling **nextToken()**. Then we call the methods corresponding to the remaining non-terminals in the rule. If, on the other hand the next token is not **PLUSMINUS** there is no other match for **sum_op** except the **EPSILON** rule. This means that, in this case, we just do nothing and return out of the **sumOp()** method.

We can continue in the same way with **signedTerm()**

private void signedTerm() { if (lookahead.token == Token.PLUSMINUS) { // signed_term -> PLUSMINUS term nextToken(); term(); } else { // signed_term -> term term(); } }

Again, there are two possible rules. If the next token is **PLUSMINUS** we can eat it up and then parse the non-terminal **term**. Otherwise we parse the non-terminal **term** directly.

The next symbols are handled in pretty much the same way.

private void term() { // term -> factor term_op factor(); termOp(); } private void termOp() { if (lookahead.token == Token.MULTDIV) { // term_op -> MULTDIV factor term_op nextToken(); signedFactor(); termOp(); } else { // term_op -> EPSILON } }

The previous two methods will handle a multiplication of an arbitrary number of terms. The individual factors, except the first one, can be preceded by a **PLUSMINUS**. This is handled by the following method.

private void signedFactor() { if (lookahead.token == Token.PLUSMINUS) { // signed_factor -> PLUSMINUS factor nextToken(); factor(); } else { // signed_factor -> factor factor(); } }

The following two methods will handle factors which can contain exponentiation of an arbitrary number of terms.

private void factor() { // factor -> argument factor_op argument(); factorOp(); } private void factorOp() { if (lookahead.token == Token.RAISED) { // factor_op -> RAISED expression nextToken(); signedFactor(); } else { // factor_op -> EPSILON } }

The following method, **argument()**, handles either a fixed value as given by a variable or constant, a function, or a bracketed expression.

private void argument() { if (lookahead.token == Token.FUNCTION) { // argument -> FUNCTION argument nextToken(); argument(); } else if (lookahead.token == Token.OPEN_BRACKET) { // argument -> OPEN_BRACKET sum CLOSE_BRACKET nextToken(); expression(); if (lookahead.token != Token.CLOSE_BRACKET) throw new ParserException("Closing brackets expected and " + lookahead.sequence + " found instead"); nextToken(); } else { // argument -> value value(); } }

When we are parsing a bracketed expression, we first eat up the opening bracket by calling **nextToken()**. Then we call **expression()** to parse the expression inside the brackets. After that is done we are expecting the next token to be a closing bracket. If that is not the case than we have encountered a syntax error. In this case we throw an exception informing about the error.

Finally we have a method for the non-terminal value which can either expand to a **NUMBER** or to a **VARIABLE**.

private void value() { if (lookahead.token == Token.NUMBER) { // argument -> NUMBER nextToken(); } else if (lookahead.token == Token.VARIABLE) { // argument -> VARIABLE nextToken(); } else { throw new ParserException( "Unexpected symbol "+lookahead.sequence+" found"); } }

This concludes the coding of a recursive descent parser. If you haven’t figured it out by now, the name “recursive descent” comes from the fact that the parser performs a depth first search by recursively calling the same methods. You can spot the recursion in the **argument()** method which is indirectly called by the **expression()** method but also calls the **expression()** method itself.

In the next post in this series we will be constructing an internal representation of the expression.

]]>