"http://www.w3.org/TR/html4/loose.dtd"> >
In the previous example, the parser computes the value of the expression on the fly, while parsing. It is also possible to build an abstract syntax tree to store an abstract representation of the input. This may be usefull when several passes are necessary.
This example shows how to parse an expression (infix, prefix or postfix) and convert it in infix, prefix and postfix notation. The expression is saved in a tree. Each node of the tree correspond to an operator in the expression. Each leave is a number. Then to write the expression in infix, prefix or postfix notation, we just need to walk throught the tree in a particular order.
The AST of this converter has two types of node:
Both classes are instanciated by the __init__ method. The infix, prefix and postfix methods return strings containing the representation of the node in infix, prefix and postfix notation.
The grammar for infix expressions is similar to the grammar used in the previous example.
EXPR/e -> TERM/e ( '[+-]'/op TERM/t e=Op<op,e,t,1> )* ;
TERM/t -> FACT/t ( '[*/]'/op FACT/f t=Op<op,t,f,2> )* ; FACT/f -> ATOM/f ( '\^'/op FACT/e f=Op<op,f,e,3> )? ; ATOM/a -> ident/s a=Atom<s> | '\(' EXPR/a '\)' ; |
The grammar for prefix expressions is very simple. A compound prefix expression is an operator followed by two subexpressions.
EXPR_PRE/e ->
ident/s e=Atom<s> | '\(' EXPR_PRE/e '\)' | OP/<op,prec> EXPR_PRE/a EXPR_PRE/b e=Op<op,a,b,prec> ; OP/<op,prec> -> '[+-]'/op prec=1 | '[*/]'/op prec=2 | '\^'/op prec=3 ; |
At first sight postfix and infix grammars may be very similar. Only the position of the operators changes. So a compound postfix expression is a first expression followed by a second and an operator. This rule is left recursive. As TPG is a descendant recursive parser, such rules are forbidden to avoid infinite recursion. To remove the left recursion a classical solution is to rewrite the grammar like this:
EXPR_POST/e -> ATOM_POST/a SEXPR_POST<a>/e ;
ATOM_POST/a -> ident/s a=Atom<s> | '\(' EXPR_POST/a '\)' ; SEXPR_POST<e>/e -> EXPR_POST/e2 OP/<op,prec> SEXPR_POST<Op<op,e,e2,prec>>/e | ; |
The parser first searches for an atomic expression and then builds the AST by passing partial expressions by the attributes of the SEXPR_POST symbol.
Here is the complete source code (notation.py):