Parser generator for grammars with right contexts

User's manual

The very first statement of the grammar description should specify the nonterminal symbol designated as the grammar axiom:

grammar (S);

The section of terminal declarations should follow. A terminal symbol can be specified in several ways. The simplest is to declare only the name of the terminal symbol; its literal value is then automatically taken to be identical to its name.

terminal a, b, c;

Another way to declare a terminal is to specify its name and literal value explicitly:

terminal thecomma = ",";

Note that empty is a reserved keyword denoting the empty string; therefore, no symbol of the grammar may be named empty.

After terminal symbols are declared in one way or another, the section of rules of the grammar should follow.

A rule declaration consists of the name of the nonterminal symbol on the left-hand side, the equality sign (=), and the body of the rule.

Conjunction within rule bodies is represented by the ampersand (&), while negation (which the parser generator supports experimentally) is denoted by the tilde (~).

S = A & B & ~ C;
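
To make the meaning of these operators concrete, here is a minimal Python sketch that models each nonterminal as a predicate over strings. The predicates in_A, in_B, in_C and the languages they describe are hypothetical and are not part of the parser generator; the sketch only illustrates that a string is derived from S exactly when it belongs to the languages of A and B and does not belong to the language of C.

```python
# Hypothetical languages, chosen only for illustration.
def in_A(w: str) -> bool:
    """Strings consisting only of the letters a and b."""
    return all(ch in "ab" for ch in w)


def in_B(w: str) -> bool:
    """Strings of even length."""
    return len(w) % 2 == 0


def in_C(w: str) -> bool:
    """Strings that start with the letter b."""
    return w.startswith("b")


def in_S(w: str) -> bool:
    """S = A & B & ~ C: w must be in A and in B, and must not be in C."""
    return in_A(w) and in_B(w) and not in_C(w)


print(in_S("abab"))  # True
print(in_S("aba"))   # False: odd length, so not in B
print(in_S("baab"))  # False: starts with b, so it is in C
```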

It should be noted that terminal symbols can also be declared "on-the-fly" within the bodies of the rules, by putting a literal value of a terminal in quotes:

VarDecl = "var" ident ";";

Thus, any quoted literal is treated as a terminal symbol and is automatically added to the set of terminals of the grammar without any extra declaration. If the literal collides with a terminal already declared in a terminal statement, the name of that terminal is automatically substituted for it everywhere in the grammar.

Context quantifiers are denoted by the character sequences >= and > (called 'aftereq' and 'after', respectively, in the messages of the parser generator). Negated quantifiers are denoted by >=~ and >~:

A = >= Y & X & >~ Z;
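
The manual does not spell out the semantics of the quantifiers, so the following Python sketch rests on an assumption taken from the usual definition of grammars with right contexts: > constrains the text that follows the current substring, while >= constrains the current substring together with the text that follows it. The function matches_A and the predicates in_X, in_Y, in_Z are hypothetical names used only for illustration.

```python
from typing import Callable

Language = Callable[[str], bool]


def matches_A(text: str, start: int, end: int,
              in_X: Language, in_Y: Language, in_Z: Language) -> bool:
    """Check the rule A = >= Y & X & >~ Z for the substring text[start:end].

    Assumed semantics (not confirmed by the manual):
      X     applies to the substring itself (the 'base' conjunct),
      >= Y  applies to the substring plus everything to its right,
      >~ Z  requires that the text to the right is NOT in Z.
    """
    current = text[start:end]
    rest = text[end:]
    return in_Y(current + rest) and in_X(current) and not in_Z(rest)


# Hypothetical languages: X = digit strings, Y = strings ending with ";",
# Z = strings starting with ".".
def in_X(w: str) -> bool:
    return w.isdigit()


def in_Y(w: str) -> bool:
    return w.endswith(";")


def in_Z(w: str) -> bool:
    return w.startswith(".")


print(matches_A("12;", 0, 2, in_X, in_Y, in_Z))    # True
print(matches_A("12.5;", 0, 2, in_X, in_Y, in_Z))  # False: "." follows
```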

Within rule bodies, regular-expression-like notation can be used freely:

B = (a | b) X (A & B) (>=H & G);

I = {a}+ b [c];
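
As a rough analogy, the rule for I can be compared with an ordinary regular expression. The Python sketch below assumes that {a}+ means one or more repetitions and that [c] marks an optional element, as in common EBNF usage; it also treats the terminals a, b and c as single characters. The names I_PATTERN and in_I are hypothetical.

```python
import re

# I = {a}+ b [c];
# Assumed reading: one or more a, then b, then an optional c.
I_PATTERN = re.compile(r"a+bc?")


def in_I(w: str) -> bool:
    """Return True if the whole string w matches the assumed reading of I."""
    return I_PATTERN.fullmatch(w) is not None


for w in ["ab", "aaabc", "b", "abcc"]:
    print(w, in_I(w))
# ab True
# aaabc True
# b False
# abcc False
```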

A "stop-scanning" symbol is represented by the character sequence ...:

A = B & > C ...;

Several additional syntactic constructs can be used when describing a grammar. While defining a rule, one can introduce "on-the-fly" a new nonterminal "grouping" the disjuncts of a disjunctive subexpression. The following example defines a nonterminal Program that consists of one or more Statement, each being either an AssignmentStmt or a LoopStmt:

Program = { (AssignmentStmt | LoopStmt):=Statement }+;

Another such feature is the iteration pair, {E1, E2}, which is essentially an abbreviation for E1 (E2 E1)*:

VarsDecl = "var" {ident, thecomma} ";";
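
The expansion of an iteration pair can be sketched in Python as a check over a list of tokens. This is only an illustration of the pattern E1 (E2 E1)* under the simplifying assumption that E1 and E2 each match a single token; the functions iteration_pair, is_ident, is_comma and is_vars_decl are hypothetical and do not reflect the code the parser generator emits.

```python
from typing import Callable, Sequence

TokenTest = Callable[[str], bool]


def iteration_pair(tokens: Sequence[str],
                   is_e1: TokenTest, is_e2: TokenTest) -> bool:
    """Check tokens against {E1, E2}, that is, E1 (E2 E1)*."""
    if not tokens or not is_e1(tokens[0]):
        return False
    rest = tokens[1:]
    if len(rest) % 2 != 0:
        # Every separator E2 must be followed by another E1.
        return False
    for i in range(0, len(rest), 2):
        if not (is_e2(rest[i]) and is_e1(rest[i + 1])):
            return False
    return True


def is_ident(tok: str) -> bool:
    # Simplification: any Python identifier counts as ident.
    return tok.isidentifier()


def is_comma(tok: str) -> bool:
    return tok == ","


def is_vars_decl(tokens: Sequence[str]) -> bool:
    """VarsDecl = "var" {ident, thecomma} ";" """
    return (len(tokens) >= 3
            and tokens[0] == "var"
            and tokens[-1] == ";"
            and iteration_pair(tokens[1:-1], is_ident, is_comma))


print(is_vars_decl(["var", "x", ",", "y", ";"]))  # True
print(is_vars_decl(["var", "x", ",", ";"]))       # False: trailing comma
print(is_vars_decl(["var", ";"]))                 # False: no identifier
```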

Attributes, both inherited and synthesized, are declared when describing a rule of the grammar, in angle brackets next to the name of the left-hand side of the rule:

Expression < out value:int > = {Term $ System.Console.WriteLine(value); $, "+"};

A declaration of an inherited attribute starts with the keyword in, while a synthesized attribute has the modifier out. The kind of the attribute is followed by its name and type. Where a nonterminal symbol that has attributes is used within a rule body, only the name of the passed-by-reference variable (for a synthesized attribute) or an expression of the matching type (for an inherited attribute) should be specified.

User code can be placed anywhere within the body of a rule by enclosing it between $ characters. The enclosed code is pasted without alteration into the code produced by the parser generator.

Comments, multiline (/* text */) and one-line (// text), can be used anywhere in the grammar description.


The parser generator can be downloaded here.

Examples of grammars

List of detectable errors in grammar descriptions

The parser generator can produce the following error messages and warnings:
  1. Multiple terminal symbols can not refer to the same terminal '...'
  2. Condition conjunct should be preceded by at least one 'base' conjunct in subexpression '...' in rule for nonterminal '...'
  3. Subexpression '...' in rule for nonterminal '...' has no 'base' conjuncts
  4. Symbol '...' not declared
  5. Nonterminal with name '...' cannot be created -- there is a terminal symbol sharing the same name
  6. Terminal symbol '...' redeclared
  7. Multiple terminal symbols refer to terminal '...'
  8. Grammar does not contain rule for axiom '...'
  9. Symbol '...' redeclared
  10. Quantifier can not be used outside of conjunction expression
  11. Attribute '...' is redeclared
  12. Number of attributes mismatches with the number of declared attributes of nonterminal '...'
  13. Terminal can not have attributes
  14. Axiom '...' of the grammar cannot have attributes
  15. Nonterminal '...' was already declared with other attributes list
  16. On-the-fly-nonterminal '...' cannot be introduced -- there is a nonterminal with the same name; otherwise it would lead to side effects
  17. Terminal value cannot be an empty string; if intentional use 'empty' keyword
  18. PFIRST set for nonterminal '...' is empty -- nonterminal does not produce any terminal string
  19. Usage of multiple rules for the same nonterminal '...' is deprecated
  20. (Warning) Terminal alias '...' should be used to refer to terminal '...'
  21. (Warning) Introduced nonterminal '...' has other rules already -- this reference to it may cause side effects
  22. (Warning) Nonterminal '...' got new rule implicitly
  23. (Warning) Use of on-the-fly-nonterminals is deprecated
  24. (Warning) Subexpression '...' in rule for nonterminal '...' has duplicate base conjuncts
  25. (Warning) Subexpression '...' in rule for nonterminal '...' has duplicate 'aftereq (>=)' conjuncts
  26. (Warning) Subexpression '...' in rule for nonterminal '...' has duplicate 'after (>)' conjuncts