Does anybody know which is the most accurate grammar or whether there
is a better approach to this?
Many years ago I spent some time reading RTF (Wikipedia) with C#. I say "reading" because, once you understand RTF in detail and use it the way it was designed, you realize that RTF is not meant to be read and parsed as a whole, over and over again, while editing. The documentation gives the syntax for RTF, but don't be misled into believing that you should use a lexer/parser. The documentation also gives a sample reader for RTF.
Remember that RTF was created long ago, when memory was measured in KB rather than MB, and editing long documents of several hundred pages in a conventional way would tax system resources. So RTF can be edited in smaller subsections without loading or modifying the entire document. This is what lets it work on such large documents with limited memory. It is also why the syntax may seem odd at first.
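To give a feel for what "reading" rather than "parsing" means, here is a minimal sketch in Java. It is not the sample reader from the documentation; the class and method names are made up for illustration, and it assumes only the basic RTF conventions: groups in braces, control words made of a backslash plus letters with an optional numeric parameter, and control symbols such as \{ and \}. It walks the input one character at a time, keeps nothing but the current group depth, and collects the control words seen at the top level.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical streaming reader: no parse tree, only a depth counter.
final class RtfStreamSketch {

    private static boolean isAsciiLetter(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Collects the control words (with numeric parameter) found at group depth 1. */
    static List<String> topLevelControlWords(Reader in) throws IOException {
        PushbackReader r = new PushbackReader(in);
        List<String> words = new ArrayList<>();
        int depth = 0;
        int c;
        while ((c = r.read()) != -1) {
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            } else if (c == '\\') {
                int d = r.read();
                if (d == -1) break;
                if (!isAsciiLetter(d)) continue;   // control symbol such as \{ or \}: skip it
                StringBuilder w = new StringBuilder();
                while (isAsciiLetter(d)) { w.append((char) d); d = r.read(); }
                if (d == '-') { w.append('-'); d = r.read(); }          // optional sign
                while (d >= '0' && d <= '9') { w.append((char) d); d = r.read(); }
                if (d != -1 && d != ' ') r.unread(d);  // a single space only ends the word
                if (depth == 1) words.add(w.toString());
            }
            // anything else is document text; a real reader would act on it here
        }
        return words;
    }

    /** Convenience overload for in-memory input. */
    static List<String> topLevelControlWords(String rtf) {
        try {
            return topLevelControlWords(new StringReader(rtf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String rtf = "{\\rtf1\\ansi{\\fonttbl\\f0 Arial;}\\b Hello\\b0 \\{x\\}}";
        System.out.println(topLevelControlWords(rtf)); // [rtf1, ansi, b, b0]
    }
}
```

The point of the sketch is the shape of the loop: state is a counter and the current control word, so memory use does not grow with the document.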
You have a number of problems here. Most are dealt with in question 4.6 of the JavaCC FAQ. http://www.engr.mun.ca/~theo/JavaCC-FAQ/
First, there is a lot of left-factoring to do. Left factoring tries to move choices to later in the parse. E.g. if you have
void statement() #STM : {}
{
    identifier() <ASSIGNMENT> expression()
  | identifier() <ASSIGNMENT> <STRING>
  | identifier() <LBR> arg_list() <RBR>
}
and the parser is expecting a statement and the next item of input is an identifier, then the parser can't make the choice, because all three alternatives begin with identifier(). Factor out the common parts on the left to get
void statement() #STM : {}
{
    identifier()
    (   <ASSIGNMENT> expression()
      | <ASSIGNMENT> <STRING>
      | <LBR> arg_list() <RBR>
    )
}
and then
void statement() #STM : {}
{
    identifier()
    (   <ASSIGNMENT> ( expression() | <STRING> )
      | <LBR> arg_list() <RBR>
    )
}
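To see why the factored form is easy to parse, it can help to write it by hand. The following is only a sketch in plain Java, not generated JavaCC code: the token representation (a list of strings), the class name, and the simplified expression() and argList() rules are all assumptions made for illustration. Every choice is decided by looking at just the next token.

```java
import java.util.List;

// Hand-written recursive-descent version of the left-factored statement rule.
final class LeftFactoredStatement {
    private final List<String> tokens;
    private int pos = 0;

    LeftFactoredStatement(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }
    private String next() { String t = peek(); pos++; return t; }

    private void expect(String s) {
        String t = next();
        if (!t.equals(s)) throw new IllegalStateException("expected " + s + " but found " + t);
    }

    private void identifier() {
        String t = next();
        if (!Character.isLetter(t.charAt(0))) throw new IllegalStateException("identifier expected: " + t);
    }

    // Simplified assumption: an expression is a single identifier or number.
    private void expression() {
        String t = next();
        if (!Character.isLetterOrDigit(t.charAt(0))) throw new IllegalStateException("expression expected: " + t);
    }

    // Simplified assumption: a possibly empty, comma-separated list of expressions.
    private void argList() {
        if (peek().equals(")")) return;
        expression();
        while (peek().equals(",")) { next(); expression(); }
    }

    // identifier() ( <ASSIGNMENT> ( expression() | <STRING> ) | <LBR> arg_list() <RBR> )
    String statement() {
        identifier();                       // common prefix, consumed exactly once
        if (peek().equals("=")) {           // first choice: one token of lookahead
            next();
            if (peek().startsWith("\"")) {  // second choice: again one token
                next();
                return "string-assignment";
            }
            expression();
            return "assignment";
        }
        expect("(");
        argList();
        expect(")");
        return "call";
    }

    /** Parses a whole token list and reports which alternative matched. */
    static String classify(List<String> tokens) {
        LeftFactoredStatement p = new LeftFactoredStatement(tokens);
        String kind = p.statement();
        if (p.pos != tokens.size()) throw new IllegalStateException("trailing input: " + p.peek());
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of("x", "=", "1")));                // assignment
        System.out.println(classify(List.of("f", "(", "a", ",", "2", ")"))); // call
    }
}
```

With the unfactored rule, the same hand-written parser would have to guess which alternative to try before it had seen anything but the identifier.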
Second, the nonterminal "matched" is useless, as there is no nonrecursive case. I suspect that you are trying to deal with the dangling else problem. This is not a good way to deal with the dangling else problem. See the JavaCC FAQ for a sensible way to deal with it.
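For what it's worth, one common way to handle the dangling else is to make the else part optional and take it greedily, so that each else binds to the nearest unmatched if. The sketch below shows that idea as a hand-written parser in plain Java. It is an illustration under assumptions, not the FAQ's text: the toy grammar (statement is either "if" cond statement [ "else" statement ] or a single word) and all names are invented here.

```java
import java.util.List;

// Toy parser showing "else binds to the nearest if" via a greedy optional branch.
final class DanglingElse {
    private final List<String> tokens;
    private int pos = 0;

    DanglingElse(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    // statement := "if" cond statement [ "else" statement ] | word
    String statement() {
        if (!peek().equals("if")) {
            return next();                       // a simple one-word statement
        }
        next();                                  // consume "if"
        String cond = next();
        String thenPart = statement();
        if (peek().equals("else")) {             // greedy: the innermost if takes the else
            next();
            String elsePart = statement();
            return "(if " + cond + " " + thenPart + " else " + elsePart + ")";
        }
        return "(if " + cond + " " + thenPart + ")";
    }

    /** Returns a fully parenthesized form so the binding of each else is visible. */
    static String parse(List<String> tokens) {
        DanglingElse p = new DanglingElse(tokens);
        String result = p.statement();
        if (p.pos != tokens.size()) throw new IllegalStateException("trailing input: " + p.peek());
        return result;
    }

    public static void main(String[] args) {
        // prints (if a (if b x else y))
        System.out.println(parse(List.of("if", "a", "if", "b", "x", "else", "y")));
    }
}
```

No separate "matched" and "unmatched" nonterminals are needed; the ambiguity is resolved by the order in which the parser checks for the else.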
Third, there is mutual left recursion between nonterminals "fragment" and "expression". I'm not sure what you are trying to accomplish here. There are several ways to deal with parsing expressions that don't use left recursion. See http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm for more information. My tutorial introduction to JavaCC might also help. http://www.engr.mun.ca/~theo/JavaCC-Tutorial/
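As one example of expression parsing without left recursion, the classic trick is to replace a left-recursive rule like E -> E "+" T with a loop, E -> T ( "+" T )*. The sketch below does this by hand in plain Java for integers with +, -, *, / and parentheses. The class and method names are invented for illustration, and it evaluates directly instead of building a tree, which is a simplification.

```java
// Recursive-descent evaluator: repetition replaces left recursion.
final class ExprParser {
    private final String src;
    private int pos = 0;

    ExprParser(String src) { this.src = src; }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    // Next non-space character, or '\0' at end of input (does not consume it).
    private char peek() {
        skipSpaces();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // expression := term ( ("+" | "-") term )*
    int expression() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right; // loop gives left associativity
        }
        return value;
    }

    // term := factor ( ("*" | "/") factor )*
    int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int right = factor();
            value = (op == '*') ? value * right : value / right;
        }
        return value;
    }

    // factor := NUMBER | "(" expression ")"
    int factor() {
        char c = peek();
        if (c == '(') {
            pos++;
            int value = expression();
            if (peek() != ')') throw new IllegalArgumentException("')' expected at " + pos);
            pos++;
            return value;
        }
        if (!Character.isDigit(c)) throw new IllegalArgumentException("number expected at " + pos);
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return Integer.parseInt(src.substring(start, pos));
    }

    /** Evaluates a complete expression and rejects trailing input. */
    static int eval(String src) {
        ExprParser p = new ExprParser(src);
        int value = p.expression();
        if (p.peek() != '\0') throw new IllegalArgumentException("unexpected input at " + p.pos);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(eval("1 + 2 * 3"));   // 7
        System.out.println(eval("8 - 3 - 2"));   // 3
    }
}
```

Each precedence level gets its own method, and no method ever calls itself before consuming a token, so there is nothing left-recursive for a top-down parser to loop on.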
Finally, a word of advice. Start with a grammar for a small subset of your language and then add constructs one or two at a time. That way you won't have to deal with a lot of problems at once.
Best Solution
It's been a while, but I found this tutorial very helpful on a previous project. I was able to create a query language for our application in a few days with basically no previous experience with JavaCC.
I've not read it, but while looking for the other tutorial I also found this one.