I'm looking for a good explanation of the definitions of the FIRST, FOLLOW, and PREDICT sets of a RDP when given a grammar.

# Recursive descent parser: how to find the FIRST, FOLLOW, and PREDICT sets

parsing, recursive-descent

#### Related Solutions

LL is usually a more efficient parsing technique than recursive-descent. In fact, a naive recursive-descent parser will actually be *O(k^n)* (where *n* is the input size) in the worst case. Some techniques such as memoization (which yields a Packrat parser) can improve this as well as extend the class of grammars accepted by the parser, but there is always a space tradeoff. LL parsers are (to my knowledge) always linear time.
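To make the memoization idea concrete, here is a minimal sketch of packrat-style caching (the names and the toy rule are illustrative, not from any particular library): caching each result by input position means a rule is evaluated at most once per position, which is what turns worst-case exponential backtracking into linear time at the cost of extra space.

```python
def packrat(parse_fn):
    """Memoize a (text, pos) -> result parse function, packrat-style."""
    cache = {}
    def wrapped(s, i):
        key = (id(s), i)
        if key not in cache:
            cache[key] = parse_fn(s, i)
        return cache[key]
    return wrapped

calls = []

def parse_a(s, i):
    """Toy rule: match a single 'a'; records each real invocation."""
    calls.append(i)
    return i + 1 if i < len(s) and s[i] == 'a' else None

memo_a = packrat(parse_a)
memo_a('aaa', 0)  # computed: parse_a runs, calls == [0]
memo_a('aaa', 0)  # cached: parse_a is not re-run
```

In a full packrat parser every rule is wrapped this way, which is where the space tradeoff comes from: one cache entry per (rule, position) pair.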

On the flip side, you are correct in your intuition that recursive-descent parsers can handle a greater class of grammars than LL. Recursive descent can handle any grammar which is LL(*) (that is, *unlimited* lookahead) as well as a small set of ambiguous grammars. This is because recursive descent is actually a directly-encoded implementation of PEGs, or Parsing Expression Grammars. Specifically, the disjunctive operator (`a | b`) is not commutative, meaning that `a | b` does not equal `b | a`. A recursive-descent parser will try each alternative in order, so if `a` matches the input, it will succeed even if `b` *would have* matched the input. This allows classic "longest match" ambiguities like the dangling `else` problem to be handled simply by ordering the disjunctions correctly.
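The non-commutative ordered choice can be shown in a few lines of directly-encoded Python (a toy sketch, with made-up function names): the same two alternatives in different orders recognize different prefixes of the same input.

```python
def choice_a_ab(s, i=0):
    """Ordered choice 'a' / 'ab': the first matching alternative wins."""
    if s.startswith('a', i):   # first alternative: 'a'
        return i + 1
    if s.startswith('ab', i):  # second alternative: never reached on "ab"
        return i + 2
    return None

def choice_ab_a(s, i=0):
    """Ordered choice 'ab' / 'a': longest alternative tried first."""
    if s.startswith('ab', i):
        return i + 2
    if s.startswith('a', i):
        return i + 1
    return None

choice_a_ab('ab')  # -> 1: 'a' succeeds even though 'ab' would have matched
choice_ab_a('ab')  # -> 2: reordering changes what is recognized
```

Ordering the `else` alternative first in an `if` statement rule is exactly this trick applied to the dangling-`else` problem.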

With all of that said, it is *possible* to implement an LL(k) parser using recursive-descent so that it runs in linear time. This is done by essentially inlining the predict sets so that each parse routine determines the appropriate production for a given input in constant time. Unfortunately, such a technique eliminates an entire class of grammars from being handled. Once we get into predictive parsing, problems like the dangling `else` are no longer solvable with such ease.
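Here is a minimal sketch of what "inlining the predict sets" looks like, for an assumed toy grammar (the grammar and names are mine, chosen for illustration):

```python
# Grammar (illustrative):
#   E  -> T E2
#   E2 -> '+' T E2 | epsilon
#   T  -> 'int'
# PREDICT(E2 -> '+' T E2) = {'+'}
# PREDICT(E2 -> epsilon)  = FOLLOW(E2) = {'$'}

def parse_expr(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else '$'

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    def E():
        T()
        E2()

    def E2():
        # The production is chosen in O(1) by testing the predict sets,
        # so no backtracking ever occurs.
        if peek() == '+':      # PREDICT(E2 -> '+' T E2)
            eat('+')
            T()
            E2()
        elif peek() == '$':    # PREDICT(E2 -> epsilon) = FOLLOW(E2)
            pass
        else:
            raise SyntaxError(f"unexpected token {peek()!r}")

    def T():
        eat('int')

    E()
    eat('$')
    return True
```

Because each routine commits to a production on one token of lookahead, ambiguous alternatives can no longer be rescued by trying them in order.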

As for why LL would be chosen over recursive-descent, it's mainly a question of efficiency and maintainability. Recursive-descent parsers are markedly easier to implement, but they're usually harder to maintain since the grammar they represent does not exist in any declarative form. Most non-trivial parser use-cases employ a parser generator such as ANTLR or Bison. With such tools, it really doesn't matter if the algorithm is directly-encoded recursive-descent or table-driven LL(k).

As a matter of interest, it is also worth looking into recursive-ascent, which is a parsing algorithm directly encoded after the fashion of recursive-descent, but capable of handling any LALR grammar. I would also dig into parser combinators, which are a functional way of composing recursive-descent parsers together.
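As a flavor of what combinators look like, here is a tiny hand-rolled sketch (not any real library; parsers are functions from `(text, pos)` to the position after a match, or `None` on failure):

```python
def lit(t):
    """Parser matching the literal string t."""
    def p(s, i):
        return i + len(t) if s.startswith(t, i) else None
    return p

def seq(*ps):
    """Run parsers in sequence; fail if any one fails."""
    def p(s, i):
        for q in ps:
            i = q(s, i)
            if i is None:
                return None
        return i
    return p

def alt(*ps):
    """Ordered choice: return the first alternative that succeeds."""
    def p(s, i):
        for q in ps:
            r = q(s, i)
            if r is not None:
                return r
        return None
    return p

# greeting <- ('hello' | 'hi') ' ' 'world'
greeting = seq(alt(lit('hello'), lit('hi')), lit(' '), lit('world'))
greeting('hello world', 0)  # -> 11 (the whole input consumed)
```

Each combinator returns a new parser, so grammars are composed as ordinary values rather than written as a monolithic set of mutually recursive procedures.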

There is a good tutorial on codeproject under "Compiler Patterns". Lately, you can even just Google "compiler patterns".

http://www.codeproject.com/Articles/286121/Compiler-Patterns

The article covers most aspects of building a simple compiler (the back-end, the BNF, and the patterns used to implement the various BNF rules), but is not very heavy on theory, or even on why a recursive descent compiler works to convert language input into code.

## Best Solution

Try *Programming Language Pragmatics*, by Michael L. Scott (Morgan Kaufmann). Parsing is covered in chapter 2. Recursive-descent parsing is described in section 2.2.3; FIRST and FOLLOW sets in section 2.2.5.