I'm looking for a good explanation of the definitions of the FIRST, FOLLOW, and PREDICT sets of a RDP when given a grammar.
Recursive descent parser: how to find the FIRST, FOLLOW, and PREDICT sets
LL is usually a more efficient parsing technique than recursive-descent. In fact, a naive recursive-descent parser will actually be O(k^n) (where n is the input size) in the worst case. Some techniques such as memoization (which yields a Packrat parser) can improve this as well as extend the class of grammars accepted by the parser, but there is always a space tradeoff. LL parsers are (to my knowledge) always linear time.
On the flip side, you are correct in your intuition that recursive-descent parsers can handle a greater class of grammars than LL. Recursive-descent can handle any grammar which is LL(*) (that is, unlimited lookahead) as well as a small set of ambiguous grammars. This is because recursive-descent is actually a directly-encoded implementation of PEGs, or Parser Expression Grammar(s). Specifically, the disjunctive operator (
a | b) is not commutative, meaning that
a | b does not equal
b | a. A recursive-descent parser will try each alternative in order. So if
a matches the input, it will succede even if
b would have matched the input. This allows classic "longest match" ambiguities like the dangling
else problem to be handled simply by ordering disjunctions correctly.
With all of that said, it is possible to implement an LL(k) parser using recursive-descent so that it runs in linear time. This is done by essentially inlining the predict sets so that each parse routine determines the appropriate production for a given input in constant time. Unfortunately, such a technique eliminates an entire class of grammars from being handled. Once we get into predictive parsing, problems like dangling
else are no longer solvable with such ease.
As for why LL would be chosen over recursive-descent, it's mainly a question of efficiency and maintainability. Recursive-descent parsers are markedly easier to implement, but they're usually harder to maintain since the grammar they represent does not exist in any declarative form. Most non-trivial parser use-cases employ a parser generator such as ANTLR or Bison. With such tools, it really doesn't matter if the algorithm is directly-encoded recursive-descent or table-driven LL(k).
As a matter of interest, it is also worth looking into recursive-ascent, which is a parsing algorithm directly encoded after the fashion of recursive-descent, but capable of handling any LALR grammar. I would also dig into parser combinators, which are a functional way of composing recursive-descent parsers together.
There is a good tutorial on codeproject under "Compiler Patterns". Lately, you can even just Google "compiler patterns".
The article covers most aspects of building a simple compiler (the back-end, the BNF, and the patterns used to implement the various BNF rules), but is not very heavy on theory, or even on why a recursive descent compiler works to convert language input into code.
Try Programming Language Pragmatics, by Michael L. Scott (Morgan Kaufmann). Parsing is covered in chapter 2. Recursive-descent parsing is described in section 2.2.3; first and follow sets in 2.2.5.