What are some good tools for getting a quick start for parsing and analyzing C/C++ code?
In particular, I'm looking for open source tools that handle the C/C++ preprocessor and language. Preferably, these tools would use lex/yacc (or flex/bison) for the grammar, and not be too complicated. They should handle the latest ANSI C/C++ definitions.
Here's what I've found so far, but haven't looked at them in detail (thoughts?):
- CScope – Old-school C analyzer. Doesn't seem to do a full parse, though. Described as a glorified 'grep' for finding C functions.
- GCC – Everybody's favorite open source compiler. Very complicated, but seems to do it all. There's a related project for creating GCC extensions called GEM, but hasn't been updated since GCC 4.1 (2006).
- PUMA – The PUre MAnipulator. (from the page: "The intention of this project is to
provide a library of classes for the analysis and manipulation of C/C++ sources. For this
purpose PUMA provides classes for scanning, parsing and of course manipulating C/C++
sources."). This looks promising, but hasn't been updated since 2001. Apparently PUMA has been incorporated into AspectC++, but even this project hasn't been updated since 2006.
- Various C/C++ raw grammars. You can get c-c++-grammars-1.2.tar.gz, but this has been unmaintained since 1997. A little Google searching pulls up other basic lex/yacc grammars that could serve as a starting place.
- Any others?
I'm hoping to use this as a starting point for translating C/C++ source into a new toy language.
(Added 2/9): Just a clarification: I want to extract semantic information from the preprocessor in addition to the C/C++ code itself. I don't want "#define foo 42" to disappear into the integer "42", but remain attached to the name "foo". This, unfortunately, excludes several solutions that run the preprocessor first and only deliver the C/C++ parse tree)