I'm looking for a way to parse or tokenize SQL statements within a Node.js application, in order to:
- Tokenize all the "basic" SQL keywords defined in the ISO/IEC 9075 standard or here.
- Validate the SQL syntax.
- Find out what the query is gonna do (e.g. read or write?).
Do you have any solutions or advice, peeps?
I've done some research and found a few ways to do it:
Using existing Node.js libraries
I did a Google search and didn't find a widely accepted, popular library to use. I found these:
- simple-sql-parser (22 stars on GitHub, 16 daily downloads on npm)
  - Supports only SELECT, INSERT, UPDATE and DELETE
  - There is a v2 branch on the way
- sql-parser (90 stars on GitHub, 6 daily downloads on npm)
  - Only supports basic SELECT statements
  - Based on jison
- sqljs (17 stars on GitHub, 5 daily downloads on npm)
  - v0.0.0-3, under development… No documentation at all 🙂
Unfortunately, none of those libraries seems to be complete or trustworthy.
Doing it myself based on a low-level Node.js tokenizer library
I can do it myself with a low-level tokenizer library like:
- jison (1,457 stars on GitHub, 240 daily downloads on npm)
- tokenizer (44 stars on GitHub, 10 daily downloads on npm)
I can build a Node.js tokenizer library based on CodeMirror.
The SQL mode is here on GitHub; I could maybe adapt it to get tokens within a Node application.
I figured out that there are two distinct problems: tokenization and syntax validation (which is related to tokenization).
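To make the tokenization half of that split concrete, here is a minimal, hand-rolled sketch. It is not the CodeMirror-based tokenizer and not taken from any of the libraries above. The names `KEYWORDS`, `tokenize` and `classify` are made up for illustration, the keyword list is a tiny subset of the standard, and the read/write check is a rough heuristic that only looks at the first keyword.

```javascript
// Illustrative subset only; a real tokenizer would need the full keyword list
// of the targeted SQL dialect.
const KEYWORDS = new Set([
  'SELECT', 'INSERT', 'UPDATE', 'DELETE', 'FROM', 'WHERE', 'INTO',
  'VALUES', 'SET', 'AND', 'OR', 'CREATE', 'DROP', 'ALTER', 'TABLE',
]);

// One named group per token type. The sticky flag ('y') forces every match
// to start exactly at lastIndex, so no input can be skipped silently.
const TOKEN_RE = new RegExp(
  [
    '(?<whitespace>\\s+)',
    '(?<comment>--[^\\n]*)',
    "(?<string>'(?:[^']|'')*')",
    '(?<number>\\d+(?:\\.\\d+)?)',
    '(?<word>[A-Za-z_][A-Za-z0-9_]*)',
    '(?<operator><>|<=|>=|!=|[=<>+\\-*/])',
    '(?<punctuation>[(),.;])',
  ].join('|'),
  'y'
);

// Split a SQL string into { type, value } tokens, dropping whitespace and
// line comments. Throws on any character the sketch does not recognise.
function tokenize(sql) {
  const tokens = [];
  let pos = 0;
  while (pos < sql.length) {
    TOKEN_RE.lastIndex = pos;
    const match = TOKEN_RE.exec(sql);
    if (!match) {
      throw new SyntaxError(`Unexpected character "${sql[pos]}" at position ${pos}`);
    }
    pos = TOKEN_RE.lastIndex;
    // Exactly one named group is defined for each successful match.
    const [type, value] = Object.entries(match.groups).find(([, v]) => v !== undefined);
    if (type === 'whitespace' || type === 'comment') continue;
    if (type === 'word') {
      const upper = value.toUpperCase();
      tokens.push(KEYWORDS.has(upper)
        ? { type: 'keyword', value: upper }
        : { type: 'identifier', value });
    } else {
      tokens.push({ type, value });
    }
  }
  return tokens;
}

// Rough heuristic: SELECT counts as a read, any other known leading keyword
// as a write. Cases such as SELECT ... INTO or WITH are not handled.
function classify(sql) {
  const first = tokenize(sql)[0];
  if (!first || first.type !== 'keyword') return 'unknown';
  return first.value === 'SELECT' ? 'read' : 'write';
}

console.log(tokenize('SELECT name FROM users WHERE id = 42;'));
console.log(classify("update users set name = 'x'"));
```

This only produces a flat token stream. Checking that the tokens appear in a valid order (syntax validation) is the second problem and would need a grammar on top, for example one written for jison.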
I built a SQL tokenizer for Node.js based on the SQL mode of the excellent CodeMirror (5,046 stars on GitHub, well maintained). CodeMirror's SQL mode handles "generic" SQL as well as dialect-specific features of MSSQL, MySQL, PL/SQL, Cassandra, Hive and MariaDB.
Once my project is mature enough, I will (probably) publish it on GitHub and let you know.