C# Regex to get the comments block out of pl/sql code

c++parsingplsqlregex

i want to extract the comments out of a string as a block.

e.g. I have a PL/SQL code as:

--comment1
select * from t_table;
--i want comment 2;
/*i want comment
3 */
--i want comment 4
OPEN data_cur;

Here, i want all the single line and multiline comments before OPEN data_cur; but after
select * from t_table;

i.e. i want a full comment block between two PL/SQL statements.

i want the regex to match the string and form the groups as:

  1. SQL statement
  2. whole comment block

Can it be done using any regex?

Best Solution

(?ms)^\s*([^\-/\s][^;]+;)((?:(?:\s+/\*(?:.(?!\*/))+.\*/)|(?:\s+\-\-[^\r\n]+))*)

should do it nicely.
(Note: (?ms) is the way in Java to specify a "multi-line" "dot-all" mode, but I have tested in on RETester and it works)

It can detect 0 or more comments after the select.
Multi line comments are accepted between /* and */ because of the use of a positive lookahead (?:.(?!\*/))+ in a non-capturing group

Update: modified it to detect any "non-comment" line before 0 or many comments.


To detect comment1, then statement, then comments:

(?ms)((?:(?:\s+/\*(?:.(?!\*/))+.\*/)|(?:\s+\-\-[^\r\n]+))*)\s*^\s*([^\-/\s][^;]+;)((?:(?:\s+/\*(?:.(?!\*/))+.\*/)|(?:\s+\-\-[^\r\n]+))*)

Note: I have update in both regexps (the one in the beginning and this one) the statement detection part:

\s*([^\-/][^;]+;)

to

\s*([^\-/\s][^;]+;)

Note bis: only the first "comments - statement - comments" will be detected.
The next blocks will also be detected, but only as : "statement - comments", then "statement - comments", and so on. (the comments above a statement are part of the comments below the previous statement)