Regular Expression to Detect Semi-Colon Terminated C++ For & While Loops

You could write a little, very simple routine that does it, without using a regular expression:

  • Set a position counter pos so that it points to just before the opening bracket after your for or while.
  • Set an open brackets counter openBr to 0.
  • Now keep incrementing pos, reading the characters at the respective positions, and increment openBr when you see an opening bracket, and decrement it when you see a closing bracket. That will increment it once at the beginning, for the first opening bracket in "for (", increment and decrement some more for some brackets in between, and set it back to 0 when your for bracket closes.
  • So, stop when openBr is 0 again.

The stopping position is the closing bracket of your for(...). Now you can check whether a semicolon follows it or not.
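The steps above can be sketched in C++ roughly as follows. The function name loopIsSemicolonTerminated and its parameters are made up for illustration, and the sketch assumes the caller already knows the index of the opening bracket; it makes no attempt to skip brackets inside string literals, character literals, or comments.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: `open` must be the index of the '(' that follows
// `for` or `while`. Returns true if the matching ')' is followed by ';'
// (whitespace in between is ignored).
bool loopIsSemicolonTerminated(const std::string& src, std::size_t open) {
    if (open >= src.size() || src[open] != '(') return false;

    int openBr = 0;                      // number of currently open brackets
    std::size_t pos = open;
    for (; pos < src.size(); ++pos) {
        if (src[pos] == '(') ++openBr;
        else if (src[pos] == ')') --openBr;
        if (openBr == 0) break;          // matching ')' of the loop header
    }
    if (pos == src.size()) return false; // unbalanced: no closing ')'

    // Skip whitespace after the closing bracket.
    ++pos;
    while (pos < src.size() &&
           (src[pos] == ' ' || src[pos] == '\t' ||
            src[pos] == '\n' || src[pos] == '\r')) {
        ++pos;
    }
    return pos < src.size() && src[pos] == ';';
}
```

Called with the text "for (int i = 0; i < f(n); ++i);" and the index of its first '(', this returns true; for "while (g(x)) { work(); }" it returns false, because the character after the header is '{'.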

In C/C++ why does the do while(expression); need a semi colon?

Because you're ending the statement. A statement ends either with a block (delimited by curly braces), or with a semicolon. "do this while this" is a single statement, and can't end with a block (because it ends with the "while"), so it needs a semicolon just like any other statement.
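As a small illustration (the function countDigits is invented for this example), here is a do-while whose final semicolon ends the whole statement:

```cpp
// Counts the decimal digits of n. The body has to run at least once so that
// 0 is reported as having one digit.
int countDigits(unsigned int n) {
    int digits = 0;
    do {
        ++digits;
        n /= 10;
    } while (n != 0);   // this ';' ends the entire do-while statement
    return digits;
}
```

Leaving out the semicolon after while (n != 0) is a syntax error, whereas a plain while loop with a block body needs nothing after its closing brace.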

why there is semicolon after loop while();

The ; is just a null statement. It is a no-op, but it is the body of the while loop. From the draft C99 standard section 6.8.3 Expression and null statements:

A null statement (consisting of just a semicolon) performs no operations.

and a while statement is defined as follows from section 6.8.5 Iteration statements:

while ( expression ) statement

So in this case the statement of the while loop is ;.

The main effect of the while loop is here:

string1[i++] == string2[j++]
        ^^^             ^^^

So each iteration of the loop increments i and j until the whole condition:

string1[i++] == string2[j++] && string1[i-1] != 0 && string2[j-1] != 0

evaluates to false.
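To see the null statement at work, the condition above can be wrapped in a small function. This is only a sketch: the name commonPrefixLength and the return value are assumptions made for the example, not part of the original question.

```cpp
#include <cstddef>

// Hypothetical wrapper around the loop discussed above: returns how many
// leading characters the two C strings have in common.
std::size_t commonPrefixLength(const char* string1, const char* string2) {
    std::size_t i = 0, j = 0;
    while (string1[i++] == string2[j++] && string1[i - 1] != 0 && string2[j - 1] != 0)
        ;   // null statement: all the work happens in the condition
    // i was incremented once more by the comparison that ended the loop.
    return i - 1;
}
```

For "abc" and "abd" this returns 2. Putting the lone ; on its own line, as here, makes it clearer to readers that the empty body is intentional.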

Why do { } while(condition); needs semicolon at the end of it but while(condition) {} doesn't?

You put a semicolon after every statement except a block statement. That is why you place it after the while in do ... while, but not after the block in while {...}.

You also use it to terminate almost all declarations. The only exceptions I can think of at the moment are function bodies, and namespace bodies in C++.

semicolon after the for loop block

The semicolon is an empty expression statement.

From section 6.2 of the C++ standard:

The expression is a discarded-value expression (Clause 5). All side
effects from an expression statement are completed before the next
statement is executed. An expression statement with the expression
missing is called a null statement.
[ Note: Most statements are
expression statements — usually assignments or function calls. A null
statement is useful to carry a label just before the } of a compound
statement and to supply a null body to an iteration statement such as
a while statement (6.5.1). —end note ]

This will be more clear with some reformatting:

#include <iostream>

int main() {
    for (int i = 0; i < 5; ++i) {
        std::cout << "Hello" << std::endl;
    }
    ;
}

The presence of this null statement has no effect on the program.

What kind of statements don't require semicolon termination in C++?

Yes, it's covered in section 6, "Statement" of the C++ standard (section 6 of C++03, it may have changed in C++11 but I don't have access to that one at the moment).

There are a large number of statement types and not all of them need to be terminated. For example, the following if is a selection statement:

if (i == 1) {
doSomething();
}

and there is no requirement to terminate that with a semi-colon.

Of the different statements covered, the requirements are:

Statement type          Termination required?
=====================   =====================
labelled statement      N (a)
expression              Y
compound statements     N (a)
selection statements    N (a)
iteration statements    N (a) (b)
jump statements         Y
declaration statement   Y

(a) Although it may sometimes appear that these are terminated with a semi-colon, that's not the case. The statement:

if (i == 1) doSomething();

has the semi-colon terminating the inner expression statement, not the if statement itself, something that should be obvious when you compare it with the first code segment above, where the same semi-colon sits inside the {} braces.

(b) do requires the semi-colon after the while expression.
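To make the table concrete, here is a short sketch with each line annotated by its statement type. The function sumOfEvensBelow is invented purely for illustration.

```cpp
// Adds up the even numbers in [0, limit). The comments name the statement
// type of each line and say whether the semicolon belongs to it.
int sumOfEvensBelow(int limit) {
    int total = 0;                 // declaration statement: ';' required
    int i = 0;                     // declaration statement: ';' required

    if (limit <= 0) goto done;     // ';' ends the inner jump statement, not the if

    do {                           // iteration statement
        if (i % 2 == 0) {          // selection statement: nothing after its '}'
            total += i;            // expression statement: ';' required
        }
        ++i;                       // expression statement: ';' required
    } while (i < limit);           // note (b): ';' required after the while expression

done:                              // label: takes no ';' of its own
    return total;                  // jump statement: ';' required
}
```

With a limit of 5 the function adds 0, 2 and 4 and returns 6. Every semicolon in it belongs to a declaration, expression or jump statement, or to the do ... while from note (b).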

RegEx split string on a delimiter (semi-colon ;) except those that appear inside a string

The regular expression pattern ((?:(?:'[^']*')|[^;])*); should give you what you need. Use a while loop and Matcher.find() to extract all the SQL statements. Something like:

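import java.util.regex.Matcher;
import java.util.regex.Pattern;

// s holds the SQL text to split
Pattern p = Pattern.compile("((?:(?:'[^']*')|[^;])*);");
Matcher m = p.matcher(s);
int cnt = 0;
while (m.find()) {
    // group(1) is the statement without its terminating ';'
    System.out.println(++cnt + ": " + m.group(1));
}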

With the sample SQL you provided, this will output:

1: CREATE OR REPLACE PROCEDURE Proc
AS
b NUMBER:=3
2:
c VARCHAR2(2000)
3:
begin
c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;'
4:
end Proc

If you want to get the terminating ;, use m.group(0) instead of m.group(1).

For more information on regular expressions, see the Pattern JavaDoc and this great reference. Here's a synopsis of the pattern:

(              Start capturing group
  (?:          Start non-capturing group
    (?:        Start non-capturing group
      '        Match the literal character '
      [^']     Match a single character that is not '
      *        Greedily match the previous atom zero or more times
      '        Match the literal character '
    )          End non-capturing group
    |          Match either the previous or the next atom
    [^;]       Match a single character that is not ;
  )            End non-capturing group
  *            Greedily match the previous atom zero or more times
)              End capturing group
;              Match the literal character ;
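The same pattern should also work with C++'s std::regex, whose default ECMAScript grammar supports non-capturing groups. The sketch below is an assumption-laden port rather than part of the original answer: the helper name splitStatements is made up, and, like the pattern itself, it does not handle escaped quotes inside a string.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `sql` on ';' while treating single-quoted text
// as opaque. Returns capture group 1 of each match, i.e. each statement
// without its terminating ';'. Text after the last ';' is not returned.
std::vector<std::string> splitStatements(const std::string& sql) {
    static const std::regex pattern(R"(((?:(?:'[^']*')|[^;])*);)");
    std::vector<std::string> statements;
    for (std::sregex_iterator it(sql.begin(), sql.end(), pattern), end;
         it != end; ++it) {
        statements.push_back((*it)[1].str());
    }
    return statements;
}
```

For the input "a := 'x;y'; b;" this yields two entries, "a := 'x;y'" and " b"; the semicolon inside the quotes does not split the first statement.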

