C++ Multi-Line Comments Using Backslash

Yes. Lines terminated by a \ are spliced together with the next line very early in translation. This happens in phase 2, before comment removal and before the preprocessor has a chance to do its work.

Comment recognition and removal take place in phase 3. For this reason you can turn a // comment into what looks like a multi-line comment by using the \. This usually fools most syntax-highlighting source code parsers.

The preprocessor runs in phase 4.

This all means that you can "multiline" virtually anything using the \, including comments and preprocessor directives:

#\
d\
e\
f\
i\
n\
e \
ABC \
int i

int main() {
A\
B\
C = 5;
}

P.S. Please note that the terminating \ does not introduce any whitespace into the spliced line. This should be taken into account when writing multi-line comments using the \ feature. For example, the following comment

// to\
get\
her

stands for the single word "together" and not for three separate words "to get her". Obviously, incorrect use of \ in comments might drastically obfuscate and even distort their intended meaning.
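The same rule can be checked at run time with a string literal, since splicing happens before tokens are formed and treats literals and comments alike. This is only a small sketch; spliced_word is a made-up name for illustration.

```cpp
#include <string>

// Hypothetical helper: one string literal written across three physical lines.
// Each backslash-newline pair is deleted and nothing is inserted in its place.
std::string spliced_word() {
    return "to\
get\
her";
}
```

Calling spliced_word() should yield the single word "together", with no spaces between the three pieces.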

In C, is the backslash character (\) required when one needs to split arguments of functions or logical conditions into multiple lines?

The backslashes are optional, except when defining multi-line macros, as Gerhardh mentioned.

#include <stdio.h>

//valid macro
#define BAR(x, y, z) printf("%d %c %f\n", \
x, \
y, \
z);

//invalid macro, this won't compile if uncommented
/*
#define BAR(x, y, z) printf("%d %c %f\n",
x,
y,
z);
*/

void foo(int x, char y, float z) {
printf("%d %c %f\n", x, y, z);
}

int main() {
//valid
foo(5, \
'c', \
0.0f);

//also valid
foo(5,
'c',
0.0f);

//even use of the macro without slashes is valid
BAR(5,
'c',
0.0f);

return 0;
}

C++ single-line comment followed by \ turns into a multi-line comment

C++ standard, 2.2 - phases of translation. Phase 2 includes

Each instance of a backslash character (\) immediately followed by a new-line character is deleted,
splicing physical source lines to form logical source lines.

and Phase 3 includes

Each comment is replaced by one space character

So the backslash at the end of the line is recognised before comments.

Equivalent phases 2 and 3 for C can be found in C standard (5.1.1.2 Translation phases in my draft).
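To make the ordering concrete, here is a toy model of the two phases. It is not how a real compiler is implemented: splice_lines and strip_line_comments are hypothetical names, and the comment stripper deliberately ignores string literals, character literals and /* */ comments.

```cpp
#include <cstddef>
#include <string>

// Phase 2 (sketch): delete every backslash that is immediately
// followed by a new-line, together with that new-line.
std::string splice_lines(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // skip the new-line as well
            continue;
        }
        out += src[i];
    }
    return out;
}

// Phase 3 (sketch): replace each // comment with one space character.
// Simplifying assumption: "//" never appears inside a literal.
std::string strip_line_comments(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            while (i < src.size() && src[i] != '\n') ++i;  // find end of line
            out += ' ';
            if (i < src.size()) out += '\n';  // keep the terminating new-line
            continue;
        }
        out += src[i];
    }
    return out;
}
```

Running strip_line_comments(splice_lines(src)) on a source where a // comment ends in a backslash removes the following physical line too, because by the time comments are recognised the two lines have already become one.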

gcc multi line comment warning

During initial processing, the preprocessor performs a series of textual transformations on its input.

Here's the relevant quote from the docs:

Continued lines are merged into one long line.

A continued line is a line which ends with a backslash, '\'. The backslash is removed and the following line is joined with the current one.

...

The trailing backslash on a continued line is commonly referred to as a backslash-newline.

If there is white space between a backslash and the end of a line, that is still a continued line. However, as this is usually the result of an editing mistake, and many compilers will not accept it as a continued line, GCC will warn you about it.

In this case it is best to write '\' instead of a bare \, since the backslash is meant as a literal symbol and not as a line-continuation indicator. Another (subjectively inferior) option is to put a non-whitespace character after the \ (for example, a dot).
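As a small illustration of the quoting approach, the comment below mentions the backslash but ends with a closing quote, so the line is not a continued line and the statement after it is compiled normally. count_separators is a hypothetical function used only for this sketch.

```cpp
// Hypothetical helper: counts backslash separators in a Windows-style path.
int count_separators(const char* path) {
    int count = 0;
    for (const char* p = path; *p != '\0'; ++p) {
        // Count every path separator, written here as '\'
        if (*p == '\\') ++count;
    }
    return count;
}
```

Had the comment ended with the bare backslash, the if statement on the next line would have become part of the comment.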

Best solution of matching C-style multiple line comments in flex?

None of the patterns given is actually correct for C or C++, because they don't take into consideration line splicing or trigraphs. (You might consider trigraphs unnecessary these days, and I wouldn't disagree, but even though they are now deprecated, you might still need to process legacy files which used them.)

(This might not be a consideration for a language which is neither C nor C++, but which has similar multiline comments. In that case, it's a toss-up between the monolithic regular expression and the start condition, but I would choose the start condition to avoid slow-down from very long comments.)

While you can write a monolithic regex which includes splices, you'll find it much easier to write (and read) if you use the start-condition based solution. Of the two extracted from the flex manual, I think (3) is slightly more performant, although in both cases my inclination would be to let flex do the line number counting instead of trying to do it explicitly. Even with %option yylineno, matching the comment one line at a time is probably a good idea, since comments can be quite long and flex is optimised for tokens which do not exceed about 8k.

To handle line splices, you would modify it to:

%option yylineno
%x COMMENT
splice (\\[[:blank:]]*\n)*
%%
[/]{splice}[*] BEGIN(COMMENT);

<COMMENT>{
[^*\\\n]+ /* eat anything that's not a '*' or line end */
"*"+[^*/\\\n]* /* eat up '*'s not followed by '/'s or line end */
[*]+{splice}[/] BEGIN(INITIAL);
[*\\] /* stray '*' or backslash */
\n /* Reduce the amount of work needed for yylineno */
}

If you want to handle trigraphs, you'll need to expand the definition of splice and add some more rules to <COMMENT> for ?.

A line splice is a backslash at the end of a line, indicating that the next line is a continuation. The backslash and the newline are removed from the input text, so that the last character of the continued line is followed immediately by the first character of the continuation line. Thus, the following is a valid comment:

/\
************** START HERE **************\
/

GCC and Clang (and quite possibly other compilers) allow the backslash character to be followed by whitespace, since otherwise the difference between a valid continuation and a stray backslash is not visible.

Continuation lines are handled before almost any other processing, so that they can be placed inside string literals, comments, or any token. They're mostly used in #define preprocessor directives to comply with the requirement that a preprocessor directive is a single input line. But someone intent on obfuscating C code could use them more liberally. They can, for example, be used to extend C++-style single line comments over multiple physical lines:

// This is a comment...\
which extends over...\
three lines.
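Splices inside ordinary tokens behave the same way. The following sketch (splice_inside_tokens is a made-up name) splits an identifier, an integer literal and the += operator across physical lines; after splicing, the compiler sees perfectly ordinary code.

```cpp
// Hypothetical demo: every continuation line starts at column 0 on purpose,
// because any leading whitespace would end up in the middle of the token.
int splice_inside_tokens() {
    int coun\
ter = 4\
0;
    counter +\
= 2;
    return counter;
}
```

After phase 2 the body reads "int counter = 40; counter += 2; return counter;", so the function returns 42.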

The only processing which happens before line continuations is trigraph processing. You can search for trigraphs on Wikipedia (or elsewhere); I'll limit myself to noting that the backslash is one of the characters which has a trigraph equivalent, ??/. Since trigraphs are processed before continuation lines, the first example of a spliced multiline comment could have been written:

/??/
************** START HERE **************\
/

Some compilers do not handle trigraphs by default; they may issue a warning if a trigraph is seen. If you want to try the above with gcc, for example, you'll need to either specify an ISO C standard (e.g. -std=c11) or provide the -trigraphs command-line flag.

Using \ to extend single-line comments

It's part of C. Called line splicing.

The K&R book talks about it

Lines that end with the backslash character \ are folded by deleting the backslash and the
following newline character. This occurs before division into tokens.

This occurs in the preprocessing phase.

So a single-line comment can be made to look like a multi-line comment:

//This is \
still a single line comment

The same applies to string literals:

char str[]="Hello \
world. This is \
a string";
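One detail worth keeping in mind with spliced strings: everything on the continuation line, including its leading indentation, becomes part of the literal. The sketch below uses made-up function names and compares a spliced literal with adjacent-literal concatenation, which does not have that problem.

```cpp
#include <string>

// Hypothetical helper: the continuation line starts at column 0.
std::string spliced_flush_left() {
    return "Hello \
world";
}

// Hypothetical helper: the continuation line is indented by four spaces,
// and those four spaces end up inside the string.
std::string spliced_indented() {
    return "Hello \
    world";
}

// Adjacent string literals are concatenated by the compiler, so the
// indentation between them is ordinary whitespace, not string content.
std::string concatenated() {
    return "Hello "
           "world";
}
```

Here spliced_flush_left() and concatenated() produce the same text, while spliced_indented() is four characters longer.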

Edit: As noted in the comments, single line comments were not there in ANSI C but were introduced as part of the standard in C99 though many compilers already supported it.

From C99,

Except within a character constant, a string literal, or a comment, the characters // introduce a comment that includes all multibyte characters up to, but not including, the next new-line character. The contents of such a comment are examined only to identify multibyte characters and to find the terminating new-line character.

As far as line splicing is concerned, it is specified in C89 itself

2.1.1.2 Translation phases

  2. Each instance of a new-line character and an immediately preceding backslash character is deleted, splicing physical source lines to form logical source lines. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character.

Look at KamiKaze's answer to see the relevant part of C99.

How to escape backslash in // comment

The foundation of the issue is the definition of a line continuation.

When a line ends with a backslash-newline combination (or, with compilers such as GCC that accept it, a <backslash><whitespace><newline> combination), the compiler appends the next line of text to the present line of text. This can be demonstrated with macros:

#define ME {\
cout << "me\n"; \
}

The above will be treated as the single line:

#define ME {cout << "me\n"; }

The compiler is complaining because your "//" comment extends onto the next line because of the trailing '\' continuation character.

Solution:
Put other characters after the '\'.

Examples:

  '\'
\ ending character

Multi-line wide string constant that allows single line comments in C++

Simply drop the backslashes:

static constexpr WCHAR TEST[] = L"The quick brown fox" // a comment
L"jumped over the" // another comment
L"lazy dog!"; // yet another comment

Comments are replaced by space characters in translation phase 3. Adjacent string literals are concatenated in translation phase 6.
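A portable sketch of the same idea, using std::wstring instead of the Windows-specific WCHAR so it compiles anywhere. commented_literal is a hypothetical name, and the trailing spaces inside the literals are an addition made here so that the words do not run together.

```cpp
#include <string>

// Hypothetical helper: each piece of the literal carries its own comment.
// The comments become single spaces in phase 3, and the adjacent wide
// literals are then concatenated in phase 6.
std::wstring commented_literal() {
    return L"The quick brown fox "   // a comment
           L"jumped over the "       // another comment
           L"lazy dog!";             // yet another comment
}
```

No backslashes are needed, so none of the comments can accidentally swallow the next line.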


