Remove Comments from C/C++ Code

Remove comments from C/C++ code

Run the following command on your source file:

gcc -fpreprocessed -dD -E test.c

Thanks to KennyTM for finding the right flags. Here’s the result for completeness:

test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo
/* comments? comments. */
// c++ style comments

gcc -fpreprocessed -dD -E test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo

How to remove C-style comments from code

I've considered the comments (so far) and changed the regex to:

(?:\/\/(?:\\\n|[^\n])*\n)|(?:\/\*[\s\S]*?\*\/)|((?:R"([^(\\\s]{0,16})\([^)]*\)\2")|(?:@"[^"]*?")|(?:"(?:\?\?'|\\\\|\\"|\\\n|[^"])*?")|(?:'(?:\\\\|\\'|\\\n|[^'])*?'))

It handles Biffens C++11's raw string literal (as well as C# verbatim strings) and it's changed according to Wiktors suggestions.

Split it to handling single and double quotes separately because of difference in logic (and avoiding the non-working back reference ;).

It's undoubtedly more complex, but still far from the solutions I've seen out there which hardly cover any of the string issues. And it could be stripped of parts not applicable to a specific language.

One comment suggested supporting more languages. That would make the RE (even more) complex and unmanageable. It should be relatively easy to adapt though.

Updated regex101 example.

Thanks everyone for the input so far. And keep the suggestions coming.

Regards

Edit: Update Raw String - this time I actually read the spec. ;)

Is there a reliable tool for removing comments in ASM/C/C++ code?

#!/bin/bash

if [[ "$#" != 1 ]] ; then
echo "Usage: stripcomments input-file" > /dev/stderr
exit
fi

gcc -fpreprocessed -dD -E -P "$1" 2> /dev/null

Remove all comments from a C program - any possible improvements to this code?

Move your stripping of comments into a function (more useful), and read one line at a time with fgets(), last_character is ambiguous (does it mean last, or previous?), this uses far fewer calls to putchar(), only one printf (could use puts) per line, preserves most of what you were doing,

#include <stdio.h>
#include <string.h>
#define MAX_LENGTH 65536

#define NOT_IN_COMMENT 0
#define SINGLE_COMMENT 1
#define MULTI_COMMENT 2
int status = NOT_IN_COMMENT; /* Are we in a comment? What type? */
int in_string = 0; /* Are we inside of a string constant? */
char* stripcomments(char* stripped,char* code)
{
int ndx; /* index for code[] */
int ondx; /* index for output[] */
char prevch; /* Value of the previous character */
char ch; /* Character to input into */

/* Remove all comments from the code and display results to user */
for (ndx=ondx=0; ndx < strlen(code); ndx++)
{
char current = code[ndx];

if (in_string) {
if (current == '"') in_string = 0;
stripped[ondx++] = current;
}
else {
if (status == NOT_IN_COMMENT) {
if (current == '"') {
stripped[ondx++] = current;
in_string = 1;
continue;
}

if (current == '/' && prevch == '/') status = SINGLE_COMMENT;
else if (current == '*' && prevch == '/') status = MULTI_COMMENT;
else if (current != '/' || (current == '/' && ndx < strlen(code)-1 && !(code[ndx+1] == '/' || code[ndx+1] == '*'))) stripped[ondx++] = current;
}

else if (status == SINGLE_COMMENT) {
if (current == '\n') {
status = NOT_IN_COMMENT;
stripped[ondx++] = '\n';
}
}

else if (status == MULTI_COMMENT) {
if (current == '/' && prevch == '*') status = NOT_IN_COMMENT;
}
}
prevch = current;
}
stripped[ondx] = '\0';
return(stripped);
}

int main(void)
{
char code[MAX_LENGTH]; /* Buffer that stores the inputted code */
char stripped[MAX_LENGTH];

while( fgets(code,sizeof(code),stdin) )
{
//printf("%s\n",code);
//strip comments...
stripcomments(stripped,code);
if( strlen(stripped) > 0 ) printf("%s",stripped);
}
}

I'll leave it to you to remove extra blank lines.

Remove C and C++ comments using Python?

I don't know if you're familiar with sed, the UNIX-based (but Windows-available) text parsing program, but I've found a sed script here which will remove C/C++ comments from a file. It's very smart; for example, it will ignore '//' and '/*' if found in a string declaration, etc. From within Python, it can be used using the following code:

import subprocess
from cStringIO import StringIO

input = StringIO(source_code) # source_code is a string with the source code.
output = StringIO()

process = subprocess.Popen(['sed', '/path/to/remccoms3.sed'],
input=input, output=output)
return_code = process.wait()

stripped_code = output.getvalue()

In this program, source_code is the variable holding the C/C++ source code, and eventually stripped_code will hold C/C++ code with the comments removed. Of course, if you have the file on disk, you could have the input and output variables be file handles pointing to those files (input in read-mode, output in write-mode). remccoms3.sed is the file from the above link, and it should be saved in a readable location on disk. sed is also available on Windows, and comes installed by default on most GNU/Linux distros and Mac OS X.

This will probably be better than a pure Python solution; no need to reinvent the wheel.

VSCode: delete all comments in a file

Easy way:

  • Open extensions (ctrl-shift-x)
  • type in remove comments in the search box.
  • Install the top pick and read instructions.

Hard way:

  • search replace(ctrl-h)
  • toggle regex on (alt-r).
  • Learn some regular expressions! https://docs.rs/regex/0.2.5/regex/#syntax

A simple //.* will match all single line comments (and more ;D). #.* could be used to match python comments. And /\*[\s\S\n]*\*/ matches block comments. And you can combine them as well: //.*|/\*[\s\S\n]*\*/ (| in regex means "or", . means any character, * means "0 or more" and indicates how many characters to match, therefore .* means all characters until the end of the line (or until the next matching rule))

Of course with caveats, such as urls (https://...) has double slashes and will match that first rule, and god knows where there are # in code that will match that python-rule. So some reading/adjusting has to be done!

Once you start fiddling with your regexes it can take a lifetime to get them perfect, so be careful and go the easy route if you are short on time, but knowing some simple regex by heart will do you good, since regular expressions are usable almost everywhere.



Related Topics



Leave a reply



Submit