Remove comments from C/C++ code
Run the following command on your source file:
gcc -fpreprocessed -dD -E test.c
Thanks to KennyTM for finding the right flags. Here’s the result for completeness:
test.c:
#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo
/* comments? comments. */
// c++ style comments
Output of gcc -fpreprocessed -dD -E test.c:
#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo
How to remove C-style comments from code
I've considered the comments (so far) and changed the regex to:
(?:\/\/(?:\\\n|[^\n])*\n)|(?:\/\*[\s\S]*?\*\/)|((?:R"([^(\\\s]{0,16})\([^)]*\)\2")|(?:@"[^"]*?")|(?:"(?:\?\?'|\\\\|\\"|\\\n|[^"])*?")|(?:'(?:\\\\|\\'|\\\n|[^'])*?'))
It handles Biffen's C++11 raw string literals (as well as C# verbatim strings), and it's changed according to Wiktor's suggestions.
I split the handling of single and double quotes because their logic differs (which also avoids the non-working back reference ;).
It's undoubtedly more complex, but still a far cry from the solutions I've seen out there, which hardly cover any of the string issues. And it could be stripped of parts not applicable to a specific language.
One comment suggested supporting more languages. That would make the regex (even more) complex and unmanageable. It should be relatively easy to adapt, though.
Updated regex101 example.
Thanks everyone for the input so far. And keep the suggestions coming.
Regards
Edit: Updated the raw string handling; this time I actually read the spec. ;)
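The idea behind the regex (match comments and string literals in one alternation, capture only the literals, and put the captured text back) can be sketched in Python. The pattern below is a deliberately simplified stand-in, not the regex from this answer: it ignores raw strings, verbatim strings, trigraphs, and line continuations, and the name strip_comments is hypothetical.

```python
import re

# Simplified pattern (an assumption for illustration, not the answer's regex).
# Group 1 captures string and character literals so they can be restored.
PATTERN = re.compile(
    r'//[^\n]*'                      # line comment, up to (not including) the newline
    r'|/\*[\s\S]*?\*/'               # block comment, non-greedy
    r'|("(?:\\[\s\S]|[^"\\])*"'      # double-quoted string literal
    r"|'(?:\\[\s\S]|[^'\\])*')"      # character literal
)

def strip_comments(source):
    """Replace each comment with one space; leave literals untouched."""
    def replace(match):
        literal = match.group(1)
        # Group 1 is set only when a literal matched, never for a comment.
        return literal if literal is not None else ' '
    return PATTERN.sub(replace, source)

print(strip_comments('s = "// keep"; /* drop */ t = 1; // drop too'))
```

Because the alternatives are tried from the position where a match starts, a // inside a string is consumed as part of the literal, and a quote inside a comment is consumed as part of the comment.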
Is there a reliable tool for removing comments in ASM/C/C++ code?
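#!/bin/bash
# Usage: stripcomments input-file
if [[ "$#" -ne 1 ]]; then
    echo "Usage: stripcomments input-file" >&2
    exit 1
fi
# -fpreprocessed: treat the input as already preprocessed (no macro expansion)
# -dD: keep #define directives, -E: preprocess only, -P: omit line markers
gcc -fpreprocessed -dD -E -P "$1" 2> /dev/null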
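If the script has to be called from another program, the same command can be wrapped in a few lines of Python. This is only a sketch: the function names gcc_strip_command and strip_comments_with_gcc are hypothetical, and it assumes a gcc binary that accepts the flags used in the script above.

```python
import shutil
import subprocess

def gcc_strip_command(path, compiler="gcc"):
    """Build the same argument list the shell script passes to gcc."""
    return [compiler, "-fpreprocessed", "-dD", "-E", "-P", path]

def strip_comments_with_gcc(path, compiler="gcc"):
    """Return the contents of `path` with comments removed by the preprocessor."""
    if shutil.which(compiler) is None:
        raise FileNotFoundError(f"{compiler} not found on PATH")
    # check=True raises CalledProcessError if gcc exits with a non-zero status.
    result = subprocess.run(gcc_strip_command(path, compiler),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Unlike the shell script, this version does not discard gcc's error output; on failure it is available on the raised exception.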
Remove all comments from a C program - any possible improvements to this code?
Move your comment stripping into a function (more useful) and read one line at a time with fgets(). The name last_character is ambiguous (does it mean last, or previous?). The version below uses far fewer calls to putchar() and only one printf() (could use fputs()) per line, while preserving most of what you were doing:
#include <stdio.h>
#include <string.h>

#define MAX_LENGTH 65536
#define NOT_IN_COMMENT 0
#define SINGLE_COMMENT 1
#define MULTI_COMMENT 2

/* State is global so that it carries over from one input line to the next. */
int status = NOT_IN_COMMENT; /* Are we in a comment? What type? */
int in_string = 0; /* Are we inside of a string constant? */

/* Copy code into stripped without its comments. Escaped quotes and
   character literals are not handled. */
char* stripcomments(char* stripped, const char* code)
{
    size_t ndx; /* index for code[] */
    size_t ondx = 0; /* index for stripped[] */
    size_t len = strlen(code); /* computed once instead of on every iteration */
    char prevch = '\0'; /* Value of the previous character (none yet) */

    for (ndx = 0; ndx < len; ndx++)
    {
        char current = code[ndx];
        if (in_string) {
            if (current == '"') in_string = 0;
            stripped[ondx++] = current;
        }
        else {
            if (status == NOT_IN_COMMENT) {
                if (current == '"') {
                    stripped[ondx++] = current;
                    in_string = 1;
                    continue;
                }
                if (current == '/' && prevch == '/') status = SINGLE_COMMENT;
                else if (current == '*' && prevch == '/') {
                    status = MULTI_COMMENT;
                    current = '\0'; /* the '*' that opens a comment must not also close it */
                }
                /* code[ndx+1] is always readable: at worst it is the terminating '\0' */
                else if (current != '/' || (code[ndx+1] != '/' && code[ndx+1] != '*'))
                    stripped[ondx++] = current;
            }
            else if (status == SINGLE_COMMENT) {
                if (current == '\n') {
                    status = NOT_IN_COMMENT;
                    stripped[ondx++] = '\n';
                }
            }
            else if (status == MULTI_COMMENT) {
                if (current == '/' && prevch == '*') {
                    status = NOT_IN_COMMENT;
                    current = '\0'; /* the closing '/' must not open a new comment */
                }
            }
        }
        prevch = current;
    }
    stripped[ondx] = '\0';
    return stripped;
}

int main(void)
{
    char code[MAX_LENGTH]; /* Buffer that stores the inputted code */
    char stripped[MAX_LENGTH]; /* Never longer than code, so the same size suffices */

    while (fgets(code, sizeof(code), stdin))
    {
        /* strip comments... */
        stripcomments(stripped, code);
        if (strlen(stripped) > 0) printf("%s", stripped);
    }
    return 0;
}
I'll leave it to you to remove extra blank lines.
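For comparison, here is the same kind of state machine as a compact Python sketch. It is not a translation of the C code above: it looks one character ahead instead of remembering the previous one, and it adds states for character literals and backslash escapes. The function name strip_comments_fsm and the state names are hypothetical.

```python
def strip_comments_fsm(code):
    """Remove // and /* */ comments with an explicit state machine."""
    out = []
    i, n = 0, len(code)
    state = "code"  # one of: code, line, block, string, char
    while i < n:
        ch = code[i]
        nxt = code[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt == "/":
                state = "line"
                i += 2
                continue
            if ch == "/" and nxt == "*":
                state = "block"
                i += 2
                continue
            if ch == '"':
                state = "string"
            elif ch == "'":
                state = "char"
            out.append(ch)
        elif state == "line":
            if ch == "\n":  # keep the newline that ends the comment
                state = "code"
                out.append(ch)
        elif state == "block":
            if ch == "*" and nxt == "/":
                state = "code"
                i += 2
                continue
        else:  # inside a string or character literal
            out.append(ch)
            if ch == "\\" and nxt:  # copy the escaped character verbatim
                out.append(nxt)
                i += 2
                continue
            if (state == "string" and ch == '"') or (state == "char" and ch == "'"):
                state = "code"
        i += 1
    return "".join(out)
```

Consuming both characters of a delimiter at once avoids the corner cases that come from tracking the previous character, such as /*/ being read as a complete comment.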
Remove C and C++ comments using Python?
I don't know if you're familiar with sed, the UNIX-based (but Windows-available) text parsing program, but I've found a sed script here which will remove C/C++ comments from a file. It's very smart; for example, it will ignore '//' and '/*' if found in a string declaration, etc. From within Python, it can be used with the following code:
import subprocess

# source_code is a string with the source code.
# -f reads the sed commands from the script file; -n is assumed here to match
# the script's own "#! /bin/sed -nf" header (drop it if your copy differs).
result = subprocess.run(['sed', '-n', '-f', '/path/to/remccoms3.sed'],
                        input=source_code, capture_output=True, text=True)
return_code = result.returncode
stripped_code = result.stdout
In this program, source_code is the variable holding the C/C++ source code, and eventually stripped_code will hold the C/C++ code with the comments removed. Of course, if you have the file on disk, you could instead pass file handles pointing to those files as the stdin and stdout arguments (the input file opened in read-mode, the output file in write-mode). remccoms3.sed is the file from the above link, and it should be saved in a readable location on disk. sed is also available on Windows, and comes installed by default on most GNU/Linux distros and Mac OS X.
This will probably be better than a pure Python solution; no need to reinvent the wheel.
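The file-on-disk variant mentioned above might look like the following sketch. The helper name run_filter and the file names are hypothetical, and the sed flags in the usage comment are an assumption that should be checked against the script's header.

```python
import subprocess

def run_filter(command, in_path, out_path):
    """Run `command` with in_path as its stdin and out_path as its stdout.

    Returns the exit code of the command.
    """
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        return subprocess.run(command, stdin=src, stdout=dst).returncode

# Hypothetical usage; the paths are placeholders:
# run_filter(["sed", "-n", "-f", "/path/to/remccoms3.sed"], "input.c", "stripped.c")
```

Passing open file objects lets the child process read and write the files directly, so the source never has to be held in a Python string.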
VSCode: delete all comments in a file
Easy way:
- Open extensions (ctrl-shift-x).
- Type in remove comments in the search box.
- Install the top pick and read instructions.
Hard way:
- Search and replace (ctrl-h).
- Toggle regex on (alt-r).
- Learn some regular expressions! https://docs.rs/regex/0.2.5/regex/#syntax
A simple //.* will match all single line comments (and more ;D). #.* could be used to match Python comments. And /\*[\s\S\n]*?\*/ matches block comments (the ? makes the match stop at the first */ instead of running on to the last one in the file). And you can combine them as well: //.*|/\*[\s\S\n]*?\*/
(| in regex means "or", . means any character, * means "0 or more" and indicates how many characters to match, therefore .* means all characters until the end of the line (or until the next matching rule).)
Of course there are caveats: URLs (https://...) have double slashes and will match that first rule, and god knows where there are # in code that will match that Python rule. So some reading/adjusting has to be done!
Once you start fiddling with your regexes it can take a lifetime to get them perfect, so be careful and go the easy route if you are short on time, but knowing some simple regex by heart will do you good, since regular expressions are usable almost everywhere.
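The caveats are easy to reproduce outside the editor. The sketch below uses Python's re module rather than VS Code, so small syntax differences are possible, and the sample text (including the example.com URL) is made up for illustration.

```python
import re

sample = 'url = "https://example.com"; // trailing note\n/* one */ keep(); /* two */\n'

# The simple line-comment rule also eats the URL inside the string literal.
no_line_comments = re.sub(r'//.*', '', sample)

# A greedy block-comment rule runs from the first /* to the last */,
# swallowing keep() along the way.
greedy = re.sub(r'/\*[\s\S]*\*/', '', sample)

# A lazy quantifier (*?) stops at the first */ and keeps the code in between.
lazy = re.sub(r'/\*[\s\S]*?\*/', '', sample)

print(repr(no_line_comments))
print(repr(greedy))
print(repr(lazy))
```

Running a pattern against a small sample like this before using replace-all in the editor is a cheap way to catch rules that match too much.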