Remove HTML Comments with Regex, in JavaScript

Remove HTML comments with Regex, in Javascript

The regex /<!--[\s\S]*?-->/g should work.

You're going to kill escaping text spans in CDATA blocks.

E.g.

<script><!-- notACommentHere() --></script>

and literal text in formatted code blocks

<xmp>I'm demoing HTML <!-- comments --></xmp>

<textarea><!-- Not a comment either --></textarea>

EDIT:

This also won't prevent new comments from being introduced as in

<!-<!-- A comment -->- not comment text -->

which after one round of that regexp would become

<!-- not comment text -->

If this is a problem, you can escape < that are not part of a comment or tag (complicated to get right) or you can loop and replace as above until the string settles down.


Here's a regex that will match comments including psuedo-comments and unclosed comments per the HTML-5 spec. The CDATA section are only strictly allowed in foreign XML. This suffers the same caveats as above.

var COMMENT_PSEUDO_COMMENT_OR_LT_BANG = new RegExp(
'<!--[\\s\\S]*?(?:-->)?'
+ '<!---+>?' // A comment with no body
+ '|<!(?![dD][oO][cC][tT][yY][pP][eE]|\\[CDATA\\[)[^>]*>?'
+ '|<[?][^>]*>?', // A pseudo-comment
'g');

How to remove html comments from a string in Javascript

like this

var str = `<div></div><!-- some comment --><p></p><!-- some comment -->`str = str.replace(/<\!--.*?-->/g, "");console.log(str)

RegExp to strip HTML comments

Are you just trying to remove the comments? How about

s/<!--[^>]*-->//g

or the slightly better (suggested by the questioner himself):

<!--(.*?)-->

But remember, HTML is not regular, so using regular expressions to parse it will lead you into a world of hurt when somebody throws bizarre edge cases at it.

Regex Javascript - Remove text between two html comments

Regex

Use the following JavaScript Regular Expression to match multiple instances of your custom .html comments and the content inside them:

/\<\!\-\-Delete\-\-\>((.|[\n|\r|\r\n])*?)\<\!\-\-Delete\-\-\>[\n|\r|\r\n]?(\s+)?/g

Then register a custom Function Task inside your Gruntfile.js as shown in the following gist:

Gruntfile.js

module.exports = function (grunt) {

grunt.initConfig({
// ... Any other Tasks
});

grunt.registerTask('processHtmlComments',
'Remove content from inside the custom delete comments',
function() {
var srcDocPath = './src/index.html', // <-- Define src path to .html
outputDocPath = './dist/index.html',// <-- Define dest path for .html

doc = grunt.file.read(srcDocPath, {encoding: 'utf8'}),
re = /\<\!\-\-Delete\-\-\>((.|[\n|\r|\r\n])*?)\<\!\-\-Delete\-\-\>[\n|\r|\r\n]?(\s+)?/g,
contents = doc.replace(re, '');

grunt.file.write(outputDocPath, contents, {encoding: 'utf8'});
console.log('Created file: ' + outputDocPath);
});

grunt.registerTask('default', [
'processHtmlComments'
]);

};

Additional notes

Currently running $ grunt via the CLI does the following:

  1. Reads a file named index.html from the src folder.
  2. Deletes any content from inside the starting and closing custom comments, <!--Delete-->, including the comments themselves.
  3. Writes a new index.html, excluding the unwanted content, to the dist folder.

Both the values for srcDocPath and outputDocPath will probably need to be redefined as per your projects requirements.


EDIT Updated Regex to also allow inline comment usage. For example:

<p>This text remains <!--Delete-->I get deleted<!--Delete-->blah blah</p>

JavaScript RegExp replace HTML comments

This will mark all comments also the one without end tag: <!-- some text -->

<!--[\s\S]*?(?:-->|$)

This will mark all comments also the one without end tag: <!-- some text //-->

<!--[\s\S]*?(?://-->|$)

This will mark everything from the first <!-- to the very end of the file

<!--[\s\S]*?(?:$)     and regex set to `^$ don't match at line breaks`

This will mark everything from the first <!-- to the end of the line

<!--.*

How to remove all conditional HTML comments?

Here are two important facts about (f)lex regular expressions. (See the flex manual for complete documentation of Flex patterns. The section is not very long.)

  1. In (f)lex, the . wildcard matches anything except a newline character. In other words, it is equivalent to [^\n]. So "<!".* will only match to the end of the line. You could fix that by using (.|\n) instead, but see below.

  2. (F)lex does not provide non-greedy repetition (*?). All repetitions extend to the longest possible match. (.*?)--> will therefore match up to the last --> on the line, and (.|\n)*?--> would match up to the last --> in the file.

It is possible to write a regular expression which does what you want, although it's a bit messy:

<!--([^-]|-[^-]|--+[^->])*--+>

should work, as long as the input text does not end with an unterminated comment. (The quotes in your pattern are unnecessary, since none of the quoted characters has any special meaning to (f)lex, but they don't hurt. I left them out because I don't think they contribute to make the pattern less unreadable.)

The repeated sequence matches any of:

  • A character other then -; or
  • A - followed by something other than another -; or
  • Two or more - followed by something other than >.

The last alternative in the repetition might require some explanation. The underlying problem is to avoid problems with inputs like

<!-- Comment with two many dashes --->

If we'd just written the tempting --[^>] as the third alternative, ---> would not be recognised as terminating the pattern, since --- would match --[^>] (a dash is not a right angle bracket) and > would then match [^-], and the scan would continue. Adding the + to match a longer sequence of dashes is not enough, because, like many regex engines, (f)lex is looking for the longest overall match, not the longest submatch in each set of alternatives. So we need to write --+[^->], which cannot match ---.

If that was not clear -- and I can see why it wouldn't be --, you could instead use a start condition to write a much simpler set of patterns:

%x COMMENT
%%
"<!--" { BEGIN(COMMENT); }
<COMMENT>{
"-->" { BEGIN(INITIAL); }
[^-]+ ;
.|\n ;
}

The second <COMMENT> rule is really just an efficiency hack; it avoids triggering a no-op action on every character. With the second rule in place, the last rule really can only match a single -, so it could have been written that way. But writing it in full allows you to remove the second rule and demonstrate to yourself that it works without it.

The key insight for matching the comment in pieces like this is that (f)lex always chooses the longest match, which is in some ways similar to the goal of non-greedy matches. While inside the <COMMENT> start condition, - will only match the single character fallback rule if it cannot be part of the match of -->, which is longer.

How to replace HTML comments with empty string in javascript

You have ! character at wrong place in your regex

/<--!(?:.|\n)*?-->/
|________________ This should be before `--` i.e:- !--

let sourceHTML = document.querySelector(".mySelector").innerHTML;sourceHTML = sourceHTML.replace(/<!--(?:.|\n)*?-->/gm, '');console.log(sourceHTML);
<div class="mySelector">    <!--Main content -->    <p>HTML content...</p></div>

RegEx for match/replacing JavaScript comments (both multiline and inline)

try this,

(\/\*[\w\'\s\r\n\*]*\*\/)|(\/\/[\w\s\']*)|(\<![\-\-\s\w\>\/]*\>)

should work :)
Sample Image



Related Topics



Leave a reply



Submit