Removing All Script Tags from HTML with a JS Regular Expression

Removing all script tags from HTML with a JS regular expression

Attempting to remove HTML markup using a regular expression is problematic, because you don't know what the script content or attribute values might contain (for example, a string that looks like a closing tag). One way around this is to insert the markup as the innerHTML of a div, remove any script elements, and return the resulting innerHTML, e.g.

function stripScripts(s) {
  // Parse the markup in a detached div so it never enters the document
  var div = document.createElement('div');
  div.innerHTML = s;
  var scripts = div.getElementsByTagName('script');
  // Iterate backwards: the collection is live and shrinks as elements are removed
  var i = scripts.length;
  while (i--) {
    scripts[i].parentNode.removeChild(scripts[i]);
  }
  return div.innerHTML;
}

alert(
  stripScripts('<span><script type="text/javascript">alert(\'foo\');<\/script><\/span>')
);

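Note that browsers do not execute script elements inserted using the innerHTML property (the HTML specification rules this out), and here the div is never added to the document anyway. Be aware that this function only removes script elements; inline event-handler attributes such as onerror are left in place.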

Remove all scripts with javascript regex

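Try this, as the arguments to replace():

(/<\s*script\b[^>]*>[\s\S]*?<\s*\/\s*script\s*>/gi, '')

or, more strictly,

(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '')

(The 'm' flag does not help here: it only changes how ^ and $ match. To match a script that spans several lines, use [\s\S] rather than ., because . does not match newlines. Using [^>]* for the attributes, instead of a lazy .*? before the word script, also keeps the match from starting at an earlier, unrelated tag.)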
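To see how the flags behave in practice, here is a minimal sketch. The sample string and variable names are made up for illustration; it compares . plus the 'm' flag against [\s\S] on a script that spans several lines.

```javascript
// Hypothetical sample input: one script element spanning several lines.
var html = '<p>before</p>\n<script type="text/javascript">\nalert("foo");\n</script>\n<p>after</p>';

// '.' never matches a newline, with or without the 'm' flag,
// so this pattern finds nothing to remove in the sample.
var withDot = html.replace(/<script.*?>.*?<\/script>/igm, '');

// [\s\S] matches any character, newlines included.
var withClass = html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');

console.log(withDot === html); // true: nothing was removed
console.log(withClass);
```

The second call leaves only the two paragraphs, separated by the newlines that surrounded the script.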

Regex To Remove Script And Style Tags + Content Javascript

I want to have the equivalent of
<script(.*?)>(.*?)</script> //in javascript

/<script([\S\s]*?)>([\S\s]*?)<\/script>/ig

Use [\S\s]*? instead of .*? in your regex, because . does not match newlines in JavaScript and older engines don't support the s (dotAll) flag, which was only added in ES2018. [\S\s]*? matches any character, whitespace or not (newlines included), zero or more times, non-greedily.
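Since the title also asks about style tags, a minimal sketch along the same lines might look like this. The function name stripScriptAndStyle and the sample string are assumptions for illustration; the backreference \1 simply makes the closing tag match whichever tag was opened. Like any regex approach, it can be fooled, for instance by a > inside an attribute value.

```javascript
// Hypothetical helper: removes <script> and <style> elements with their content.
function stripScriptAndStyle(html) {
  // (script|style) captures the tag name; \b stops it matching e.g. <styles>.
  // [^>]* skips attributes, [\s\S]*? matches the content (newlines included),
  // and \1 requires the closing tag to have the same name as the opening one.
  return html.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '');
}

var sample = '<style>\np { color: red; }\n</style><p>text</p><SCRIPT>\nalert(1);\n</SCRIPT>';
console.log(stripScriptAndStyle(sample)); // <p>text</p>
```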

How to remove all script tags from html file

If I understood your question correctly and you want to delete everything inside <script></script>, I think you have to split the sed command into parts (you can still write it as a one-liner by separating them with ;):

Using:

sed 's/<script>.*<\/script>//g;/<script>/,/<\/script>/{/<script>/!{/<\/script>/!d}};s/<script>.*//g;s/.*<\/script>//g'

The first piece (s/<script>.*<\/script>//g) handles scripts that open and close on the same line (note that .* is greedy, so two scripts on one line are removed together with anything between them);

The second piece (/<script>/,/<\/script>/{/<script>/!{/<\/script>/!d}}) is almost a quote of @akingokay's answer, except that I exclude the lines on which the tags themselves occur (just in case they have something before or after the tag). There is a great explanation of that in "Using sed to delete all lines between two matching patterns";

The last two (s/<script>.*//g and s/.*<\/script>//g) finally take care of the lines where a script starts but doesn't finish, or finishes without having started.

Now if you have an index.html that has:

<html>
<body>
foo
<script> console.log("bar") </script>
<div id="something"></div>
<script>
// Multiple Lines script
// Blah blah
</script>
foo <script> //Some
console.log("script")</script> bar
</body>
</html>

and you run this sed command, you will get:

cat index.html | sed 's/<script>.*<\/script>//g;/<script>/,/<\/script>/{/<script>/!{/<\/script>/!d}};s/<script>.*//g;s/.*<\/script>//g'
<html>
<body>
foo


<div id="something"></div>




foo
bar
</body>

</html>

You will end up with a lot of blank lines, but the HTML should still work as expected. Of course, you could easily remove them with sed as well.
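For comparison with the JavaScript answers above, the same clean-up can be sketched on the whole file contents as a single string, which avoids the separate line-by-line cases. The sample below is a shortened, made-up version of index.html, and dropping the blank lines at the end is an optional extra.

```javascript
// Hypothetical input: a shortened version of the index.html sample.
var page = [
  '<body>',
  'foo',
  '<script>',
  '// Multiple Lines script',
  '</script>',
  'foo <script> //Some',
  'console.log("script")</script> bar',
  '</body>'
].join('\n');

// Remove each script element, whether it sits on one line or several.
var stripped = page.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');

// Optionally drop the lines left empty (or whitespace-only) by the removal.
var tidy = stripped
  .split('\n')
  .filter(function (line) { return line.trim() !== ''; })
  .join('\n');

console.log(tidy);
```

Unlike the sed version, foo and bar end up on the same line here, because the newline inside the last script is removed along with it.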

Hope it helps.

PS: I think that @l0b0 is right, and this is not the correct tool.


