Strip All HTML Tags, Except Allowed

Strip all HTML tags, except allowed

PHP's strip_tags() does exactly this: its optional second argument lists the tags to keep.

JavaScript regex to replace all HTML tags except p, a and img

Match the tags to keep in a capture group and, using alternation, match every other tag outside that group. Then replace each match with $1, which is empty whenever a non-allowed tag was matched:

(<\/?(?:a|p|img)[^>]*>)|<[^>]+>

Demo: https://regex101.com/r/Sm4Azv/2

And the JavaScript demo:

var input = 'b<body>b a<a>a h1<h1>h1 p<p>p p</p>p img<img />img';

var output = input.replace(/(<\/?(?:a|p|img)[^>]*>)|<[^>]+>/ig, '$1');

console.log(output);
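
One thing to watch with the pattern above: because the tag name is followed directly by [^>]*, the alternative a also matches tags that merely start with "a", such as <abbr>, and p also matches <pre>. A minimal sketch of a stricter variant follows, assuming a word boundary after the tag name is acceptable for your input. The helper name stripTagsExcept and its allowedTags parameter are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper: builds the keep-or-strip pattern from an allow-list.
function stripTagsExcept(html, allowedTags) {
  const keep = allowedTags.join('|');
  // \b after the tag name stops "a" from also matching <abbr> or <article>.
  const pattern = new RegExp('(<\\/?(?:' + keep + ')\\b[^>]*>)|<[^>]+>', 'gi');
  // Allowed tags land in group 1 and are put back; for any other tag
  // group 1 is undefined, so the match is replaced with an empty string.
  return html.replace(pattern, (match, kept) => kept || '');
}

const sample = '<abbr>x</abbr> <a href="#">link</a> <pre>code</pre> <p>text</p>';
console.log(stripTagsExcept(sample, ['a', 'p', 'img']));
// prints: x <a href="#">link</a> code <p>text</p>
```

The tag names are inserted into the pattern as-is, so this sketch assumes they are plain names without regex metacharacters.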

How can I remove all tags except an allowed list from html parsed by php

As cpattersonv1 pointed out, you can simply use strip_tags() for the job and pass the tags to keep as the second argument.

<?php

// strip all other tags except mentioned (p, img, iframe)
$html_result = strip_tags($html, '<p><img><iframe>');

?>

How can I strip html tags except some of them?

According to your comment, you want to remove HTML elements only if they have some class or attribute. You'll need to build up a DOM then:

<?php

$data = <<<DATA
<div>
<p>This line shall stay</p>
<p class="myclass">Remove this one</p>
<p><a href="#somewhere">I will be deleted as well</a></p>
<p>But keep this</p>
</div>
DATA;

$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);

$xpath = new DOMXPath($dom);

$elements_to_be_removed = $xpath->query("//*[count(@*)>0]");
foreach ($elements_to_be_removed as $element) {
$element->parentNode->removeChild($element);
}

// just to check
echo $dom->saveHTML();
?>

To change which elements are removed, change the query. For example, to remove all elements with the class myclass, it must read "//*[@class='myclass']".

Strip all HTML tags except links

<(?!\/?a(?=>|\s.*>))\/?.*?>

Try this. I had something similar for p tags and it worked there. The negative lookahead rejects a tag whose name is a (with an optional leading / character) when, as checked by the nested positive lookahead, that a is followed either directly by > or by whitespace, further characters, and then >. Any other tag is matched up to the next > character. Use it in a substitution:

s/<(?!\/?a(?=>|\s.*>))\/?.*?>//g;

This should leave only the opening and closing a tags.
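
For readers working in JavaScript rather than Perl, the same pattern can be tried as a regex literal. This is only a sketch of the substitution above; the variable names and the sample string are made up for illustration.

```javascript
// Same pattern as the Perl substitution above, written as a JavaScript regex literal.
const onlyLinks = /<(?!\/?a(?=>|\s.*>))\/?.*?>/g;

const sampleHtml = '<p>See <a href="/docs">the docs</a> and <abbr>FAQ</abbr>.</p>';
// Every tag except <a ...> and </a> is replaced with an empty string.
const linksOnly = sampleHtml.replace(onlyLinks, '');
console.log(linksOnly);
// prints: See <a href="/docs">the docs</a> and FAQ.
```

Note that <abbr> is removed even though its name starts with "a", because the nested lookahead requires > or whitespace right after the a. Since . does not match a newline in JavaScript by default, tags that span several lines would need the s flag.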

Remove html tags except br or br/ tags with javascript

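Try this:

function remove_tags(html)
{
    // protect every <br>, <br/> and <br /> with a placeholder before stripping
    html = html.replace(/<br\s*\/?>/gi, "||br||");
    var tmp = document.createElement("DIV");
    tmp.innerHTML = html;
    // textContent returns the text only, so all remaining tags are dropped
    html = tmp.textContent || tmp.innerText;
    // restore the protected line breaks (a plain string pattern would only replace the first one)
    return html.replace(/\|\|br\|\|/g, "<br>");
}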
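
The placeholder idea can also be sketched without a DOM, for example when running under Node.js. This variant swaps the temporary DIV for a plain tag-stripping regex, so unlike textContent it does not decode HTML entities. The function name stripTagsKeepBr is hypothetical.

```javascript
// Hypothetical DOM-free variant of the placeholder trick.
function stripTagsKeepBr(html) {
  const placeholder = '||br||';
  return html
    .replace(/<br\s*\/?>/gi, placeholder) // protect <br>, <br/> and <br />
    .replace(/<[^>]+>/g, '')              // drop every remaining tag
    .split(placeholder).join('<br>');     // restore the line breaks
}

console.log(stripTagsKeepBr('<p>one<br/>two<BR>three</p>'));
// prints: one<br>two<br>three
```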

How can I strip all regular HTML tags except a, img (with their attributes) and br with JavaScript?

Does this do what you want? http://jsfiddle.net/smerny/r7vhd/

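// Replace each non-allowed element with its child nodes. Passing the nodes
// themselves (rather than an HTML string) means nested elements get unwrapped too.
$("body").find("*").not("a,img,br").each(function() {
    $(this).replaceWith($(this).contents());
});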

Basically select everything except a, img, br and replace them with their content.

Strip all HTML tags, except anchor tags

I suggest you use the Html Agility Pack.

Also check this question and its answers: HTML Agility Pack strip tags NOT IN whitelist

Remove all html tags except allowed tags using XSLT function

Assuming you can use XSLT 2.0 then you could apply David Carlisle's HTML parser (https://github.com/davidcarlisle/web-xslt/blob/master/htmlparse/htmlparse.xsl) to the contents of body elements and then process the resulting nodes in a mode that strips every element but p elements:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
xmlns:d="data:,dpc"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="d xhtml">

<xsl:import href="htmlparse-by-dcarlisle.xsl"/>

<xsl:template match="@*|node()" mode="#default strip">
<xsl:copy>
<xsl:apply-templates select="@*|node()" mode="#current"/>
</xsl:copy>
</xsl:template>

<xsl:template match="body">
<xsl:copy>
<xsl:apply-templates select="d:htmlparse(., '', true())" mode="strip"/>
</xsl:copy>
</xsl:template>

<xsl:template match="*[not(self::p)]" mode="strip">
<xsl:apply-templates/>
</xsl:template>

</xsl:transform>

For the input

<rss>
<entry>
<body><![CDATA[<p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on <h2>March 31</h2> because the judge ignored an earlier court order summoning him.<i>Justice Karnan</i> had to appear</p>]]></body>
</entry>
</rss>

that gives

<rss>
<entry>
<body><p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on March 31 because the judge ignored an earlier court order summoning him.Justice Karnan had to appear</p></body>
</entry>
</rss>

If the markup is not escaped but is already part of the XML input, then you don't need to parse it and can just apply the mode to the contents:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:template match="@*|node()" mode="#default strip">
<xsl:copy>
<xsl:apply-templates select="@*|node()" mode="#current"/>
</xsl:copy>
</xsl:template>

<xsl:template match="body">
<xsl:copy>
<xsl:apply-templates select="node()" mode="strip"/>
</xsl:copy>
</xsl:template>

<xsl:template match="*[not(self::p)]" mode="strip">
<xsl:apply-templates/>
</xsl:template>

</xsl:transform>

http://xsltransform.net/gWEamMc/1


