Remove all attributes from html tags
Adapted from my answer on a similar question
$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';
echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si",'<$1$2>', $text);
// <p><strong>hello</strong></p>
The RegExp broken down:
/ # Start Pattern
< # Match '<' at beginning of tags
( # Start Capture Group $1 - Tag Name
[a-z] # Match 'a' through 'z'
[a-z0-9]* # Match 'a' through 'z' or '0' through '9' zero or more times
) # End Capture Group
[^>]*? # Match anything other than '>', zero or more times, non-greedy (won't eat the /)
(\/?) # Capture Group $2 - '/' if it is there
> # Match '>'
/si # End Pattern - Case insensitive (i); dot also matches newlines (s)
Add some quoting, and use the replacement text <$1$2>. It should strip any text after the tag name up to the end of the tag, whether that is /> or just >.
Please note: this isn't necessarily going to work on ALL input, as the Anti-HTML + RegExp will tell you. There are a few pitfalls, most notably <p style=">"> would end up as <p>">, along with a few other broken cases. I would recommend looking at Zend_Filter_StripTags as a more foolproof tags/attributes filter in PHP.
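For comparison, here is a minimal sketch of the same pattern in JavaScript, including the pitfall mentioned above. The function name stripAttributes is made up for illustration; the g flag is added because JavaScript's replace only replaces the first match without it.

```javascript
// JavaScript port of the pattern above; the `g` flag replaces every tag.
function stripAttributes(html) {
  return html.replace(/<([a-z][a-z0-9]*)[^>]*?(\/?)>/gi, '<$1$2>');
}

console.log(stripAttributes('<p style="padding:0px;"><strong style="margin:0;">hello</strong></p>'));
// <p><strong>hello</strong></p>

console.log(stripAttributes('<p style=">">hello</p>'));
// <p>">hello</p>  (the '>' inside the attribute value ends the match early)
```

The second call shows why this is only safe on input you control: the pattern has no notion of quoted values.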
How to remove all attributes from html?
This can be done with Cheerio, as I noted in the comments.
To remove all attributes on all elements, you'd do:
var cheerio = require('cheerio'); // assumes the cheerio package is installed
var html = '<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>';
var $ = cheerio.load(html); // load the HTML
$('*').each(function() { // iterate over all elements
  this.attribs = {}; // remove all attributes
});
var cleaned = $.html(); // get the HTML back
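If some attributes should survive (href, for example), the same idea can be applied with a whitelist. The following is only a sketch under stated assumptions: it runs on plain objects shaped like the element nodes Cheerio exposes (an attribs object and a children array), and stripAttribs and its keep parameter are hypothetical names, not part of Cheerio.

```javascript
// Hypothetical helper: drop every attribute except those listed in `keep`.
// Nodes are plain objects with `attribs` and `children`, mirroring the shape
// of the element nodes that Cheerio hands to `.each()`.
function stripAttribs(node, keep = []) {
  if (node.attribs) {
    for (const name of Object.keys(node.attribs)) {
      if (!keep.includes(name)) delete node.attribs[name];
    }
  }
  (node.children || []).forEach((child) => stripAttribs(child, keep));
  return node;
}

// Stand-in tree so the sketch runs without any dependency.
const tree = {
  name: 'p',
  attribs: { class: 'opener', itemprop: 'description' },
  children: [
    { name: 'a', attribs: { href: '/xyz', title: 'Titletext' }, children: [] },
  ],
};

stripAttribs(tree, ['href']);
console.log(JSON.stringify(tree));
```

Passing an empty keep list removes everything, which matches the behaviour of the loop above.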
Remove attributes from html tags using PHP while keeping specific attributes
You usually should not parse HTML using regular expressions. Instead, in PHP you should call DOMDocument::loadHTML. You can then recurse through the elements in the document and call removeAttribute. Regular expressions for HTML tags are notoriously tricky.
REF: http://php.net/manual/en/domdocument.loadhtml.php
Examples: http://coursesweb.net/php-mysql/html-attributes-php
Here's a solution for you. It will iterate over all attributes in the DOM and remove those that are not src or href.
$html_string = "<div class=\"myClass\"><b>This</b> is an <span style=\"margin:20px\">example</span><img src=\"ima.jpg\" /></div>";
$dom = new DOMDocument; // init new DOMDocument
$dom->loadHTML($html_string); // load the HTML
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
if($node->nodeName != "src" && $node->nodeName != "href") {
$node->parentNode->removeAttribute($node->nodeName);
}
}
echo $dom->saveHTML(); // output cleaned HTML
Here is another solution using xPath to filter on attribute names instead:
$dom = new DOMDocument; // init new DOMDocument
$dom->loadHTML($html_string); // load the HTML
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//@*[local-name() != 'src' and local-name() != 'href']");
foreach ($nodes as $node) {
$node->parentNode->removeAttribute($node->nodeName);
}
echo $dom->saveHTML(); // output cleaned HTML
Tip: Pass the HTML to the DOM parser as UTF-8 if you are using extended characters, like this (note that the 'HTML-ENTITIES' conversion is deprecated as of PHP 8.2):
$dom->loadHTML(mb_convert_encoding($html_string, 'HTML-ENTITIES', 'UTF-8'));
remove functions and attributes from html code
It's much easier to do it by cloning the element and then removing them from the clone before reading innerHTML:
var content = clean(document.getElementById("container").cloneNode(true)).innerHTML;
Where clean is something like:
function clean(elm) {
for (const key in elm) {
if (key.startsWith("on")) {
elm.removeAttribute(key);
}
}
elm.contentEditable = false;
Array.from(elm.children).forEach(clean);
return elm;
}
Live Example:
function clean(elm) {
for (const key in elm) {
if (key.startsWith("on")) {
elm.removeAttribute(key);
}
}
elm.contentEditable = false;
Array.from(elm.children).forEach(clean);
return elm;
}
var content = clean(document.getElementById("container").cloneNode(true)).innerHTML;
document.getElementById("container").innerHTML = content;
<div id="container">
<p contenteditable="false">Hello World</p>
<button onclick="alert('x');">Button</button>
</div>
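A possible variation, sketched here under assumptions rather than taken from the answer: instead of walking every property of the element, look only at the attributes that are actually present via getAttributeNames(). The name stripHandlers is hypothetical, and fakeElement is just a stand-in so the sketch also runs outside a browser.

```javascript
// Hypothetical helper: remove inline event-handler attributes (onclick,
// onmouseover, ...) from an element and all of its descendants.
function stripHandlers(el) {
  // getAttributeNames() returns a snapshot array, so removing while
  // iterating over it is safe.
  for (const name of el.getAttributeNames()) {
    if (name.toLowerCase().startsWith('on')) {
      el.removeAttribute(name);
    }
  }
  Array.from(el.children).forEach(stripHandlers);
  return el;
}

// Minimal stand-in for a DOM element so the sketch runs outside a browser.
function fakeElement(attrs, children = []) {
  return {
    attrs: { ...attrs },
    children,
    getAttributeNames() { return Object.keys(this.attrs); },
    removeAttribute(name) { delete this.attrs[name]; },
  };
}

const button = fakeElement({ onclick: "alert('x')", type: 'button' });
const container = fakeElement({ id: 'container', onmouseover: 'track()' }, [button]);

stripHandlers(container);
console.log(container.attrs, button.attrs);
```

In a browser you would pass a real cloned element instead of the stand-in, exactly as with clean above.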
removing html attributes from an html string value using regex
There's quite a lot of literature out there on why parsing HTML with regex can be quite risky – this famous StackOverflow question is a good example.
As @Polymer has pointed out, your current regex will miss attributes with single quotes, but there are other possibilities too: data attributes (e.g. data-id="233") will be missed, and so will attributes without a quoted value, like disabled. There could be more!
You can end up always being on catch-up with this approach, always having to change your regex as you encounter new combinations in your HTML.
A safer approach might be to use DOMParser to parse your string as HTML, and extract the contents from it that way:
let stringhtml = '<div class="Paragraph BCX0 SCXW244271589" paraid="1364880375" paraeid="{8e523337-60c9-4b0d-8c73-fb1a70a2ba58}{165}" style="margin-bottom: 0px;margin-left:96px;padding:0px;user-select:text;-webkit-user-drag:none;-webkit-tap-highlight-color:transparent; overflow-wrap: break-word;">some text</div>'
let parser = new DOMParser();
let parsedResult = parser.parseFromString(stringhtml, 'text/html');
let element = document.createElement(parsedResult.body.firstChild.tagName);
element.innerText = parsedResult.documentElement.textContent;
console.log(element);
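To make the gaps listed above concrete, here is a small sketch. The pattern is a hypothetical naive one (the regex from the question is not reproduced here), chosen only to show how each of the three cases slips through.

```javascript
// Hypothetical naive pattern: whitespace, a word, then a double-quoted value.
const naive = /\s\w+="[^"]*"/g;

const input = `<input class="a" data-id="233" title='x' disabled>`;
const result = input.replace(naive, '');

console.log(result);
// <input data-id="233" title='x' disabled>
```

Only class is removed: the hyphen in data-id, the single quotes around the title value, and the value-less disabled all fall outside what the pattern expects.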
Remove all attributes in HTML tag except specified with regex
You can achieve this with a negative lookahead, which will tell your expression to either 1. eat one character, or 2. match the special sequence, then rinse and repeat:
<(\w+)\s*(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)+>
Explanation:
<(\w+)\s* (match open of tag and tag name)
(?: (begin enclosure of main construct; note that it doesn't remember matches)
(?:(?:(?!class=|id=|name=)[^>]))* (look ahead for no special token, then eat one character; repeat as many times as possible, don't bother to remember anything)
((?:class|id|name)=['"][^'"]*['"]\s*)? (lookahead failed, so a special token is ahead, let's eat it! Note the regular, 'remembering' parens)
)+ (end enclosure of main construct; repeat it, it'll match once for each special token)
> (end of tag)
At this point you might have the matches you need, if your regex flavor supports multiple matches per group. In .NET for example, you'd have something similar to this: $1 = 'a', $2[0]='class="someClass"', $2[1]='id="someId"', etc.
But if you find that only the last match is remembered, you may have to simply repeat the main construct for each token you want to match, like so: (matches will be $1-$4)
<(\w+)\s*(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)[^>]*>
(see it in action here).
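In a flavor that only remembers the last capture of a repeated group (JavaScript is one), a different technique can be easier to maintain than repeating the construct: match each opening tag once, then collect the allowed attributes inside it with a second pattern. This is a sketch of that two-pass alternative, not the expression above; keepOnly is a made-up name, and like any regex approach it can be fooled by a '>' or an attribute-like string inside a quoted value.

```javascript
// Two-pass sketch: the outer pattern finds opening tags, the inner pattern
// picks out the attributes that are allowed to stay.
function keepOnly(html) {
  // Whitespace before the name prevents matching the tail of e.g. data-id.
  const allowed = /\s(class|id|name)=(["'])(.*?)\2/gi;
  return html.replace(/<(\w+)([^>]*)>/g, (match, tag, rest) => {
    const kept = [...rest.matchAll(allowed)].map((m) => m[0].trim());
    return '<' + [tag, ...kept].join(' ') + '>';
  });
}

console.log(keepOnly(`<a href="/x" class="c" id='i' data-id="9">t</a>`));
// <a class="c" id='i'>t</a>
```

Closing tags are left alone because the outer pattern requires a word character directly after '<'.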
How to remove all the attribute and values associated in tags in html
You can use Element.getAttributeNames() to get an array of all attribute names and iterate over it to remove them:
$('#content *').each(function(_, el) { el.getAttributeNames().forEach(el.removeAttribute.bind(el));});
console.log($('#content')[0].outerHTML)
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="content">
  <span id="span" data-span="a" aria-describedby="span">span</span>
  <p class="a b c" style="color:black;">paragraph</p>
</div>
Remove attribute of HTML tag
To remove it from literally the first element, use .removeAttr():
$(":first").removeAttr("style");
or, in this case, .show() will show the element by removing the display property:
$(":first").show();
Though you probably want to narrow it down to inside something else, for example:
$("#container :first").removeAttr("style");
If you want to show the first hidden one, use :hidden as your selector:
$(":hidden:first").show();
Remove all attributes from an HTML element and all its children
As pointed out in this response you can extend removeAttr to take no parameters and delete all attributes.
BEWARE: this will also remove the src attribute from any images inside!
Then, paired with removeClass (which can already take no parameters) and a loop over each element, this gives:
var removeAttr = jQuery.fn.removeAttr;
jQuery.fn.removeAttr = function() {
  if (!arguments.length) {
    this.each(function() {
      // Looping attributes array in reverse direction
      // to avoid skipping items due to the changing length
      // when removing them on every iteration.
      for (var i = this.attributes.length - 1; i >= 0; i--) {
        jQuery(this).removeAttr(this.attributes[i].name);
      }
    });
    return this;
  }
  return removeAttr.apply(this, arguments);
};
$('.card_back').find('*').each(function(index, element) {
  $(element).removeClass();
  $(element).removeAttr();
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="card_wrapper">
  <div class="card_navigation"> zurück | <a title="Titletext" href="/xyz">next</a> </div>
  <div class="card_front">
    <span class="info">Front</span>
    <p>here's just some text <br>and one more line. </p>
    <p>here's just another text <br>and one more line. </p>
  </div>
  <div class="card_back">
    <span class="info">Back</span>
    <p class="test"><span id="test3">Lorem Ipsum non dolor <strong>nihil est major</strong>, laudat amemus hibitet</span></p>
    <p><span style="color: red">- <strong>Non solum</strong>, sed calucat ebalitant medetur</span></p>
    <p> </p>
  </div>
</div>
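The reason for looping in reverse can be shown without a DOM. In this sketch a plain array stands in for the live attributes collection (an assumption for illustration: removing an entry shifts the later entries down, which is the behaviour the comment in the code above guards against). The function names are made up.

```javascript
// Forward loop: after each removal the next item slides into index i,
// so the following i++ steps over it.
function removeForward(attrs) {
  for (let i = 0; i < attrs.length; i++) {
    attrs.splice(i, 1);
  }
  return attrs;
}

// Reverse loop: removing index i never moves the items below it.
function removeBackward(attrs) {
  for (let i = attrs.length - 1; i >= 0; i--) {
    attrs.splice(i, 1);
  }
  return attrs;
}

console.log(removeForward(['class', 'id', 'style', 'title']));  // [ 'id', 'title' ]
console.log(removeBackward(['class', 'id', 'style', 'title'])); // []
```

The forward version leaves every second entry behind, which is exactly the kind of skipped attribute the reverse loop avoids.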