How to Remove Attributes from an HTML Tag

Remove all attributes from html tags

Adapted from my answer on a similar question

$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';

echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si",'<$1$2>', $text);

// <p><strong>hello</strong></p>

The RegExp broken down:

/              # Start Pattern
<              # Match '<' at beginning of tags
(              # Start Capture Group $1 - Tag Name
[a-z]          # Match 'a' through 'z'
[a-z0-9]*      # Match 'a' through 'z' or '0' through '9' zero or more times
)              # End Capture Group
[^>]*?         # Match anything other than '>', zero or more times, non-greedy (won't eat the /)
(\/?)          # Capture Group $2 - '/' if it is there
>              # Match '>'
/is            # End Pattern - case-insensitive (i); dot matches newlines (s)

Wrap the pattern in quotes and use the replacement text <$1$2>; it should strip any text after the tag name up to the end of the tag (/> or just >).

Please note: this isn't necessarily going to work on ALL input, as the Anti-HTML + RegExp crowd will tell you. There are a few pitfalls, most notably <p style=">"> would end up as <p>">, along with a few other broken cases. I would recommend looking at Zend_Filter_StripTags as a more foolproof tag/attribute filter in PHP.
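For readers working in JavaScript, here is a minimal sketch of the same pattern ported from PHP. The function name stripAllAttributes is made up for illustration; the regular expression is the one above with the g flag added so that every tag is replaced. The last call reproduces the pitfall described above.

```javascript
// JavaScript port of the PHP pattern above. The "g" flag replaces every tag,
// "i" makes the tag name case-insensitive. (Hypothetical helper name.)
const TAG_WITH_ATTRIBUTES = /<([a-z][a-z0-9]*)[^>]*?(\/?)>/gi;

function stripAllAttributes(html) {
  // $1 is the tag name, $2 is the optional self-closing slash.
  return html.replace(TAG_WITH_ATTRIBUTES, '<$1$2>');
}

console.log(stripAllAttributes('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>'));
// <p><strong>hello</strong></p>

console.log(stripAllAttributes('<br class="x" />'));
// <br/>

// The pitfall: a '>' inside an attribute value ends the match too early.
console.log(stripAllAttributes('<p style=">">'));
// <p>">
```

Closing tags such as </strong> are left alone because the pattern requires a letter directly after the '<'.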

How to remove all attributes from html?

This can be done with Cheerio, as I noted in the comments.

To remove all attributes on all elements, you'd do:

var html = '<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>';

var cheerio = require('cheerio'); // assumes the cheerio package is installed
var $ = cheerio.load(html); // load the HTML

$('*').each(function() { // iterate over all elements
    this.attribs = {}; // remove all attributes
});

var html = $.html(); // get the HTML back

Remove attributes from html tags using PHP while keeping specific attributes

You usually should not parse HTML using regular expressions. Instead, in PHP you should call DOMDocument::loadHTML. You can then recurse through the elements in the document and call removeAttribute. Regular expressions for HTML tags are notoriously tricky.

REF: http://php.net/manual/en/domdocument.loadhtml.php

Examples: http://coursesweb.net/php-mysql/html-attributes-php

Here's a solution for you. It will iterate over all tags in the DOM, and remove attributes which are not src or href.

$html_string = "<div class=\"myClass\"><b>This</b> is an <span style=\"margin:20px\">example</span><img src=\"ima.jpg\" /></div>";

$dom = new DOMDocument; // init new DOMDocument
$dom->loadHTML($html_string); // load the HTML
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
    if ($node->nodeName != "src" && $node->nodeName != "href") {
        $node->parentNode->removeAttribute($node->nodeName);
    }
}

echo $dom->saveHTML(); // output cleaned HTML

Here is another solution using xPath to filter on attribute names instead:

$dom = new DOMDocument; // init new DOMDocument
$dom->loadHTML($html_string); // load the HTML
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//@*[local-name() != 'src' and local-name() != 'href']");
foreach ($nodes as $node) {
    $node->parentNode->removeAttribute($node->nodeName);
}

echo $dom->saveHTML(); // output cleaned HTML

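Tip: If your HTML contains extended (non-ASCII) characters, tell the DOM parser that the input is UTF-8, like this (note that PHP 8.2 deprecates the HTML-ENTITIES encoding in mbstring functions):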

$dom->loadHTML(mb_convert_encoding($html_string, 'HTML-ENTITIES', 'UTF-8'));

remove functions and attributes from html code

It's much easier to do it by cloning and then removing them from the clone before using innerHTML:

var content = clean(document.getElementById("container").cloneNode(true)).innerHTML;

Where clean is something like:

function clean(elm) {
    for (const key in elm) {
        if (key.startsWith("on")) {
            elm.removeAttribute(key);
        }
    }
    elm.contentEditable = false;
    Array.from(elm.children).forEach(clean);
    return elm;
}

Live Example:

function clean(elm) {
    for (const key in elm) {
        if (key.startsWith("on")) {
            elm.removeAttribute(key);
        }
    }
    elm.contentEditable = false;
    Array.from(elm.children).forEach(clean);
    return elm;
}
var content = clean(document.getElementById("container").cloneNode(true)).innerHTML;
document.getElementById("container").innerHTML = content;

<div id="container">
    <p contenteditable="false">Hello World</p>
    <button onclick="alert('x');">Button</button>
</div>

removing html attributes from an html string value using regex

There's quite a lot of literature out there on why parsing HTML with regex can be quite risky – this famous StackOverflow question is a good example.

As @Polymer has pointed out, your current regex will miss attributes with single quotes, but there are other cases too: data attributes (e.g. data-id="233") will be missed, and so will attributes without a quoted value, like disabled. There could be more!

With this approach you can end up always playing catch-up, changing your regex every time you encounter a new combination in your HTML.
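To make those gaps concrete, here is a small sketch. The pattern below is a hypothetical naive regex invented for illustration (the asker's actual regex is not shown here); it only recognizes name="value" with a purely alphabetic name and double quotes.

```javascript
// Hypothetical naive pattern: whitespace, an alphabetic name, then a
// double-quoted value. Anything else is not recognized as an attribute.
const NAIVE_ATTRIBUTE = /\s[a-z]+="[^"]*"/gi;

function naiveStrip(html) {
  return html.replace(NAIVE_ATTRIBUTE, '');
}

const input = `<input class='a' data-id="233" disabled type="text">`;
console.log(naiveStrip(input));
// <input class='a' data-id="233" disabled>
```

Only type="text" is removed. The single-quoted class, the hyphenated data-id, and the bare disabled all survive, which is the catch-up problem described above.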

A safer approach might be to use the DOMParser method to parse your string as HTML, and extract the contents from it that way:

let stringhtml = '<div class="Paragraph  BCX0 SCXW244271589" paraid="1364880375" paraeid="{8e523337-60c9-4b0d-8c73-fb1a70a2ba58}{165}" style="margin-bottom: 0px;margin-left:96px;padding:0px;user-select:text;-webkit-user-drag:none;-webkit-tap-highlight-color:transparent; overflow-wrap: break-word;">some text</div>'

let parser = new DOMParser();
let parsedResult = parser.parseFromString(stringhtml, 'text/html');

let element = document.createElement(parsedResult.body.firstChild.tagName);

element.innerText = parsedResult.documentElement.textContent;

console.log(element);
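If you want to keep the markup structure and drop only the attributes, the parsed tree can also be cleaned in place. The following is a sketch under assumptions, not the original answer's method: stripAttributes and its keep parameter are hypothetical names, and the function relies only on the standard Element members getAttributeNames(), removeAttribute(), and children.

```javascript
// Remove every attribute from `root` and its descendants, except the
// names listed in `keep`. (Hypothetical helper, for illustration only.)
function stripAttributes(root, keep = []) {
  const allowed = new Set(keep.map((name) => name.toLowerCase()));
  // getAttributeNames() returns a fresh array, so removing while looping is safe.
  for (const name of root.getAttributeNames()) {
    if (!allowed.has(name.toLowerCase())) {
      root.removeAttribute(name);
    }
  }
  for (const child of Array.from(root.children)) {
    stripAttributes(child, keep);
  }
  return root;
}

// Browser-only usage; skipped where DOMParser is unavailable (e.g. plain Node.js).
if (typeof DOMParser !== 'undefined') {
  const doc = new DOMParser().parseFromString(
    '<div class="a" style="margin:0"><a href="/x" onclick="f()">link</a></div>',
    'text/html'
  );
  console.log(stripAttributes(doc.body, ['href']).innerHTML);
  // <div><a href="/x">link</a></div>
}
```

Because the browser's parser has already decided where each attribute begins and ends, quoting style and '>' characters inside values are no longer a concern.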

Remove all attributes in HTML tag except specified with regex

You can achieve this with a negative lookahead, which will tell your expression to either 1. eat one character, or 2. match the special sequence, then rinse and repeat:

<(\w+)\s*(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)+>

Explanation:

  1. <(\w+)\s* (match open of tag and tagname)

  2. (?: (begin enclosure of main construct (note that it doesn't remember matches))

  3. (?:(?:(?!class=|id=|name=)[^>]))* (look ahead for no special token, then eat one character, repeat as many times possible, don't bother to remember anything)

  4. ((?:class|id|name)=['"][^'"]*['"]\s*)? (lookahead failed, so a special token is ahead; let's eat it, along with any trailing whitespace! note the regular, 'remembering' parens)

  5. )+ (end enclosure of main construct; repeat it, it'll match once for each special token)

  6. > (end of tag)

At this point you might have the matches you need, if your regex flavor supports multiple matches per group. In .NET for example, you'd have something similar to this: $1 = 'a', $2[0]='class="someClass"', $2[1]='id="someId"', etc.

But if you find that only the last match is remembered, you may have to simply repeat the main construct for each token you want to match, like so: (matches will be $1-$4)

<(\w+)\s*(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)[^>]*>

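If your regex flavor only remembers the last match of a repeated group (JavaScript is one such flavor), a two-pass approach can be easier to maintain than repeating the construct. This is a different technique from the lookahead pattern above, sketched here with hypothetical names (keepAttributes, allowed): the first pattern finds each opening tag, the second picks out quoted attributes inside it, and only whitelisted names are written back.

```javascript
// Pass 1: opening tags. $1 = tag name, $2 = raw attribute text, $3 = optional '/'.
const OPENING_TAG = /<([a-z][a-z0-9]*)([^>]*?)(\/?)>/gi;
// Pass 2: name="value" or name='value' pairs inside the attribute text.
const QUOTED_ATTRIBUTE = /([^\s=\/]+)\s*=\s*(?:"[^"]*"|'[^']*')/g;

// Hypothetical helper: keep only the attributes whose names are in `allowed`.
function keepAttributes(html, allowed) {
  return html.replace(OPENING_TAG, (match, tag, attrs, slash) => {
    const kept = [];
    for (const attr of attrs.matchAll(QUOTED_ATTRIBUTE)) {
      if (allowed.includes(attr[1].toLowerCase())) {
        kept.push(attr[0]);
      }
    }
    const attrText = kept.length ? ' ' + kept.join(' ') : '';
    return '<' + tag + attrText + (slash ? ' /' : '') + '>';
  });
}

console.log(keepAttributes(
  '<a class="someClass" href="/x" id="someId" style="color:red">link</a>',
  ['class', 'id', 'name']
));
// <a class="someClass" id="someId">link</a>
```

Unquoted and valueless attributes are always dropped by this sketch, and it shares the usual regex weakness: a '>' inside an attribute value will end the tag match too early.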

How to remove all the attribute and values associated in tags in html

You can use Element.getAttributeNames() to get an array of all attribute names, then iterate over it to remove each one:

$('#content *').each(function(_, el) {
    el.getAttributeNames().forEach(el.removeAttribute.bind(el));
});
console.log($('#content')[0].outerHTML)

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="content">
    <span id="span" data-span="a" aria-describedby="span">span</span>
    <p class="a b c" style="color:black;">paragraph</p>
</div>

Remove attribute of HTML tag

To remove it from literally the first element, use .removeAttr():

$(":first").removeAttr("style");

or in this case .show() will show the element by removing the display property:

$(":first").show();

Though you probably want to narrow it down to inside something else, for example:

$("#container :first").removeAttr("style");

If you want to show the first hidden one, use :hidden as your selector:

$(":hidden:first").show();

Remove all attributes from an HTML element and all its children

As pointed out in this response you can extend removeAttr to take no parameters and delete all attributes.

BEWARE, YOU WILL REMOVE SRC ATTRIBUTE FROM IMAGES INSIDE!!!

Pairing that with removeClass (which can already take no params) and a loop over each element gives this:

var removeAttr = jQuery.fn.removeAttr;
jQuery.fn.removeAttr = function() {
    if (!arguments.length) {
        this.each(function() {
            // Looping attributes array in reverse direction
            // to avoid skipping items due to the changing length
            // when removing them on every iteration.
            for (var i = this.attributes.length - 1; i >= 0; i--) {
                jQuery(this).removeAttr(this.attributes[i].name);
            }
        });
        return this;
    }
    return removeAttr.apply(this, arguments);
};

$('.card_back').find('*').each(function(index, element) {
    $(element).removeClass();
    $(element).removeAttr();
});

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="card_wrapper">
    <div class="card_navigation">
        zurück |
        <a title="Titletext" href="/xyz">next</a>
    </div>
    <div class="card_front">
        <span class="info">Front</span>
        <p>here's just some text
            <br>and one more line.
        </p>
        <p>here's just another text
            <br>and one more line.
        </p>
    </div>
    <div class="card_back">
        <span class="info">Back</span>
        <p class="test"><span id="test3">Lorem Ipsum non dolor <strong>nihil est major</strong>, laudat amemus hibitet</span></p>
        <p><span style="color: red">- <strong>Non solum</strong>, sed calucat ebalitant medetur</span></p>
        <p> </p>
    </div>
</div>

