Remove All Attributes from HTML Tags

Adapted from my answer on a similar question

$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';

echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si",'<$1$2>', $text);

// <p><strong>hello</strong></p>
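For comparison, here is a minimal JavaScript sketch of the same substitution. It assumes the pattern carries over unchanged: the `g` flag stands in for preg_replace's default replace-all behaviour, and `s` is dropped because the pattern contains no `.` (so it has no effect). The helper name `stripAttributes` is hypothetical.

```javascript
// Hypothetical helper: a JavaScript port of the preg_replace call above.
// 'g' replaces every match (preg_replace does that by default), 'i' = case-insensitive.
function stripAttributes(html) {
  return html.replace(/<([a-z][a-z0-9]*)[^>]*?(\/?)>/gi, '<$1$2>');
}

const text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';
console.log(stripAttributes(text));                         // <p><strong>hello</strong></p>
console.log(stripAttributes('<img src="a.jpg" alt="" />')); // <img/>
```

Closing tags such as `</p>` are never touched, because `[a-z]` cannot match the `/` that follows `<`.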

The RegExp broken down:

/            # Start Pattern
<            # Match '<' at beginning of tags
(            # Start Capture Group $1 - Tag Name
[a-z]        # Match 'a' through 'z' (and 'A' through 'Z', thanks to the i flag)
[a-z0-9]*    # Match 'a' through 'z' or '0' through '9' zero or more times
)            # End Capture Group
[^>]*?       # Match anything other than '>', zero or more times, non-greedy (won't eat the /)
(\/?)        # Capture Group $2 - '/' if it is there
>            # Match '>'
/is          # End Pattern - case-insensitive; 's' (dot-all) has no real effect here, since the pattern has no '.' and [^>] already matches newlines

Wrap that in quotes, use the replacement text <$1$2>, and it should strip any text after the tag name up to the end of the tag, either /> or just >.

Please note: this isn't necessarily going to work on ALL input, as the anti-"HTML + RegExp" crowd will tell you. There are a few failure cases, most notably <p style=">"> would end up as <p>">, plus a few other broken cases. I would recommend looking at Zend_Filter_StripTags as a more foolproof tags/attributes filter in PHP.
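The <p style=">"> failure is easy to reproduce. The sketch below runs the naive pattern next to a quote-aware variant that is not part of the original answer: it treats "..." and '...' attribute values as whole units, so a > inside quotes no longer ends the tag. It is still an approximation, not a parser (comments, unquoted values containing >, and unbalanced quotes are not handled).

```javascript
// Naive pattern from the answer above.
const naive = /<([a-z][a-z0-9]*)[^>]*?(\/?)>/gi;

// Hypothetical quote-aware variant: each step consumes either one non-quote,
// non-'>' character or an entire quoted value, so '>' inside quotes is skipped.
const quoteAware = /<([a-z][a-z0-9]*)(?:[^>"']|"[^"]*"|'[^']*')*?(\/?)>/gi;

const tricky = '<p style=">">text</p>';
console.log(tricky.replace(naive, '<$1$2>'));      // <p>">text</p>
console.log(tricky.replace(quoteAware, '<$1$2>')); // <p>text</p>
```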

How to remove all attributes from html?

This can be done with Cheerio, as I noted in the comments.

To remove all attributes on all elements, you'd do:

var html = '<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>';

var cheerio = require('cheerio'); // npm install cheerio
var $ = cheerio.load(html); // load the HTML

$('*').each(function() { // iterate over all elements
  this.attribs = {}; // remove all attributes
});

var html = $.html(); // get the HTML back

Remove all attributes from an HTML element and all its children

As pointed out in this response, you can extend removeAttr so that, when called with no parameters, it deletes all attributes.

BEWARE: THIS WILL ALSO REMOVE THE SRC ATTRIBUTE FROM IMAGES INSIDE!!!

Pairing that with removeClass (which can already take no parameters) and a loop over each element gives this:

var removeAttr = jQuery.fn.removeAttr;
jQuery.fn.removeAttr = function() {
  if (!arguments.length) {
    this.each(function() {
      // Looping attributes array in reverse direction
      // to avoid skipping items due to the changing length
      // when removing them on every iteration.
      for (var i = this.attributes.length - 1; i >= 0; i--) {
        jQuery(this).removeAttr(this.attributes[i].name);
      }
    });
    return this;
  }
  return removeAttr.apply(this, arguments);
};

$('.card_back').find('*').each(function(index, element) {
  $(element).removeClass();
  $(element).removeAttr();
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="card_wrapper">
  <div class="card_navigation">
    zurück |
    <a title="Titletext" href="/xyz">next</a>
  </div>
  <div class="card_front">
    <span class="info">Front</span>
    <p>here's just some text
      <br>and one more line.
    </p>
    <p>here's just another text
      <br>and one more line.
    </p>
  </div>
  <div class="card_back">
    <span class="info">Back</span>
    <p class="test"><span id="test3">Lorem Ipsum non dolor <strong>nihil est major</strong>, laudat amemus hibitet</span></p>
    <p><span style="color: red">- <strong>Non solum</strong>, sed calucat ebalitant medetur</span></p>
    <p> </p>
  </div>
</div>
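The code comment about looping in reverse can be checked without a browser. This sketch uses a plain array as a stand-in for the live element.attributes collection (an assumption made only so it runs in Node): removing an entry shifts later entries down, so a forward loop skips every other item while a reverse loop removes them all. The function names are hypothetical.

```javascript
// Simulate a live attribute list: splice() shifts later entries down,
// much like element.attributes does when an attribute is removed.
function removeForward(names) {
  const attrs = names.slice();
  for (let i = 0; i < attrs.length; i++) {
    attrs.splice(i, 1); // the next item slides into index i and is skipped
  }
  return attrs; // whatever was skipped is left over
}

function removeReverse(names) {
  const attrs = names.slice();
  for (let i = attrs.length - 1; i >= 0; i--) {
    attrs.splice(i, 1); // removing from the end never shifts unvisited items
  }
  return attrs;
}

const names = ['class', 'id', 'style', 'title'];
console.log(removeForward(names)); // [ 'id', 'title' ]
console.log(removeReverse(names)); // []
```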

Remove all attributes in HTML tag except specified with regex

You can achieve this with a negative lookahead, which will tell your expression to either 1. eat one character, or 2. match the special sequence, then rinse and repeat:

<(\w+)\s*(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)+>

Explanation:

  1. <(\w+)\s* (match open of tag and tagname)

  2. (?: (begin enclosure of main construct (note that it doesn't remember matches))

  3. (?:(?:(?!class=|id=|name=)[^>]))* (look ahead for no special token, then eat one character, repeat as many times possible, don't bother to remember anything)

  4. ((?:class|id|name)=['"][^'"]*['"]\s*)? (lookahead failed, so a special token is ahead; let's eat it! Note the regular, 'remembering' parens; the trailing ? lets this part match nothing once no token remains)

  5. )+ (end enclosure of main construct; repeat it, it'll match once for each special token)

  6. > (end of tag)

At this point you might already have the matches you need, if your regex flavor keeps multiple captures per group. In .NET, for example, you'd get something like this (exposed through Groups[2].Captures): $1 = 'a', $2[0] = 'class="someClass"', $2[1] = 'id="someId"', etc.

But if you find that only the last match is remembered, you may have to simply repeat the main construct for each token you want to match, like so: (matches will be $1-$4)

<(\w+)\s*(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)(?:(?:(?:(?!class=|id=|name=)[^>]))*((?:class|id|name)=['"][^'"]*['"]\s*)?)[^>]*>

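In a flavor that keeps only the last capture per group (JavaScript is one), a different route is to match each opening tag and rebuild it from a whitelist inside a replace callback. This is a swapped-in technique, not the single-pattern approach above: `keepAttributes` and its attribute-name pattern are hypothetical, and the sketch assumes attribute values are quoted and contain no >.

```javascript
// Attributes to keep; everything else is dropped.
const KEEP = ['class', 'id', 'name'];

function keepAttributes(html, keep = KEEP) {
  // name="value" or name='value' pairs inside a tag
  const attrRe = /([a-zA-Z_:][-a-zA-Z0-9_:.]*)\s*=\s*("[^"]*"|'[^']*')/g;
  // $1 = tag name, $2 = raw attribute text, $3 = optional self-closing slash
  return html.replace(/<([a-zA-Z][a-zA-Z0-9]*)([^>]*?)(\/?)>/g, (tag, name, attrs, slash) => {
    const kept = [];
    for (const [, attrName, value] of attrs.matchAll(attrRe)) {
      if (keep.includes(attrName.toLowerCase())) kept.push(`${attrName}=${value}`);
    }
    return `<${name}${kept.length ? ' ' + kept.join(' ') : ''}${slash}>`;
  });
}

console.log(keepAttributes('<a class="someClass" href="/x" id="someId">link</a>'));
// <a class="someClass" id="someId">link</a>
console.log(keepAttributes('<input name="q" type="text" />'));
// <input name="q"/>
```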

Remove attributes from html tags using PHP while keeping specific attributes

You usually should not parse HTML using regular expressions. Instead, in PHP you should call DOMDocument::loadHTML. You can then recurse through the elements in the document and call removeAttribute. Regular expressions for HTML tags are notoriously tricky.

REF: http://php.net/manual/en/domdocument.loadhtml.php

Examples: http://coursesweb.net/php-mysql/html-attributes-php

Here's a solution for you. It iterates over all attributes in the DOM and removes those that are not src or href.

$html_string = "<div class=\"myClass\"><b>This</b> is an <span style=\"margin:20px\">example</span><img src=\"ima.jpg\" /></div>";

$dom = new DOMDocument; // init new DOMDocument
$dom->loadHTML($html_string); // load the HTML
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
    if ($node->nodeName != "src" && $node->nodeName != "href") {
        $node->parentNode->removeAttribute($node->nodeName);
    }
}

echo $dom->saveHTML(); // output cleaned HTML

Here is another solution that uses XPath to filter on attribute names instead:

$dom = new DOMDocument; // init new DOMDocument
$dom->loadHTML($html_string); // load the HTML
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//@*[local-name() != 'src' and local-name() != 'href']");
foreach ($nodes as $node) {
    $node->parentNode->removeAttribute($node->nodeName);
}

echo $dom->saveHTML(); // output cleaned HTML

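Tip: if your HTML contains extended (non-ASCII) characters, convert them first so the DOM parser does not misread the UTF-8 input, like this:

$dom->loadHTML(mb_convert_encoding($html_string, 'HTML-ENTITIES', 'UTF-8'));

(Passing 'HTML-ENTITIES' to mbstring functions is deprecated as of PHP 8.2, so newer versions may need a different approach.)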

How to remove all attributes in HTML tags

As an alternative to SelectNodes("//*"), you can use Descendants(), filtered to element nodes, which should return the same result:

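// needs: using System.Linq; using HtmlAgilityPack;
// HtmlDocument here stands for your loaded HtmlAgilityPack.HtmlDocument instance
foreach (var eachNode in HtmlDocument.DocumentNode.Descendants().Where(x => x.NodeType == HtmlNodeType.Element))
{
    eachNode.Attributes.RemoveAll();
}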

PHP simple html DOM remove all attributes from an html tag

When I use your code and example HTML, it does remove all the attributes from all the <p> tags, even the ones inside <font>, so I'm not sure why yours isn't working.

But it looks like simplehtmldom has methods that deal specifically with attributes, so you don't have to use string functions:

$html = file_get_html('page.php');

foreach ($html->find('p') as $p) {
    foreach ($p->getAllAttributes() as $attr => $val) {
        $p->removeAttribute($attr);
    }
}
echo $html->innertext;

Hopefully that will be more effective.

How to remove all the attribute and values associated in tags in html

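You can use Element.getAttributeNames() to get an array of all attribute names and iterate over it to remove each one. Because that array is a static copy, removing attributes while looping over it doesn't skip any: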

$('#content *').each(function(_, el) {
  el.getAttributeNames().forEach(el.removeAttribute.bind(el));
});
console.log($('#content')[0].outerHTML);

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="content">
  <span id="span" data-span="a" aria-describedby="span">span</span>
  <p class="a b c" style="color:black;">paragraph</p>
</div>

Remove all html attributes with regex (replace)

First of all, I would advise you not to use regexes in this situation: they are not meant to parse tree-shaped structures like HTML.

If you don't have a choice, however, I think a regex can handle the requested problem.

It looks to me like you forgot spaces, accents, etc. You can lean on < and > not appearing raw inside a tag's text. (Strictly speaking, HTML does allow a raw > in text and in quoted attribute values, so treat this as an approximation.)

var regex = /<\s*([a-z][a-z0-9]*)\s.*?>/gi;

and call it with:

result = body.replace(regex, '<$1>')

For your given sample, it produces:

<title>Ololo - text’s life</title><div><div><div><div><div><div><div>olololo<ul><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li></ul><p>bbcvbcvbcvbcvbcvbcvbcvb</p></div></div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div></div></div>
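A small runnable check of that regex, assuming Node and a made-up input rather than the asker's page. Because it requires whitespace after the tag name, attribute-less tags such as <br> and all closing tags are left alone. One caveat: . does not match newlines, so a tag whose attributes span several lines is skipped; adding the s (dotAll) flag is one way around that.

```javascript
// The regex from the answer above.
const regex = /<\s*([a-z][a-z0-9]*)\s.*?>/gi;

function stripAttributes(body) {
  return body.replace(regex, '<$1>');
}

console.log(stripAttributes('<div class="a"><p id="x">hi</p><br></div>'));
// <div><p>hi</p><br></div>

// Caveat: attributes split across lines are not matched without the 's' flag.
const multiLine = '<a href="x"\n   title="y">link</a>';
console.log(stripAttributes(multiLine) === multiLine); // true
console.log(multiLine.replace(/<\s*([a-z][a-z0-9]*)\s.*?>/gis, '<$1>')); // <a>link</a>
```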


Related Topics



Leave a reply



Submit