How to Remove Specific Style Tag in HTML Using C#

Remove Style tag in HTML

The regex should be:

 style\s*=\s*(['"]).*?\1

The lazy .*? stops at the first quote that matches the opening one.

That said, I would use HtmlAgilityPack instead:

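// Requires: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream);
// SelectNodes returns null (not an empty collection) when nothing matches.
var elementsWithStyleAttribute = doc.DocumentNode.SelectNodes("//*[@style]");
if (elementsWithStyleAttribute != null)
{
    foreach (var element in elementsWithStyleAttribute)
    {
        element.Attributes["style"].Remove();
    }
}
// Save needs a target; yourOutputStream is a placeholder for wherever the result should go.
doc.Save(yourOutputStream);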

Remove style from HTML Tags using Regex C#

First, as others suggest, an approach using a proper HTML parser is much better. Either use HtmlAgilityPack or CsQuery.

If you really want a regex solution, here it is:

Replace this pattern: (<.+?)\s+style\s*=\s*(["']).*?\2(.*?>)
With: $1$3

Demo: http://regex101.com/r/qJ1vM1/1
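
To show how that replacement might be wired up in C#, here is a minimal sketch. The pattern and the $1$3 replacement are the ones quoted above; the class and method names are made up for illustration, and default regex options are assumed, so a tag that spans several lines would not be matched.

```csharp
using System.Text.RegularExpressions;

public static class StyleAttributeStripper
{
    // Group 1: start of the tag, group 2: the opening quote, group 3: rest of the tag.
    private static readonly Regex StyleAttribute = new Regex(
        @"(<.+?)\s+style\s*=\s*([""']).*?\2(.*?>)");

    // Rebuilds each matched tag from groups 1 and 3, dropping the style attribute.
    public static string RemoveStyle(string html)
    {
        return StyleAttribute.Replace(html, "$1$3");
    }
}
```

Calling StyleAttributeStripper.RemoveStyle on a fragment returns a copy of it with the style attributes removed. Other attributes on the same tag are kept.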


To remove multiple attributes, since you're using .NET, this should work:

Replace (?<=<[^<>]+)\s+(?:style|class)\s*=\s*(["']).*?\1
With an empty string
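
A short sketch of the multi-attribute version, again with made-up class and method names. It relies on .NET supporting variable-length lookbehind, which is why the pattern would not port to most other regex engines unchanged.

```csharp
using System.Text.RegularExpressions;

public static class AttributeStripper
{
    // The lookbehind requires an unclosed "<...", so only text inside a tag is matched.
    private static readonly Regex StyleOrClass = new Regex(
        @"(?<=<[^<>]+)\s+(?:style|class)\s*=\s*([""']).*?\1");

    // Replaces every style or class attribute with an empty string.
    public static string RemoveStyleAndClass(string html)
    {
        return StyleOrClass.Replace(html, string.Empty);
    }
}
```

Because of the lookbehind, text such as class="x" that appears between tags rather than inside one is left alone.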

Remove Certain HTML tags in C#

As far as I can see, you want to remove the HTML elements that carry a style attribute, together with their closing tags. Unfortunately, there is no good way to do that with regexes, because pairing an opening tag with its closing tag requires tracking nesting. If only the opening tags had to go, a regex could do a passable job.

On the other hand, XSLT is the right tool for this, because it can handle the recursive nature of XML:

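<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Copy every element that has no style attribute, then process its attributes and children. -->
<xsl:template match="*[not(@style)]">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Copy attributes as attributes; the built-in rule would emit their values as text. -->
<xsl:template match="@*">
<xsl:copy/>
</xsl:template>
</xsl:stylesheet>

What's happening here? The <xsl:template match="*[not(@style)]"> part matches every element that does not have a style attribute, and the <xsl:copy>...</xsl:copy> part copies it along with its attributes and children. Elements that do have a style attribute match no template, so their opening and closing tags are not copied. XSLT's built-in rules still process their children, so the content inside them is kept.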

For the record, this is a slight variant of the XSLT identity transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
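
To run a stylesheet like the ones above from C#, something along these lines should work. It is a sketch rather than a definitive implementation: the XsltRunner class and Transform method are hypothetical names, and the input is assumed to be well-formed XML (for example XHTML), since XSLT cannot read tag-soup HTML.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltRunner
{
    // Applies the given stylesheet text to the given XML text and returns the result.
    public static string Transform(string xml, string stylesheet)
    {
        var xslt = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(stylesheetReader);
        }

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            // No stylesheet parameters are needed, so the argument list is null.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

The returned string normally starts with an XML declaration, because the default output method is xml. Adding <xsl:output omit-xml-declaration="yes"/> to the stylesheet suppresses it.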

Remove inner style from HTML using Regex C#

Use this pattern to match a <style>...</style> block:

<style[^<]*</style\s*>

Explanation:

  • <style matches the literal text <style.
  • [^<]* matches any run of characters other than <, so it consumes the
    tag's attributes, the closing > and the CSS up to the next <.
  • </ matches the literal </.
  • style\s*> matches the word style, zero or more whitespace characters,
    and the closing >.
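
As a rough sketch, the pattern could be applied in C# like this. The class and method names are illustrative, and RegexOptions.IgnoreCase is an addition of mine (not part of the answer above) so that <STYLE> blocks are caught too.

```csharp
using System.Text.RegularExpressions;

public static class StyleBlockStripper
{
    // [^<]* also matches newlines, so multi-line style blocks are covered
    // as long as the CSS itself contains no '<' character.
    private static readonly Regex StyleBlock = new Regex(
        @"<style[^<]*</style\s*>",
        RegexOptions.IgnoreCase);

    // Removes every matched style block from the input.
    public static string RemoveStyleBlocks(string html)
    {
        return StyleBlock.Replace(html, string.Empty);
    }
}
```

A block whose CSS contains a < character (inside a comment or a string, say) will not match, which is a limitation of the pattern itself.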

How do I remove all HTML tags from a string without knowing which tags are in it?

You can use a simple regex like this:

public static string StripHTML(string input)
{
    return Regex.Replace(input, "<.*?>", String.Empty);
}

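Be aware that this solution has flaws. For example, . does not match a newline by default, so a tag that spans several lines is left in place, and a > inside an attribute value ends the match too early. See "Remove HTML tags in String" for more information (especially the comments by 'Mark E. Haase'/@mehaase).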

Another solution would be to use the HTML Agility Pack.

You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?

remove all inline styles and (most) classes from an HTML string

For anyone interested: I solved this without using regex.

Instead, I used XDocument to parse the HTML, which works only when the markup is well-formed XML:

// Requires: using System.Linq; using System.Xml.Linq;
private string MakeHtmlGood(string html)
{
    // XDocument.Parse throws an XmlException unless the input is well-formed XML.
    var xmlDoc = XDocument.Parse(html);

    // Remove all inline styles
    xmlDoc.Descendants().Attributes("style").Remove();

    // Remove all classes inserted by 3rd party, without removing our own lovely classes
    foreach (var node in xmlDoc.Descendants())
    {
        var classAttribute = node.Attributes("class").SingleOrDefault();
        if (classAttribute == null)
        {
            continue;
        }

        // Classes starting with "abc" are the third-party ones; keep everything else.
        var classesThatShouldStay = classAttribute.Value.Split(' ').Where(className => !className.StartsWith("abc"));
        classAttribute.SetValue(string.Join(" ", classesThatShouldStay));
    }

    return xmlDoc.ToString();
}

