Remove Style tag in HTML
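Regex should be
style\s*=\s*('|").*?\1
(A backreference cannot be used inside a character class, so [^\1]* would not stop at the closing quote; a lazy .*? does.)
Though I would use HtmlAgilityPack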
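HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream);
// SelectNodes returns null (not an empty collection) when nothing matches
var elementsWithStyleAttribute = doc.DocumentNode.SelectNodes("//*[@style]");
if (elementsWithStyleAttribute != null)
{
    foreach (var element in elementsWithStyleAttribute)
    {
        element.Attributes["style"].Remove();
    }
}
// Save needs a target (stream, file name or TextWriter); yourOutputStream is a placeholder
doc.Save(yourOutputStream);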
Remove style from HTML Tags using Regex C#
First, as others suggest, an approach using a proper HTML parser is much better. Either use HtmlAgilityPack or CsQuery.
If you really want a regex solution, here it is:
Replace this pattern: (<.+?)\s+style\s*=\s*(["']).*?\2(.*?>)
With: $1$3
Demo: http://regex101.com/r/qJ1vM1/1
To remove multiple attributes, since you're using .NET, this should work:
Replace (?<=<[^<>]+)\s+(?:style|class)\s*=\s*(["']).*?\1
With an empty string
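As a rough illustration of how the two replacements above could be wired up in C#, here is a minimal sketch using Regex.Replace. The class and method names are made up for this example; only the patterns come from the answer.

```csharp
using System.Text.RegularExpressions;

public static class StyleAttributeStripper
{
    // Removes one style attribute per match, keeping the text before ($1) and after ($3) it.
    public static string RemoveStyle(string html)
    {
        return Regex.Replace(html, @"(<.+?)\s+style\s*=\s*([""']).*?\2(.*?>)", "$1$3");
    }

    // Removes every style and class attribute inside a tag.
    // The variable-length lookbehind (?<=<[^<>]+) is supported by the .NET regex engine.
    public static string RemoveStyleAndClass(string html)
    {
        return Regex.Replace(html, @"(?<=<[^<>]+)\s+(?:style|class)\s*=\s*([""']).*?\1", string.Empty);
    }
}
```

For example, RemoveStyle applied to <p style="color:red">Hi</p> gives <p>Hi</p>, and RemoveStyleAndClass applied to <div class="a b" style='x:y' id="k">T</div> gives <div id="k">T</div>. Neither pattern understands HTML structure, so a parser remains the safer choice for untrusted input.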
Remove Certain HTML tags in C#
As far as I can see, you want to remove the HTML elements that contain a style attribute, and also remove their closing pairs. Unfortunately, there is no good way to do that with regexes. Without the 'also remove their closing pairs' clause, we could write an approximately good regex.
On the other hand, XSLT is the right tool for this, because it can handle the recursive nature of XML:
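<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Copy attributes, text, comments and every element that has no style attribute -->
<xsl:template match="@*|text()|comment()|*[not(@style)]">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
What's happening here? The match pattern selects every element that does not have a style attribute, plus attributes, text and comments (if attributes were left to XSLT's built-in rules, their values would be written out as text). The <xsl:copy>...</xsl:copy> part copies each matched node and then recurses into its attributes and children. An element that does have a style attribute matches no template here, so its opening and closing tags are not copied; the built-in rule still processes its children, so its content is kept.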
For the record, this is a slight variant of the XSLT identity transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
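To run a stylesheet like this from C#, one option is XslCompiledTransform from System.Xml.Xsl. The following is a minimal sketch, assuming the input is well-formed XML (XHTML). The class and method names are made up for illustration, and the embedded stylesheet matches attributes and text nodes explicitly so that they are copied through.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StyleTagUnwrapper
{
    // Copies everything except the tags of elements that carry a style attribute.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""xml"" omit-xml-declaration=""yes""/>
  <xsl:template match=""@*|text()|*[not(@style)]"">
    <xsl:copy>
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string xhtml)
    {
        var xslt = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (var inputReader = XmlReader.Create(new StringReader(xhtml)))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            // Elements with a style attribute fall through to the built-in rule,
            // which drops their tags but still processes their children.
            xslt.Transform(inputReader, writer);
        }
        return output.ToString();
    }
}
```

With an input such as <div><span style="x">a</span><b id="k">c</b></div>, the span tags disappear while the text a and the b element are kept. Real-world HTML is often not well-formed XML, in which case XmlReader will throw before the transformation starts.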
Remove inner style from HTML using Regex C#
Use this pattern to match.
<style[^<]*</style\s*>
Explanation:
<style matches < and the word style.
[^<]* matches any character that is not <, repeated until a < occurs.
</ matches </ exactly.
style\s*> matches the word style, zero or more whitespace characters after it, and >.
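A minimal sketch of applying this pattern in C# might look like the following. The class and method names are hypothetical, and RegexOptions.IgnoreCase is an addition made here so that <STYLE> blocks are matched too.

```csharp
using System.Text.RegularExpressions;

public static class StyleBlockStripper
{
    public static string RemoveStyleBlocks(string html)
    {
        // [^<]* also matches line breaks, so multi-line style blocks are covered.
        // IgnoreCase is not part of the original pattern; it is added for this sketch.
        return Regex.Replace(html, @"<style[^<]*</style\s*>", string.Empty, RegexOptions.IgnoreCase);
    }
}
```

Because [^<]* stops at the first <, a style block whose CSS contains a < character (for instance inside an HTML comment wrapper) will not be matched by this pattern.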
How do I remove all HTML tags from a string without knowing which tags are in it?
You can use a simple regex like this:
public static string StripHTML(string input)
{
return Regex.Replace(input, "<.*?>", String.Empty);
}
Be aware that this solution has its own flaw. See "Remove HTML tags in String" for more information (especially the comments of 'Mark E. Haase'/@mehaase).
Another solution would be to use the HTML Agility Pack.
You can find an example using the library here: "HTML agility pack - removing unwanted tags without removing content?"
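To make the limits of the regex approach concrete, here is a small sketch that wraps the StripHTML method from above in a hypothetical class so it can be called on a few inputs. The failing inputs are examples chosen here and are not necessarily the cases discussed in the linked thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripHtmlDemo
{
    // Same one-line regex as in the answer above.
    public static string StripHTML(string input)
    {
        return Regex.Replace(input, "<.*?>", String.Empty);
    }
}
```

On simple markup such as <b>bold</b> text the result is bold text. A > inside an attribute value ends the match early, so <a title="1 > 0">x</a> leaves the fragment 0">x behind (with a leading space). A tag that spans two lines is not removed at all, because . does not match a line break unless RegexOptions.Singleline is used.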
remove all inline styles and (most) classes from an HTML string
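To anyone interested: I've solved this without using RegEx.
Rather, I used XDocument to parse the HTML. Note that this only works when the HTML is well-formed XML; otherwise XDocument.Parse throws an XmlException.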
// Requires: using System.Linq; using System.Xml.Linq;
private string MakeHtmlGood(string html)
{
    var xmlDoc = XDocument.Parse(html);
    // Remove all inline styles
    xmlDoc.Descendants().Attributes("style").Remove();
    // Remove all classes inserted by 3rd party, without removing our own lovely classes
    foreach (var node in xmlDoc.Descendants())
    {
        var classAttribute = node.Attributes("class").SingleOrDefault();
        if (classAttribute == null)
        {
            continue;
        }
        // Drop every class name that starts with the third-party prefix "abc"
        var classesThatShouldStay = classAttribute.Value.Split(' ').Where(className => !className.StartsWith("abc"));
        classAttribute.SetValue(string.Join(" ", classesThatShouldStay));
    }
    return xmlDoc.ToString();
}