Using C# Regular Expressions to Remove HTML Tags

How can I remove HTML tags from a string with a regex?

Try this:

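// Requires: using System.Text.RegularExpressions;

// Matches '<', one non-whitespace character, then everything up to the next '>'.
// Built once and reused; the pattern has no letters, '.', '^' or '$', so
// IgnoreCase, Singleline and Multiline would have no effect here.
private static readonly Regex StripHtmlExpression =
    new Regex("<\\S[^><]*>", RegexOptions.CultureInvariant | RegexOptions.Compiled);

// Erase HTML tags from a string.
public static string StripHtml(string target)
{
    return StripHtmlExpression.Replace(target, string.Empty);
}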

Call it like this:

string htmlString = "<div><span>hello world!</span></div>";
string strippedString = StripHtml(htmlString); // "hello world!"

Regular expression to remove HTML tags

Using a regular expression to parse HTML is fraught with pitfalls. HTML is not a regular language and hence can't be 100% correctly parsed with a regex. This is just one of many problems you will run into. The best approach is to use an HTML / XML parser to do this for you.
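To make the pitfall concrete, here is a minimal sketch that runs the tag-stripping regex from the first answer and an XML parser over the same input. The sample string is made up for illustration, and the parser route shown here (System.Xml.Linq) only works when the markup is well-formed XML; real-world HTML usually needs a dedicated HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical sample input: an attribute value that contains '>'.
string xhtml = "<div><a title=\"a > b\">link</a> text</div>";

// Regex approach: the '>' inside the attribute is mistaken for the end of the tag.
string regexResult = Regex.Replace(xhtml, "<\\S[^><]*>", string.Empty);

// Parser approach: XElement.Value is the concatenated text of all descendants.
string xmlResult = XElement.Parse(xhtml).Value;

Console.WriteLine("[" + regexResult + "]"); // [ b">link text]
Console.WriteLine("[" + xmlResult + "]");   // [link text]
```

The regex leaves part of the attribute behind, while the parser knows where the tag really ends.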

Here is a link to a blog post I wrote a while back that goes into more detail about this problem.

  • http://blogs.msdn.com/b/jaredpar/archive/2008/10/15/regular-expression-limitations.aspx

That being said, here's a solution that should fix this particular problem. It is by no means a perfect solution, though.

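// sSummary (input) and sResult (output) are assumed to be declared by the caller.
var pattern = @"<(img|a)[^>]*>(?<content>[^<]*)<";
var regex = new Regex(pattern);
var m = regex.Match(sSummary);
if (m.Success)
{
    // The named group "content" holds the text between the tag and the next '<'.
    sResult = m.Groups["content"].Value;
}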

Regex needed to remove and replace specified HTML tags under two criteria using C#

You may use capture groups in your regex and use them in the substitution according to the documentation here: http://msdn.microsoft.com/en-us/library/e7f5w83z

// Regex.Replace returns a new string, so keep the result of each call.

// To remove all h1 tags:
var withoutH1 = Regex.Replace(html, @"</?h1[^>]*>", "");

// To replace all div tags with p, keeping the same attributes:
var divsAsParagraphs = Regex.Replace(html, @"(</?)div([^>]*>)", "$1p$2");

// To change the attributes of the div tags you will need two regexes:
// one for the opening tags
var opened = Regex.Replace(html, @"<div[^>]*>", "<p class='content'>");
// one for the closing tags
var result = Regex.Replace(opened, @"</div>", "</p>");

The last example was added in response to a comment. It needs two regexes because the replacement text is different for the opening and the closing tags.
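As a quick check of what these substitutions produce, the sketch below runs the three replacements on a small sample string. The input and the variable names are invented for illustration; the patterns are the ones from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input.
string html = "<div id=\"main\"><h1 class=\"t\">Title</h1><span>text</span></div>";

// Remove h1 tags but keep their content.
string withoutH1 = Regex.Replace(html, @"</?h1[^>]*>", "");

// $1 is "<" or "</", $2 is the rest of the tag including any attributes.
string divToP = Regex.Replace(html, @"(</?)div([^>]*>)", "$1p$2");

// New attributes: opening and closing tags get different replacement text.
string opened = Regex.Replace(html, @"<div[^>]*>", "<p class='content'>");
string divToContentP = Regex.Replace(opened, @"</div>", "</p>");

Console.WriteLine(withoutH1);     // <div id="main">Title<span>text</span></div>
Console.WriteLine(divToP);        // <p id="main"><h1 class="t">Title</h1><span>text</span></p>
Console.WriteLine(divToContentP); // <p class='content'><h1 class="t">Title</h1><span>text</span></p>
```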

Remove some HTML Tags using RegEx MVC

After reading several posts, I recommend the following method. You list the tags you want to keep, and every other tag is deleted:

// Tags that are okay to keep. The \b in the lookahead stops "p" from also
// whitelisting "pre", "a" from whitelisting "abbr", and so on.
string acceptable = "p|a|tr|td|table|html";
string stringPattern = @"</?(?(?=(?:" + acceptable + @")\b)notag|[a-zA-Z0-9]+)(?:\s[a-zA-Z0-9\-]+=?(?:(["",']?).*?\1?)?)*\s*/?>";
htmlp = Regex.Replace(htmlp, stringPattern, "");

How do I remove all HTML tags from a string without knowing which tags are in it?

You can use a simple regex like this:

public static string StripHTML(string input)
{
return Regex.Replace(input, "<.*?>", String.Empty);
}

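Be aware that this solution has its own flaws. For example, it treats any text between a literal < and the next > as a tag, so ordinary comparisons in the text get deleted. See Remove HTML tags in String for more information (especially the comments of 'Mark E. Haase'/@mehaase).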
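A few inputs where the simple pattern misbehaves can be sketched as follows. The sample strings are invented for illustration; StripHTML is the same one-liner as above, written as a local function so the snippet runs on its own.

```csharp
using System;
using System.Text.RegularExpressions;

static string StripHTML(string input)
{
    return Regex.Replace(input, "<.*?>", String.Empty);
}

// 1. Plain text containing '<' and '>' is treated as a tag and deleted.
string comparison = StripHTML("if a < b and c > d");

// 2. '.' does not match a newline by default, so a tag split across lines survives.
string multiline = StripHTML("<a\nhref='x'>link</a>");

// 3. Only the tags go away; the script body is left behind as text.
string script = StripHTML("<script>alert(1)</script>");

Console.WriteLine(comparison); // if a  d
Console.WriteLine(multiline);  // <a (newline) href='x'>link
Console.WriteLine(script);     // alert(1)
```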

Another solution would be to use the HTML Agility Pack.

You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?

Remove Certain HTML tags in C#

As far as I can see, you want to remove the HTML elements that contain a style attribute, and also remove their closing tags. Unfortunately, there is no good way to do that with regexes. Without the 'also remove their closing pairs' requirement, we could write a reasonably good regex.

On the other hand, XSLT is the right tool for this, because it can handle the recursive nature of XML:

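<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()[not(@style)]">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

What's happening here? The <xsl:template match="@*|node()[not(@style)]"> part matches attributes and every node that does not have a style attribute, and the <xsl:copy>...</xsl:copy> part copies them. Attributes have to be matched explicitly: otherwise XSLT's built-in rule would write their values out as plain text. Elements that do have a style attribute are not copied. The built-in rule still processes their children, so their content is kept while the opening and closing tags are dropped.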

For the record, this is a slight variant of the XSLT identity transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
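To run a stylesheet like this from C#, System.Xml.Xsl.XslCompiledTransform can be used. The following is a minimal sketch, not the answer's exact stylesheet: it uses the identity template plus one extra rule for elements with a style attribute, and the sample input is invented. The input has to be well-formed XML.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Identity template plus one extra rule: elements with a style attribute
// are not copied, but their child nodes are still processed.
string stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""xml"" omit-xml-declaration=""yes""/>
  <xsl:template match=""@*|node()"">
    <xsl:copy><xsl:apply-templates select=""@*|node()""/></xsl:copy>
  </xsl:template>
  <xsl:template match=""*[@style]"">
    <xsl:apply-templates select=""node()""/>
  </xsl:template>
</xsl:stylesheet>";

// Hypothetical sample input.
string input = "<div><p style=\"color:red\">red <b>bold</b></p><p class=\"x\">plain</p></div>";

// Compile the stylesheet.
var transform = new XslCompiledTransform();
using (var styleReader = XmlReader.Create(new StringReader(stylesheet)))
{
    transform.Load(styleReader);
}

// Apply it to the input and collect the result as a string.
var output = new StringWriter();
using (var inputReader = XmlReader.Create(new StringReader(input)))
{
    transform.Transform(inputReader, null, output);
}

string result = output.ToString();
Console.WriteLine(result);
```

With this input, the styled p wrapper should disappear while its text and the b child remain, and the p with only a class attribute should be copied unchanged.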
