How can I remove HTML tags from a string using a regex?
Try this:
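using System.Text.RegularExpressions;
//Regular expression for html tags: '<', a non-whitespace character, then anything except '<' or '>' up to '>'.
//Created once and reused, because RegexOptions.Compiled only pays off when the same instance is reused.
//IgnoreCase, Singleline and Multiline have no effect on this pattern, so they are not needed.
private static readonly Regex StripHTMLExpression = new Regex("<\\S[^><]*>", RegexOptions.CultureInvariant | RegexOptions.Compiled);
// erase html tags from a string
public static string StripHtml(string target)
{
return StripHTMLExpression.Replace(target, string.Empty);
}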
Usage:
string htmlString = "<div><span>hello world!</span></div>";
string strippedString = StripHtml(htmlString); // "hello world!"
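The \S right after < is what keeps ordinary comparisons in the text intact: in 1 < 2 the < is followed by a space, so it is not treated as a tag, while tags such as <br/> are still removed. A minimal self-contained check of that behavior; HtmlStripper is a hypothetical wrapper class around the answer's pattern:

```csharp
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // The answer's pattern: '<', one non-whitespace character (\S), then anything
    // except '<' or '>', then '>'. A '<' followed by a space is therefore not a tag.
    private static readonly Regex TagPattern =
        new Regex("<\\S[^><]*>", RegexOptions.CultureInvariant | RegexOptions.Compiled);

    // Removes every match of TagPattern and keeps the text between the tags.
    public static string StripHtml(string target) =>
        TagPattern.Replace(target, string.Empty);
}
```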
Regular expression to remove HTML tags
Using a regular expression to parse HTML is fraught with pitfalls. HTML is not a regular language and hence can't be 100% correctly parsed with a regex. This is just one of many problems you will run into. The best approach is to use an HTML / XML parser to do this for you.
Here is a link to a blog post I wrote a while back that goes into more detail about this problem.
- http://blogs.msdn.com/b/jaredpar/archive/2008/10/15/regular-expression-limitations.aspx
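As a concrete illustration (my own example, not taken from the blog post): a > is legal inside a quoted attribute value, but a tag regex with no notion of quoting stops at the first >, leaving the rest of the tag in the output. NaiveTagRegexDemo is a hypothetical class using the common <[^>]*> pattern:

```csharp
using System.Text.RegularExpressions;

public static class NaiveTagRegexDemo
{
    // A typical "tag" regex: '<', then anything except '>', then '>'.
    // It does not understand quoted attribute values, so a '>' inside one
    // ends the match early.
    public static string StripTags(string html) =>
        Regex.Replace(html, "<[^>]*>", string.Empty);
}
```

For <a title="x > y">link</a> this returns  y">link instead of link.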
That being said, here's a solution that should fix this particular problem. It is by no means a perfect solution, though.
// Capture the text that follows an opening <img ...> or <a ...> tag, up to the next '<'
var pattern = @"<(img|a)[^>]*>(?<content>[^<]*)<";
var regex = new Regex(pattern);
var m = regex.Match(sSummary);
if (m.Success) {
sResult = m.Groups["content"].Value;
}
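A self-contained version of that snippet, as a sketch: TagContentExtractor and ExtractContent are hypothetical names, the method returns null when nothing matches, and \b is added after the tag name (not in the original pattern) so that <abbr> or <area> is not mistaken for <a>.

```csharp
using System.Text.RegularExpressions;

public static class TagContentExtractor
{
    // <(img|a)\b : an opening <img or <a tag (\b rejects <abbr>, <area>, ...)
    // [^>]*>     : the rest of that tag, up to its closing '>'
    // (?<content>[^<]*)< : the text that follows, up to the next '<'
    private static readonly Regex Pattern =
        new Regex(@"<(img|a)\b[^>]*>(?<content>[^<]*)<");

    // Returns the captured text of the first match, or null if there is none.
    public static string ExtractContent(string html)
    {
        Match m = Pattern.Match(html);
        return m.Success ? m.Groups["content"].Value : null;
    }
}
```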
Regex needed to remove and replace specified html tags in two criteria using C#
You can use capture groups in your regex and refer to them in the replacement string, as described in the documentation here: http://msdn.microsoft.com/en-us/library/e7f5w83z
//each snippet is independent; Regex.Replace returns a new string, so assign the result
//to remove all h1 tags:
html = Regex.Replace(html, @"</?h1[^>]*>", "");
//to replace all div tags with p, keeping the same attributes (\b leaves tags like <divider> alone):
html = Regex.Replace(html, @"(</?)div\b([^>]*>)", "$1p$2");
//to change the attributes of the div tags you will need two regexes:
//one for the opening tags
html = Regex.Replace(html, @"<div\b[^>]*>", "<p class='content'>");
//one for the closing tag
html = Regex.Replace(html, @"</div>", "</p>");
The last example was added in response to a comment. It needs two regexes because the opening and closing tags are replaced with different text.
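The replacements above can be wrapped into small helpers and checked against sample input. This is a sketch; TagRewriter and its method names are hypothetical. \b after div keeps a tag such as <divider> from being rewritten as <pider>, and ${1}/${2} make the group boundaries in the replacement string explicit.

```csharp
using System.Text.RegularExpressions;

public static class TagRewriter
{
    // Removes every opening and closing <h1> tag; the heading text stays.
    public static string RemoveH1(string html) =>
        Regex.Replace(html, @"</?h1[^>]*>", "");

    // <div ...> becomes <p ...> and </div> becomes </p>. Group 1 holds "<" or "</",
    // group 2 holds the rest of the tag (attributes and the closing '>').
    public static string DivToP(string html) =>
        Regex.Replace(html, @"(</?)div\b([^>]*>)", "${1}p${2}");

    // Two passes because opening and closing tags get different replacement text;
    // the original div attributes are dropped.
    public static string DivToStyledP(string html) =>
        Regex.Replace(
            Regex.Replace(html, @"<div\b[^>]*>", "<p class='content'>"),
            @"</div>", "</p>");
}
```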
Remove some HTML Tags using RegEx MVC
After several posts, I recommend the following method: list the tags you want to keep, and every other tag is removed.
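string acceptable = "p|a|tr|td|table|html"; //Tags that are a-okay to display
//If the tag name is on the list, the "yes" branch must match the literal "notag", which fails, so the tag is kept.
//\b makes the lookahead compare whole names, so "p" no longer also keeps <pre>, "a" no longer keeps <abbr>, etc.
string stringPattern = @"</?(?(?=(?:" + acceptable + @")\b)notag|[a-zA-Z0-9]+)(?:\s[a-zA-Z0-9\-]+=?(?:([""']?).*?\1?)?)*\s*/?>";
htmlp = Regex.Replace(htmlp, stringPattern, "");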
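The same idea as a reusable helper. A sketch, assuming a hypothetical KeepOnlyTags method: each allowed name goes through Regex.Escape before being joined into the alternation, \b makes the lookahead compare whole tag names, and an empty list falls back to (?!), a lookahead that never matches, so every tag is removed. Matching is case-sensitive, as in the original, so <P> is not treated as p.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class TagWhitelist
{
    // Removes every tag whose name is not in allowedTags; text between tags is kept.
    public static string KeepOnlyTags(string html, params string[] allowedTags)
    {
        // Escape each name so it is matched literally; "(?!)" never matches,
        // so with no allowed tags the condition is always false.
        string acceptable = allowedTags.Length == 0
            ? "(?!)"
            : string.Join("|", allowedTags.Select(Regex.Escape));

        string pattern =
            @"</?(?(?=(?:" + acceptable + @")\b)notag|[a-zA-Z0-9]+)" +
            @"(?:\s[a-zA-Z0-9\-]+=?(?:([""']?).*?\1?)?)*\s*/?>";
        return Regex.Replace(html, pattern, string.Empty);
    }
}
```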
How do I remove all HTML tags from a string without knowing which tags are in it?
You can use a simple regex like this:
public static string StripHTML(string input)
{
return Regex.Replace(input, "<.*?>", String.Empty);
}
Be aware that this solution has its own flaws. See Remove HTML tags in String for more information (especially the comments by Mark E. Haase/@mehaase).
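Two concrete ways this simple pattern goes wrong (my own examples; the linked comments may discuss others): any < in plain text starts a match that runs to the next >, and because . does not match a newline by default, a tag whose attributes span two lines is left in place. A small sketch with a hypothetical LazyTagRegexDemo class:

```csharp
using System.Text.RegularExpressions;

public static class LazyTagRegexDemo
{
    // The answer's pattern: '<', then as few characters as possible, then '>'.
    // Every '<' is treated as the start of a tag, and '.' stops at '\n'.
    public static string StripHTML(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty);
}
```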
Another solution would be to use the HTML Agility Pack.
You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?
Remove Certain HTML tags in C#
As far as I can see, you want to remove the HTML elements that have a style attribute, and also remove their closing tags. Unfortunately, there is no good way to do that with regexes: elements can be nested, so pairing each closing tag with the right opening tag requires tracking nesting depth. Without the 'also remove their closing pairs' requirement, we could write an approximately good regex.
On the other hand, XSLT is the right tool for this, because it can handle the recursive nature of XML:
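<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="//*[not(@style)]">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Copy attributes as attributes; without this rule the built-in template would output their values as text -->
<xsl:template match="@*">
<xsl:copy/>
</xsl:template>
</xsl:stylesheet>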
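What's happening here? The <xsl:template match="//*[not(@style)]"> part matches every element that does not have a style attribute, and <xsl:copy>...</xsl:copy> copies that element (a shallow copy; the <xsl:apply-templates> inside it then processes the element's attributes and children). Elements that do have a style attribute match no template, so XSLT's built-in rule applies to them: the element itself is not copied, but its children are still processed. Its tags disappear while its text content is kept.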
For the record, this is a slight variant of the XSLT identity transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
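To run the stylesheet from C#, System.Xml.Xsl.XslCompiledTransform can load it and apply it to an XmlReader. A minimal sketch, assuming the input is well-formed XHTML (XmlReader throws on ordinary tag soup); StyleTagRemover and RemoveStyledTags are hypothetical names. The embedded stylesheet is the one from the answer plus an @* template, so attributes of copied elements stay attributes instead of falling through to the built-in rule, and an xsl:output element omits the XML declaration.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StyleTagRemover
{
    private const string Stylesheet = @"
<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""xml"" omit-xml-declaration=""yes""/>
  <xsl:template match=""//*[not(@style)]"">
    <xsl:copy>
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""@*"">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>";

    // Compile the stylesheet once and reuse it for every call.
    private static readonly XslCompiledTransform Xslt = Load();

    private static XslCompiledTransform Load()
    {
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(reader);
        }
        return xslt;
    }

    // Unwraps elements that carry a style attribute; everything else is copied.
    public static string RemoveStyledTags(string xhtml)
    {
        var output = new StringWriter();
        using (var reader = XmlReader.Create(new StringReader(xhtml)))
        using (var writer = XmlWriter.Create(output, Xslt.OutputSettings))
        {
            Xslt.Transform(reader, writer);
        }
        return output.ToString();
    }
}
```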