How can I strip HTML tags from a string in ASP.NET?
If it is just stripping all HTML tags from a string, this works reliably with regex as well. Replace:
<[^>]*(>|$)
with the empty string, globally. Don't forget to normalize the string afterwards, replacing:
[\s\r\n]+
with a single space, and trimming the result. Optionally replace any HTML character entities back to the actual characters.
Note:
- There is a limitation: HTML and XML allow > in attribute values. This solution will return broken markup when encountering such values.
- The solution is technically safe, as in: the result will never contain anything that could be used to do cross-site scripting or to break a page layout. It is just not very clean.
- As with all things HTML and regex: use a proper parser if you must get it right under all circumstances.
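The steps described in this answer could be sketched in C# roughly as follows. This is a minimal sketch, not a definitive implementation: the class and method names (HtmlText, StripTags) are hypothetical, and using WebUtility.HtmlDecode for the optional entity step is an assumption; the two patterns are the ones quoted above.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper; the name is not from the original answer.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // 1. Remove everything that looks like a tag, including an
        //    unterminated tag at the very end of the string.
        string text = Regex.Replace(html, @"<[^>]*(>|$)", string.Empty);

        // 2. Collapse runs of whitespace into one space and trim.
        text = Regex.Replace(text, @"[\s\r\n]+", " ").Trim();

        // 3. Optional (assumption): turn entities such as &amp; back
        //    into the characters they stand for.
        return WebUtility.HtmlDecode(text);
    }
}
```

For example, HtmlText.StripTags("<p>Hello,   <b>world</b>!</p>\n<a href=\"x\">Tom &amp; Jerry") returns "Hello, world! Tom & Jerry". Note that the decoding step can reintroduce < and > (from &lt; and &gt;), so the decoded text should be encoded again before it is written into a page.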
How to remove html tags from string in view page C#
If you want to show your content without any formatting, you can use Regex.Replace(input, "<.*?>", String.Empty) to strip all HTML tags from your string.
1) Add the code below to the top of the view (.cshtml).
@using System.Text.RegularExpressions;
@helper StripHTML(string input)
{
    if (!string.IsNullOrEmpty(input))
    {
        input = Regex.Replace(input, "<.*?>", String.Empty);
        <span>@input</span>
    }
}
2) Use the helper like this:
<td>@StripHTML(item.Message)</td>
How do I remove all HTML tags from a string without knowing which tags are in it?
You can use a simple regex like this:
public static string StripHTML(string input)
{
    return Regex.Replace(input, "<.*?>", String.Empty);
}
Be aware that this solution has its own flaw. See Remove HTML tags in String for more information (especially the comments of 'Mark E. Haase'/@mehaase)
Another solution would be to use the HTML Agility Pack.
You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?
How can I strip any and all HTML tags from a string?
Try a regex replacement.
This pattern (taken from another source) matches HTML tags within a string:
var pattern = @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";
var source = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";
// Regex.Replace returns a new string; keep the result instead of discarding it
var result = Regex.Replace(source, pattern, string.Empty);
How Can I strip HTML from Text in .NET?
I downloaded the HtmlAgilityPack and created this function:
string StripHtml(string html)
{
    // create whitespace between html elements, so that words do not run together
    html = html.Replace(">","> ");
    // parse html
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    // strip html decoded text from html
    string text = HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);
    // replace all whitespace with a single space and remove leading and trailing whitespace
    return Regex.Replace(text, @"\s+", " ").Trim();
}
Remove html tags from a string except a in asp.net
Very, very hacky (and really shouldn't be used in production), but:
Regex.Replace(input, @"<[^>]+?\/?>", m => {
    // here you can exclude specific tags such as `<a>` or maybe `<b>`, etc.
    return Regex.IsMatch(m.Value, @"^<a\b|\/a>$") ? m.Value : String.Empty;
});
Basically, it removes every HTML tag with the exception of <a ...>...</a>.
Note: this DOES NOT:
- Validate if a tag was opened/closed/nested correctly.
- Validate if the <> are actually HTML tags (maybe your input has < or > in the text itself?)
- Handle "nested" <> tags (e.g. <img src="http://placeholde.it/100" alt="foo<Bar>"/> will leave a remainder of "/> in the output string)
Here's the same thing turned into a helper method:
// Mocks http://www.php.net/strip_tags
// Requires: using System.Linq; using System.Text.RegularExpressions;
/// <summary>
/// Removes all HTML tags from the string and returns the purified result.
/// If supplied, tags matching <paramref name="allowedTags"/> will be left untouched.
/// </summary>
/// <param name="input">The input string.</param>
/// <param name="allowedTags">Tags to remain in the original input.</param>
/// <returns>Transformed input string.</returns>
static String StripTags(String input, params String[] allowedTags)
{
    if (String.IsNullOrEmpty(input)) return input;
    MatchEvaluator evaluator = m => String.Empty;
    if (allowedTags != null && allowedTags.Length > 0)
    {
        Regex reAllowed = new Regex(String.Format(@"^<(?:{0})\b|\/(?:{0})>$", String.Join("|", allowedTags.Select(x => Regex.Escape(x)).ToArray())));
        evaluator = m => reAllowed.IsMatch(m.Value) ? m.Value : String.Empty;
    }
    return Regex.Replace(input, @"<[^>]+?\/?>", evaluator);
}
// StripTags(input) -- all tags are removed
// StripTags(input, "a") -- all tags but <a> are removed
// StripTags(input, new[]{ "a" }) -- same as above
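The limitations listed in the note above can be reproduced with a few lines. This is only a sketch for experimenting: the class and method names (TagPatternLimits, Strip) are hypothetical, and the pattern is the same one the answer passes to Regex.Replace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagPatternLimits
{
    // Same tag pattern as in the answer above.
    private const string TagPattern = @"<[^>]+?\/?>";

    // Removes every match of the tag pattern, with no allowed-tag list.
    public static string Strip(string input)
    {
        return Regex.Replace(input, TagPattern, string.Empty);
    }
}
```

Calling Strip on <img src="http://placeholde.it/100" alt="foo<Bar>"/> leaves "/> behind, because the first > inside the attribute value ends the match. Calling it on the plain text 1 < 2 and 3 > 2 returns 1  2 (with two spaces), because everything between the < and the > is treated as a tag.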
How to remove html tags from string in c#
Use a library such as HTML Agility Pack.
You can use regular expressions, but I wouldn't recommend it.
EDIT: Here's the sample code that does html to text conversion - http://htmlagilitypack.codeplex.com/SourceControl/changeset/view/66017#1336937