How to Strip HTML Tags from a String in ASP.NET

How can I strip HTML tags from a string in ASP.NET?

If the goal is only to strip all HTML tags from a string, a regex works reliably as well. Replace:

<[^>]*(>|$)

with the empty string, globally. Don't forget to normalize the string afterwards, replacing:

[\s\r\n]+

with a single space, and trimming the result. Optionally replace any HTML character entities back to the actual characters.

Note:

  1. There is a limitation: HTML and XML allow > in attribute values. This solution will return broken markup when encountering such values.
  2. The solution is technically safe, in the sense that the result will never contain anything that could be used for cross-site scripting or to break a page layout. It is just not very clean.
  3. As with all things HTML and regex:

    Use a proper parser if you must get it right under all circumstances.
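
The steps described in this answer (strip tags, collapse whitespace, trim, optionally decode entities) could be combined roughly as follows. This is only a sketch: the class name HtmlText, the method name StripTags and the decodeEntities parameter are made up for illustration, and \s+ is used for the whitespace step because \s already covers \r and \n.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from the original answer.
public static class HtmlText
{
    public static string StripTags(string input, bool decodeEntities = false)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        // 1. Remove anything that looks like a tag, including an unterminated one at the end.
        string text = Regex.Replace(input, @"<[^>]*(>|$)", string.Empty);

        // 2. Collapse runs of whitespace into a single space and trim the ends.
        text = Regex.Replace(text, @"\s+", " ").Trim();

        // 3. Optionally turn entities such as &amp; back into characters.
        //    After this step the text may contain '<' or '>' again, so it must be
        //    HTML-encoded when it is written into a page.
        return decodeEntities ? WebUtility.HtmlDecode(text) : text;
    }
}
```

For example, HtmlText.StripTags("<p>Hello   <b>world</b></p>") gives "Hello world", and HtmlText.StripTags("Tom &amp; Jerry", decodeEntities: true) gives "Tom & Jerry".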

How to remove html tags from string in view page C#

If you want to show your content without any formatting, you can use Regex.Replace(input, "<.*?>", String.Empty) to strip all HTML tags from your string.

1) Add the code below to the top of the view (.cshtml).

@using System.Text.RegularExpressions;

@helper StripHTML(string input)
{
    if (!string.IsNullOrEmpty(input))
    {
        input = Regex.Replace(input, "<.*?>", String.Empty);
        <span>@input</span>
    }
}

2) Call the helper like this:

<td>@StripHTML(item.Message)</td>

How do I remove all HTML tags from a string without knowing which tags are in it?

You can use a simple regex like this:

public static string StripHTML(string input)
{
    return Regex.Replace(input, "<.*?>", String.Empty);
}

Be aware that this solution has its own flaw. See "Remove HTML tags in String" for more information (especially the comments by 'Mark E. Haase'/@mehaase).
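
Independently of what the linked discussion covers, two limitations of the "<.*?>" pattern are easy to check for yourself: by default "." does not match a newline, so a tag split across lines is left in place, and a ">" inside an attribute value ends the match early. The sketch below illustrates both; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class LazyTagPatternDemo
{
    // The pattern exactly as used in the answer.
    public static string Strip(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty);

    // Same pattern, but RegexOptions.Singleline lets '.' match newlines too.
    public static string StripSingleline(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty, RegexOptions.Singleline);
}

// Tag split across two lines:
//   Strip("<a\nhref=\"x\">link</a>")           returns "<a\nhref=\"x\">link"
//   StripSingleline("<a\nhref=\"x\">link</a>") returns "link"
// '>' inside an attribute value:
//   Strip("<img alt=\"a>b\">text")             returns "b\">text"
```

RegexOptions.Singleline takes care of the first case only; the second is the same attribute-value limitation that affects any tag-stripping regex of this kind.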

Another solution would be to use the HTML Agility Pack.

You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?

How can I strip any and all HTML tags from a string?

Try a regex replacement.
The following pattern, quoted from another source, matches HTML tags within a string:

var pattern = @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";
var source = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";
// Regex.Replace returns a new string; source itself is not modified.
var result = Regex.Replace(source, pattern, string.Empty);

How Can I strip HTML from Text in .NET?

I downloaded the HtmlAgilityPack and created this function:

// Needs System.Text.RegularExpressions (Regex), System.Web (HttpUtility)
// and the HtmlAgilityPack package.
string StripHtml(string html)
{
    // create whitespace between html elements, so that words do not run together
    html = html.Replace(">", "> ");

    // parse html
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);

    // take the text content of the document and decode HTML entities in it
    string text = HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);

    // replace all whitespace with a single space and remove leading and trailing whitespace
    return Regex.Replace(text, @"\s+", " ").Trim();
}

Remove HTML tags from a string except <a> in ASP.NET

Very hacky (and it really shouldn't be used in production), but:

C#

Regex.Replace(input, @"<[^>]+?\/?>", m => {
    // here you can exclude specific tags such as `<a>` or maybe `<b>`, etc.
    return Regex.IsMatch(m.Value, @"^<a\b|\/a>$") ? m.Value : String.Empty;
});

Basically, it removes every HTML tag with the exception of <a ...>...</a>.

Note: this DOES NOT:

  • Validate if a tag was opened/closed/nested correctly.
  • Validate if the <> are actually HTML tags (maybe your input has < or > in the text itself?)
  • Handle "nested" <> tags. (e.g. <img src="http://placeholde.it/100" alt="foo<Bar>"/> will leave a remainder of "/> in the output string)

Here's the same thing turned into a helper method:

// Mimics PHP's strip_tags: http://www.php.net/strip_tags
// Requires: using System.Linq; using System.Text.RegularExpressions;

/// <summary>
/// Removes all HTML tags from the string and returns the purified result.
/// If supplied, tags matching <paramref name="allowedTags"/> will be left untouched.
/// </summary>
/// <param name="input">The input string.</param>
/// <param name="allowedTags">Tags to remain in the original input.</param>
/// <returns>Transformed input string.</returns>
static String StripTags(String input, params String[] allowedTags)
{
    if (String.IsNullOrEmpty(input)) return input;
    MatchEvaluator evaluator = m => String.Empty;
    if (allowedTags != null && allowedTags.Length > 0)
    {
        // Matches the start of an opening tag (<a ...) or the end of a closing tag (/a>) for any allowed name.
        Regex reAllowed = new Regex(String.Format(@"^<(?:{0})\b|\/(?:{0})>$", String.Join("|", allowedTags.Select(x => Regex.Escape(x)).ToArray())));
        evaluator = m => reAllowed.IsMatch(m.Value) ? m.Value : String.Empty;
    }
    return Regex.Replace(input, @"<[^>]+?\/?>", evaluator);
}

// StripTags(input) -- all tags are removed
// StripTags(input, "a") -- all tags but <a> are removed
// StripTags(input, new[]{ "a" }) -- same as above

How to remove html tags from string in c#

Use a library such as HTML Agility Pack.
You can use regular expressions, but I wouldn't recommend it.

EDIT: Here's the sample code that does html to text conversion - http://htmlagilitypack.codeplex.com/SourceControl/changeset/view/66017#1336937


