How do I remove all HTML tags from a string without knowing which tags are in it?
You can use a simple regex like this:
public static string StripHTML(string input)
{
    return Regex.Replace(input, "<.*?>", String.Empty);
}
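Be aware that this solution has flaws: it can miss tags that span several lines, and it mishandles a > that appears inside an attribute value or in ordinary text. See Remove HTML tags in String for more information (especially the comments of 'Mark E. Haase'/@mehaase).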
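To make the flaw concrete, here is a minimal JavaScript sketch that applies the same `<.*?>` pattern to a few inputs. The function name `stripHtmlNaive` and the sample strings are made up for illustration; the C# version should behave the same way on these particular inputs, but treat that as an assumption to verify.

```javascript
// Hypothetical helper: the same lazy pattern as the C# answer, ported to JavaScript.
function stripHtmlNaive(input) {
  return input.replace(/<.*?>/g, '');
}

// Well-formed, simple markup works as expected.
const simple = stripHtmlNaive('<b>bold</b> text');

// A ">" inside a quoted attribute ends the match too early,
// so part of the tag leaks into the output.
const leakedAttribute = stripHtmlNaive('<img alt="a > b" src="x.png">after');

// Plain text that merely contains "<" and ">" gets eaten.
const eatenText = stripHtmlNaive('if a < b and c > d');

console.log(JSON.stringify(simple));          // "bold text"
console.log(JSON.stringify(leakedAttribute)); // " b\" src=\"x.png\">after"
console.log(JSON.stringify(eatenText));       // "if a  d"
```

The second and third cases are the reason a parser is usually the safer choice when the input is not under your control.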
Another solution would be to use the HTML Agility Pack.
You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?
How to remove HTML tag (not a specific tag) with content from a string in JavaScript
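Removing all HTML tags and their innerText can be done with the following snippet. The regex captures the opening tag's name, then matches all content between the opening and closing tags, then uses the captured tag name to match the closing tag. Because the content part of the pattern excludes < and >, a single pass only removes elements that contain no nested tags.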
const regexForStripHTML = /<([^</> ]+)[^<>]*?>[^<>]*?<\/\1> */gi;
const text = "OCEP <sup>®</sup> water product";
const stripContent = text.replaceAll(regexForStripHTML, '');
console.log(text);
console.log(stripContent);
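If nested elements matter, one possible extension is to reapply the same regex until the string stops changing, so inner elements are removed first and their parents on a later pass. This is only a sketch: the function name `stripTagsWithContent` and the nested sample are assumptions, not part of the original answer.

```javascript
// Same pattern as above: opening tag, tag-free content, matching closing tag.
const tagWithContent = /<([^</> ]+)[^<>]*?>[^<>]*?<\/\1> */gi;

// Hypothetical helper: repeat the replacement until a pass changes nothing.
function stripTagsWithContent(html) {
  let previous;
  let current = html;
  do {
    previous = current;
    current = previous.replace(tagWithContent, '');
  } while (current !== previous);
  return current;
}

console.log(stripTagsWithContent('OCEP <sup>®</sup> water product')); // OCEP water product
console.log(stripTagsWithContent('<div><b>x</b> y</div>tail'));       // tail
```

Void or unclosed tags such as `<br>` have no closing tag to match, so this approach leaves them in place.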
How to remove html tags from string in view page C#
If you want to show your content without any formatting, you can use Regex.Replace(input, "<.*?>", String.Empty) to strip all HTML tags from your string.
1) Add the code below to the top of the view (.cshtml).
@using System.Text.RegularExpressions;
@helper StripHTML(string input)
{
    if (!string.IsNullOrEmpty(input))
    {
        input = Regex.Replace(input, "<.*?>", String.Empty);
        <span>@input</span>
    }
}
2) Use the helper like this:
<td>@StripHTML(item.Message)</td>
How can I remove HTML tags other than div and span from a string in JavaScript?
To remove all tags excluding specific tags, you can use the following regular expression.
const str = "<div><p></p><span></span></div>";
console.log(str.replace(/(<\/?(?:span|div)[^>]*>)|<[^>]+>/ig, '$1'));
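The same idea can be wrapped in a function that builds the pattern from a list of tag names. The sketch below is an assumption-laden variation, not the original answer's code: `stripExcept` is a made-up name, and the added lookahead `(?=[\s/>])` is an extra so that a tag like `<spanner>` is not kept just because its name starts with `span`.

```javascript
// Hypothetical helper: remove every tag except those named in allowedTags.
function stripExcept(html, allowedTags) {
  // With nothing allowed, simply remove every tag.
  if (allowedTags.length === 0) {
    return html.replace(/<[^>]+>/g, '');
  }
  // Only plain alphanumeric tag names are accepted, so nothing needs regex escaping.
  for (const tag of allowedTags) {
    if (!/^[a-z][a-z0-9]*$/i.test(tag)) {
      throw new Error(`Unsupported tag name: ${tag}`);
    }
  }
  const allowed = allowedTags.join('|');
  // Group 1 captures allowed tags; any other tag matches the second alternative.
  const pattern = new RegExp(`(<\\/?(?:${allowed})(?=[\\s/>])[^>]*>)|<[^>]+>`, 'gi');
  // '$1' re-inserts allowed tags and is empty for everything else.
  return html.replace(pattern, '$1');
}

console.log(stripExcept('<div><p></p><span></span></div>', ['div', 'span']));
// <div><span></span></div>
console.log(stripExcept('<spanner>x</spanner><span>y</span>', ['span']));
// x<span>y</span>
```

Attributes on the kept tags pass through untouched, so this is a formatting filter and not a sanitizer.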
Remove HTML tags from a String
Use an HTML parser instead of regex. This is dead simple with Jsoup.
import org.jsoup.Jsoup;

public static String html2text(String html) {
    // Parse the markup and return only the text content.
    return Jsoup.parse(html).text();
}
Jsoup also supports removing HTML tags against a customizable whitelist, which is very useful if you want to allow only e.g. <b>, <i> and <u>.
See also:
- RegEx match open tags except XHTML self-contained tags
- What are the pros and cons of the leading Java HTML parsers?
- XSS prevention in JSP/Servlet web application
How to remove all html tags from a string
You can strip out all the html-tags with a regular expression: /<(.|\n)*?>/g
Described in detail here: http://www.pagecolumn.com/tool/all_about_html_tags.htm
In your JS-Code it would look like this:
item = item.replace(/<(.|\n)*?>/g, '');
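The `(.|\n)` part exists because `.` does not match a line break in JavaScript. The following sketch compares three patterns on a tag that is split across lines; the sample strings and variable names are invented for illustration. Note that `(.|\n)` still does not cover a carriage return, so input with Windows line endings may need the `[^>]` form instead.

```javascript
const lazyDot = /<.*?>/g;           // stops at any line break
const dotOrNewline = /<(.|\n)*?>/g; // the pattern from this answer
const notGreaterThan = /<[^>]*>/g;  // matches anything up to the next ">"

const withLf = '<p\n class="x">Hello</p>';
const withCrLf = '<p\r\n class="x">Hello</p>';

// The plain lazy dot misses the opening tag entirely.
console.log(JSON.stringify(withLf.replace(lazyDot, '')));          // "<p\n class=\"x\">Hello"
// Allowing \n fixes the LF case...
console.log(JSON.stringify(withLf.replace(dotOrNewline, '')));     // "Hello"
// ...but a \r\n line ending still defeats it.
console.log(JSON.stringify(withCrLf.replace(dotOrNewline, '')));   // "<p\r\n class=\"x\">Hello"
// A negated character class handles both.
console.log(JSON.stringify(withCrLf.replace(notGreaterThan, ''))); // "Hello"
```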
How to remove all html tags including '&nbsp;' from string?
The text looks to be double-escaped, kinda. First turn all the &amp;s into &s, so that the HTML entities can be properly recognized. Then .text() will give you the plain text version of the HTML markup.
const input = `<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting,remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>\n\n<p>&amp;nbsp;</p>\n\n<p>TItle&amp;nbsp;</p>\n`;
const inputWithProperEntities = input.replaceAll('&amp;', '&');
console.log($(inputWithProperEntities).text());
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
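Where jQuery or a DOM is not available (for example in a plain Node.js script), the same two steps can be approximated by hand. The sketch below swaps jQuery's parser for a tag-stripping regex plus a tiny entity table, so it only covers the handful of entities listed; `toPlainText` and `ENTITIES` are hypothetical names, and the regex carries the usual caveats about malformed markup.

```javascript
// Small, deliberately incomplete entity table (an assumption for this sketch).
const ENTITIES = { nbsp: '\u00a0', lt: '<', gt: '>', quot: '"', amp: '&' };

// Hypothetical helper: undo the double escaping, drop tags, decode entities.
function toPlainText(doubleEscaped) {
  // Step 1: "&amp;nbsp;" becomes "&nbsp;".
  const html = doubleEscaped.replace(/&amp;/g, '&');
  // Step 2: remove anything that looks like a tag.
  const withoutTags = html.replace(/<[^>]*>/g, '');
  // Step 3: decode the known entities in a single pass.
  return withoutTags.replace(/&(nbsp|lt|gt|quot|amp);/g, (match, name) => ENTITIES[name]);
}

const sample = '<p>Lorem</p>\n\n<p>&amp;nbsp;</p>\n\n<p>TItle&amp;nbsp;</p>\n';
console.log(JSON.stringify(toPlainText(sample)));
```

As with jQuery's .text(), each &nbsp; ends up as a non-breaking space character (U+00A0), so trim or replace it separately if ordinary spaces are wanted.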
How can I strip any and all HTML tags from a string?
Try a regex replacement.
This pattern matches HTML tags within a string:
var pattern = @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";
var source = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";
// Regex.Replace returns a new string, so keep the result.
var result = Regex.Replace(source, pattern, string.Empty);
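For comparison, here is a JavaScript port of the same pattern, offered as a sketch so its behavior can be tried quickly. The names `tagPattern` and `stripTagsStrict` and the extra sample inputs are assumptions. The port shows what the stricter pattern buys (a quoted attribute value may contain `>`) and one thing it misses (attribute names containing a hyphen, because `\w` does not match `-`).

```javascript
// JavaScript port of the C# pattern above (verbatim strings become a regex literal).
const tagPattern = /<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/g;

// Hypothetical helper name.
function stripTagsStrict(source) {
  return source.replace(tagPattern, '');
}

// Simple tags are removed.
console.log(JSON.stringify(stripTagsStrict('<pre> (Refer to business office)</pre>')));
// " (Refer to business office)"

// A ">" inside a quoted attribute value no longer breaks the match.
console.log(JSON.stringify(stripTagsStrict('<img alt="a > b" src="x.png">after')));
// "after"

// A hyphenated attribute name is not matched, so the opening tag survives.
console.log(JSON.stringify(stripTagsStrict('<a data-id="1">x</a>')));
// "<a data-id=\"1\">x"
```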
Remove HTML tags in String
Warning: This does not work for all cases and should not be used to process untrusted user input.
using System.Text.RegularExpressions;
...
const string HTML_TAG_PATTERN = "<.*?>";

static string StripHTML(string inputString)
{
    return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
}
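One way to see why the warning matters: a tag that is never closed contains no `>`, so the pattern has nothing to match, yet browsers are lenient and may still treat such a fragment as markup. The JavaScript sketch below illustrates this and shows escaping on output as a common complementary measure. The helper names and the sample string are invented for this example, and the escaping function is a minimal sketch, not a complete sanitizer.

```javascript
// The same naive pattern, ported to JavaScript (hypothetical helper name).
function stripTags(input) {
  return input.replace(/<.*?>/g, '');
}

// Minimal HTML escaping for text placed into element content or quoted attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// An unterminated tag: there is no ">" for the pattern to find.
const hostile = 'hi <img src=x onerror=alert(1) ';

const stripped = stripTags(hostile);
console.log(stripped === hostile); // true: nothing was removed

// Escaping turns the remaining "<" into inert text.
const safe = escapeHtml(stripped);
console.log(safe); // hi &lt;img src=x onerror=alert(1) 
```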