How do I remove all HTML tags from a string without knowing which tags are in it?
You can use a simple regex like this:
public static string StripHTML(string input)
{
    return Regex.Replace(input, "<.*?>", String.Empty);
}
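Be aware that this regex is not an HTML parser. It breaks when a '>' appears inside an attribute value, and it leaves tags that span several lines in place, because '.' does not match a newline by default. See Remove HTML tags in String for more information (especially the comments of 'Mark E. Haase'/@mehaase).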
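To make the limitation concrete, here is a minimal sketch that runs the same pattern on two inputs. The class name StripHtmlFlawDemo and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripHtmlFlawDemo
{
    // Same pattern as in the answer above.
    public static string StripHTML(string input)
    {
        return Regex.Replace(input, "<.*?>", String.Empty);
    }

    public static void Main()
    {
        // Plain markup is handled as expected.
        Console.WriteLine(StripHTML("<p>Hello <b>world</b></p>")); // Hello world

        // A '>' inside an attribute value ends the first match too early,
        // so part of the tag is left behind: ' b">link'
        Console.WriteLine(StripHTML("<a title=\"a > b\">link</a>"));
    }
}
```

If input like the second case is possible, a parser-based approach such as the HTML Agility Pack is the safer choice.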
Another solution would be to use the HTML Agility Pack.
You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?
How to clean HTML tags using C#
HTML Agility Pack:
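HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectSingleNode returns null when the input has no <body>; fall back to the whole document.
HtmlNode body = doc.DocumentNode.SelectSingleNode("//body") ?? doc.DocumentNode;
string s = body.InnerText;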
How to remove html tags from string in view page C#
If you want to show your content without any formatting, you can use Regex.Replace(input, "<.*?>", String.Empty) to strip all HTML tags from your string.
1) Add the code below to the top of the view (.cshtml).
@using System.Text.RegularExpressions;
@helper StripHTML(string input)
{
    if (!string.IsNullOrEmpty(input))
    {
        input = Regex.Replace(input, "<.*?>", String.Empty);
        <span>@input</span>
    }
}
2) Call the helper like this:
<td>@StripHTML(item.Message)</td>
How to remove start and end html tag using C#?
Use Regex:
var item = "<p>Some text</p><p>More text</p>";
item = Regex.Replace(item, @"^<[^<>]*>", "");
item = Regex.Replace(item, @"<[^<>]*>$", "");
Console.WriteLine(item); // prints: Some text</p><p>More text
Regex breakdown:
^ : matches the start of the string
< : the opening angle bracket of the tag
[^<>]* : any run of characters other than < and >, as long as possible, so the match cannot run past the end of the tag
> : the closing angle bracket of the tag
The second call does the same at the other end: $ at the end of the expression anchors the match to the end of the string.
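The two calls can also be folded into one by joining both anchored patterns with an alternation. The following is only a sketch: the helper name TrimOuterTags is hypothetical, and it uses the character class [^<>] to stay inside a single tag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OuterTagDemo
{
    // Hypothetical helper: removes one tag at the very start of the string
    // and one tag at the very end, leaving everything in between untouched.
    public static string TrimOuterTags(string html)
    {
        return Regex.Replace(html, @"^<[^<>]*>|<[^<>]*>$", "");
    }

    public static void Main()
    {
        Console.WriteLine(TrimOuterTags("<p>Some text</p><p>More text</p>"));
        // prints: Some text</p><p>More text
    }
}
```

A string that does not start or end with a tag is returned unchanged.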
Remove HTML tags from string including &nbsp; in C#
If you can't use an HTML parser oriented solution to filter out the tags, here's a simple regex for it.
string noHTML = Regex.Replace(inputHTML, @"<[^>]+>|&nbsp;", "").Trim();
Ideally, make a second pass with a regex that collapses runs of whitespace into a single space:
string noHTMLNormalised = Regex.Replace(noHTML, @"\s{2,}", " ");
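The two passes can be wrapped in one helper. The sketch below makes one deliberate change that is an assumption on my part, not part of the answer above: tags and &nbsp; are replaced with a space rather than an empty string, so that words separated only by markup do not run together. The name ToPlainText is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlainTextDemo
{
    // Hypothetical helper combining both passes.
    public static string ToPlainText(string html)
    {
        // Pass 1: turn every tag and every &nbsp; entity into a single space.
        string spaced = Regex.Replace(html, @"<[^>]+>|&nbsp;", " ");
        // Pass 2: collapse runs of whitespace and trim the ends.
        return Regex.Replace(spaced, @"\s{2,}", " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(ToPlainText("<p>Hello&nbsp;<b>world</b></p>   <p>again</p>"));
        // prints: Hello world again
    }
}
```

Other entities such as &amp; are left as they are; System.Net.WebUtility.HtmlDecode can be applied afterwards if they need to be decoded.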
Remove Certain HTML tags in C#
As far as I can see, you want to remove the HTML elements that contain a style attribute, and also remove their closing pairs. Unfortunately, there is no good way to do that with regexes, because pairing an opening tag with its own closing tag means tracking nesting. Without the 'also remove their closing pairs' clause, we could write an approximately good regex.
On the other hand, XSLT is the right tool for this, because it can handle the recursive nature of XML:
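<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Copy every element that has no style attribute -->
  <xsl:template match="//*[not(@style)]">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Copy attributes as attributes; the built-in rule would emit them as text -->
  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
What's happening here? The <xsl:template match="//*[not(@style)]"> part matches every element that does not have a style attribute, and the <xsl:copy>...</xsl:copy> part copies it, then processes its attributes and children. Elements that do have a style attribute match no template, so their opening and closing tags are not copied; XSLT's built-in rules still process their children, so the content inside them is kept. The second template makes sure attributes are copied as attributes.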
For the record, this is a slight variant of the XSLT identity transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
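To run such a stylesheet from C#, System.Xml.Xsl.XslCompiledTransform can be used. The sketch below is one possible setup, not the only one: it embeds a variant built on the identity transformation shown above, plus one extra template that matches elements carrying a style attribute and emits only their children. The class name, method name, and sample input are made up for illustration, and the input must be well-formed XML.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StyleStripDemo
{
    // Identity transformation plus an override for elements with a style attribute.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""xml"" omit-xml-declaration=""yes""/>
  <xsl:template match=""@*|node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""*[@style]"">
    <xsl:apply-templates select=""node()""/>
  </xsl:template>
</xsl:stylesheet>";

    public static string RemoveStyledElements(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            // No stylesheet parameters are needed, so the argument list is null.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        string xml = "<div><p style=\"color:red\">Hi <b class=\"x\">there</b></p></div>";
        // The <p> tags are dropped, their content and the <b> element are kept.
        Console.WriteLine(RemoveStyledElements(xml));
    }
}
```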
Trying to remove empty HTML tags in C#
You can just check Value. Value is also empty when an element has child nodes that are themselves empty. Also, you are checking for attributes and not removing nodes that have attributes, but from your example you want to remove empty tags with attributes.
string src = @"
<html><body>
<div class=""sfd"">test</div>
<p dir = ""rtl"" style=""margin-bottom: 0;margin-left: 0;margin-right: 0;margin-top: 0;""><span style = ""font-size: 11pt;font-style: normal;font-weight: normal;margin: 0;padding: 0;"" > </span ></p >
<p dir=""rtl"" style=""font-family: David;font-size: 11pt;line-height: 115.0%;margin-top: 0;""><span style = ""font-size: 11pt;font-style: normal;font-weight: normal;margin: 0;padding: 0;"" > </span ></p >
<div class=""sfd"">test</div>
<p dir = ""rtl"" style=""font-family: David;font-size: 11pt;line-height: 115.0%;margin-bottom: 0;margin-left: 0;margin-right: 0;margin-top: 0;""><span style = ""font-size: 11pt;font-style: normal;font-weight: normal;margin: 0;padding: 0;"" > </span ></p >
<p dir=""rtl"" style=""font-family: David;font-size: 11pt;line-height: 115.0%;margin-bottom: 0;margin-left: 0;margin-right: 0;margin-top: 0;""><span style = ""font-size: 11pt;font-style: normal;font-weight: normal;margin: 0;padding: 0;"" > </span ></p >
<div class=""sfd"">test</div>
<p dir = ""rtl"" style=""font-family: David;font-size: 11pt;line-height: 115.0%;margin-right: 0;margin-top: 0;""><span style = ""font-size: 11pt;font-style: normal;font-weight: normal;margin: 0;padding: 0;"" > </span ></p >
</body></html>
";
XDocument xDoc = XDocument.Parse(src);
xDoc.Descendants().Where(node => string.IsNullOrWhiteSpace(node.Value)).Remove();
MessageBox.Show(xDoc.ToString());
To keep <br/>, just exclude it explicitly. Replace the removal line in the code above with:
xDoc.Descendants().Where(node => string.IsNullOrWhiteSpace(node.Value) && node.Name != "br").Remove();
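For a quick check outside a UI project, the same filter can be run as a console sketch on a much smaller input. The class name EmptyTagDemo, the method name, and the sample markup are made up for illustration; as above, the input has to be well-formed XML for XDocument.Parse to accept it.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class EmptyTagDemo
{
    // Removes every element whose text value is empty or whitespace, except <br/>.
    public static XDocument RemoveEmptyElements(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        doc.Descendants()
           .Where(node => string.IsNullOrWhiteSpace(node.Value) && node.Name != "br")
           .Remove();
        return doc;
    }

    public static void Main()
    {
        XDocument doc = RemoveEmptyElements(
            "<root><div>test</div><p style=\"margin:0\"><span> </span></p><br/></root>");

        // List what is left directly under the root element.
        Console.WriteLine(string.Join(", ", doc.Root.Elements().Select(e => e.Name.LocalName)));
        // prints: div, br
    }
}
```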