C# Version of HTML Tidy

C# version of HTML Tidy?

The latest C# wrapper for HTML Tidy is TidyManaged by Mark Beaton, which looks considerably more up to date than the links you referenced (2003). Also worth noting: Mark provides prebuilt libtidy binaries to reference, so you don't have to pull them from the official site. That should do the trick for tidying and validating your HTML.

  • TidyManaged (source)
  • TidyManaged/libtidy builds
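
For reference, here is a minimal sketch of how TidyManaged is usually called, based on my reading of the project's README. It assumes the TidyManaged assembly is referenced and that a native libtidy build matching your process architecture can be loaded. The class name TidyExample and the sample markup are made up for illustration.

```csharp
using System;
using TidyManaged;

public static class TidyExample
{
    // Runs HTML Tidy over the input and returns the repaired markup as XHTML.
    public static string Tidy(string html)
    {
        using (Document doc = Document.FromString(html))
        {
            doc.ShowWarnings = false; // suppress warning output
            doc.Quiet = true;         // suppress informational output
            doc.OutputXhtml = true;   // emit XHTML rather than HTML
            doc.CleanAndRepair();
            return doc.Save();
        }
    }

    public static void Main()
    {
        // Mismatched </tootle> and missing closing tags are left for Tidy to repair.
        Console.WriteLine(Tidy("<hTml><title>test</tootle><body>asd</body>"));
    }
}
```

If the native library cannot be found at runtime, the call will most likely fail with a DllNotFoundException, so check the libtidy placement first.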

How to clean HTML tags using C#

HTML Agility Pack:

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
    string s = doc.DocumentNode.SelectSingleNode("//body").InnerText;
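
A slightly more defensive sketch of the same idea, assuming the HtmlAgilityPack NuGet package is referenced. The names TagStripper and StripTags are hypothetical. It handles two things the short snippet does not: SelectSingleNode returns null when the input has no body element, and InnerText leaves entities such as &amp;amp; undecoded.

```csharp
using System;
using HtmlAgilityPack;

public static class TagStripper
{
    // Returns the text content of the HTML with tags removed and entities decoded.
    public static string StripTags(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Fall back to the whole document when there is no <body> (e.g. an HTML fragment).
        HtmlNode root = doc.DocumentNode.SelectSingleNode("//body") ?? doc.DocumentNode;

        // InnerText keeps entities as written; DeEntitize decodes them.
        return HtmlEntity.DeEntitize(root.InnerText);
    }

    public static void Main()
    {
        Console.WriteLine(StripTags("<p>Fish &amp; <b>chips</b></p>"));
    }
}
```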

Tidy equivalent of Ruby #map and #join for C#

It looks like LINQ's Select() is what you're looking for:

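    // Requires: using System.Linq; (node is an HtmlAgilityPack HtmlNode)
    var spans = node.SelectNodes(".//div[@class='email-to']//span");
    var result = spans == null
        ? "" // SelectNodes returns null when nothing matches
        : String.Join(", ", spans.Select(o => o.GetAttributeValue("title", "")));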

Format HTML through C#

This is what you need.

    // Quotes inside the style attribute are encoded as &quot; so the string is valid C# and valid XML.
    var input = "<div dir=\"ltr\"><div class=\"gmail_quote\"><strong><span style=\"font-family:&quot;Arial&quot;,&quot;sans-serif&quot;\">CA IQVIA EM Event Speaker info</span></strong></div></div>";

    try
    {
        var formatted = System.Xml.Linq.XElement.Parse(input).ToString();
    }
    catch (System.Xml.XmlException)
    {
        // Your input is not a valid XML fragment.
    }
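
Keep in mind that this only works when the HTML happens to be well-formed XML. A self-contained sketch that shows both outcomes follows; the names XmlFragmentFormatter and TryFormat are hypothetical, and it uses only System.Xml.Linq from the framework.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlFragmentFormatter
{
    // Returns the indented markup, or null when the input is not well-formed XML.
    public static string TryFormat(string input)
    {
        try
        {
            // XElement.ToString() indents the output by default.
            return XElement.Parse(input).ToString();
        }
        catch (XmlException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // Well-formed: printed with one element per line.
        Console.WriteLine(TryFormat("<div><p>Hello</p></div>"));

        // Typical HTML with an unclosed <br> is not well-formed XML, so parsing fails.
        Console.WriteLine(TryFormat("<div><br></div>") ?? "(not well-formed XML)");
    }
}
```

For real-world HTML with unclosed tags or entities like &amp;nbsp;, a tolerant parser such as Html Agility Pack is probably the safer choice.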

How to parse bad html?

Just use Html Agility Pack. It is very good at parsing faulty HTML.
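
A rough sketch of what that looks like, assuming the HtmlAgilityPack NuGet package is referenced. The class name FaultyHtmlDemo, the Repair helper, and the broken sample markup are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class FaultyHtmlDemo
{
    // Loads possibly broken HTML and returns the markup as the parser reconstructed it.
    public static string Repair(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true; // ask the parser to repair badly nested tags
        doc.LoadHtml(html);             // does not throw on malformed markup

        // Problems found while parsing are collected here instead of being thrown.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine("Line {0}: {1}", error.Line, error.Reason);
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Deliberately broken: <p>, <b> and <html> are never closed.
        Console.WriteLine(Repair("<html><body><p>First<p>Second <b>bold</body>"));
    }
}
```

The useful part is that you still get a DOM to query with XPath even when the input is a mess; ParseErrors is only there if you want to report what was wrong.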

Using HTML Tidy in Visual C++ 2010 Windows Forms project

After almost 48 hours of struggling with this problem, I found a solution. Here it is...

I used the very simple .NET wrapper from http://www.codeproject.com/KB/cs/ZetaHtmlTidy.aspx. Its VC project converted to VC++ 2010 without problems and compiled as a DLL. Below is the code I used to call it:

    System::String^ TidyMyHTML(String^ MyHTMLString)
    {
        using namespace ZetaHtmlTidy;
        HtmlTidy tidy;
        String^ s = tidy.CleanHtml( MyHTMLString, HtmlTidyOptions::ConvertToXhtml );
        return s;
    }

Hopefully this post will spare someone else from going through the same thing.

EDIT:

Taking this a step further, I converted the VC++ 2008 project files from the tidy source attached to the wrapper and upgraded them to VC++ 2010 project files. I then compiled the tidy project (separately from his wrapper class project) into libtidy.lib static libraries, both release and debug. Finally, I incorporated his wrapper class into my application and pointed it to the include and lib files. The end result was exactly what I wanted: a solution that incorporates tidy into my application without a DLL dependency. The whole experience taught me a lot about attaching C libraries to my C++ applications.

Thanks for the suggestions, and I hope someone finds this post useful.


