Parsing HTML with C#.NET
Take a look at the HTML Agility Pack. It's a pretty decent HTML parser.
http://html-agility-pack.net/?z=codeplex
Here's some code to get you started (production code needs proper error checking):
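using HtmlAgilityPack;

HtmlDocument document = new HtmlDocument();
string htmlString = "<html>blabla</html>";
document.LoadHtml(htmlString);
// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection collection = document.DocumentNode.SelectNodes("//a");
if (collection != null)
{
    foreach (HtmlNode link in collection)
    {
        // An <a> without an href yields the default value instead of throwing.
        string target = link.GetAttributeValue("href", string.Empty);
    }
}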
C# parsing HTML for general use?
I use the MSHTML API.
Simply add a reference to the mshtml assembly, then import the mshtml namespace.
From there you can declare an HTMLDocument object, which is queryable. It's a bit of a headache in places because the API design forces you to do seemingly random casting, but it gets the job done. It can always be put into a utility class of its own so you don't have to keep the oddities in your main application code.
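To make the casting concrete, here is a minimal sketch of such a utility class. It assumes a Windows machine with a reference to the Microsoft.mshtml interop assembly; the class name MshtmlLinks and the method GetHrefs are made up for illustration. Behavior such as URL resolution and script handling is decided by the MSHTML engine, so treat this as a starting point rather than a finished implementation.

```csharp
using System.Collections.Generic;
using mshtml; // COM interop assembly (Microsoft.mshtml), Windows only

// Hypothetical helper class; the name and shape are illustrative only.
public static class MshtmlLinks
{
    public static List<string> GetHrefs(string html)
    {
        // Writing markup into the document goes through IHTMLDocument2.
        var document = (IHTMLDocument2)new HTMLDocument();
        document.write(new object[] { html });
        document.close();

        var hrefs = new List<string>();

        // getElementsByTagName lives on IHTMLDocument3, hence another cast.
        IHTMLElementCollection anchors =
            ((IHTMLDocument3)document).getElementsByTagName("a");

        foreach (IHTMLElement anchor in anchors)
        {
            // getAttribute returns object; it is null when the attribute is absent.
            object value = anchor.getAttribute("href", 0);
            if (value != null)
            {
                hrefs.Add(value.ToString());
            }
        }
        return hrefs;
    }
}
```

Keeping the casts inside one class like this means the rest of the application only sees a plain List<string>.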
Parsing HTML content with C# Parser
Here you go with a quick and dirty approach:
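using System;
using System.Collections.Generic;
using System.Globalization;
using HtmlAgilityPack;

class RoomInfo
{
    public String Name { get; set; }
    public Dictionary<String, Double> Prices { get; set; }
}

// Place this method inside a class of your own (e.g. Program).
private static void HtmlFile()
{
    List<RoomInfo> rooms = new List<RoomInfo>();
    HtmlDocument document = new HtmlDocument();
    document.Load("file.txt");

    // SelectNodes returns null when nothing matches.
    var h2Nodes = document.DocumentNode.SelectNodes("//h2");
    if (h2Nodes == null) return;

    foreach (var h2Node in h2Nodes)
    {
        RoomInfo roomInfo = new RoomInfo
        {
            Name = h2Node.InnerText.Trim(),
            Prices = new Dictionary<string, double>()
        };

        // Assumes the element holding the labels is the second sibling after the <h2>.
        var labels = h2Node.NextSibling?.NextSibling?.SelectNodes(".//label");
        if (labels != null)
        {
            foreach (var label in labels)
            {
                // The price is read from the custom "precio" attribute of each label.
                roomInfo.Prices.Add(label.InnerText.Trim(), Convert.ToDouble(label.Attributes["precio"].Value, CultureInfo.InvariantCulture));
            }
        }
        rooms.Add(roomInfo);
    }
}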
The rest is up to you! ;-)
Does .NET framework offer methods to parse an HTML string?
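Yes, if Windows Forms is available: see the HtmlDocument and HtmlElement classes and the HtmlDocument.GetElementById method in System.Windows.Forms.
You can create a dummy HTML document: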
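using System;
using System.Windows.Forms;

// WebBrowser must be created on an STA thread (mark Main with [STAThread]).
WebBrowser w = new WebBrowser();
w.Navigate(String.Empty);
// This is System.Windows.Forms.HtmlDocument, not the HTML Agility Pack class of the same name.
HtmlDocument doc = w.Document;
doc.Write("<html><head></head><body><img id=\"myImage\" src=\"c:\"/><a id=\"myLink\" href=\"myUrl\"/></body></html>");
Console.WriteLine(doc.Body.Children.Count);
Console.WriteLine(doc.GetElementById("myImage").GetAttribute("src"));
Console.WriteLine(doc.GetElementById("myLink").GetAttribute("href"));
Console.ReadKey();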
Output:
2
file:///c:
about:myUrl
Editing elements:
HtmlElement imageElement = doc.GetElementById("myImage");
string newSource = "d:";
imageElement.OuterHtml = imageElement.OuterHtml.Replace(
"src=\"c:\"",
"src=\"" + newSource + "\"");
Console.WriteLine(doc.GetElementById("myImage").GetAttribute("src"));
Output:
file:///d:
Parsing HTML to get content using C#
It isn't 100% clear what you want, but I'm assuming you want the text minus markup; so:
using System.Net;
using System.Text;
using HtmlAgilityPack;

string html;
// obtain some arbitrary html....
using (var client = new WebClient())
{
    html = client.DownloadString("http://stackoverflow.com/questions/2038104");
}
// use the html agility pack: http://www.codeplex.com/htmlagilitypack
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
StringBuilder sb = new StringBuilder();
// Note: //text() also matches the text inside <script> and <style> elements.
foreach (HtmlTextNode node in doc.DocumentNode.SelectNodes("//text()"))
{
    sb.AppendLine(node.Text);
}
string final = sb.ToString();
Parsing HTML String
You can use the excellent HTML Agility Pack.
This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents (or streams).
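As an illustration of using it without any XPath, here is a minimal sketch that collects link targets through the DOM and LINQ instead. It assumes the HtmlAgilityPack NuGet package is referenced; the class name LinkExtractor and the method GetLinkTargets are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper; the class and method names are illustrative only.
public static class LinkExtractor
{
    public static List<string> GetLinkTargets(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Descendants("a") walks the DOM without XPath and yields an
        // empty sequence (not null) when there are no <a> elements.
        return document.DocumentNode
            .Descendants("a")
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .Where(href => href.Length > 0) // skip anchors without an href
            .ToList();
    }
}
```

Calling LinkExtractor.GetLinkTargets("<p><a href=\"/one\">1</a><a>no href</a><a href=\"/two\">2</a>") should give a list containing "/one" and "/two". Unlike SelectNodes, this route needs no null check when nothing matches.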