How to Parse HTML in C#

Parsing HTML with C#.NET

Take a look at the Html Agility Pack. It's a pretty decent HTML parser.

http://html-agility-pack.net/?z=codeplex

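Here's some code to get you started. Real code needs error checking, because SelectNodes returns null when nothing matches and an <a> element may have no href attribute: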

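using HtmlAgilityPack;

HtmlDocument document = new HtmlDocument();
string htmlString = "<html>blabla</html>";
document.LoadHtml(htmlString);
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection collection = document.DocumentNode.SelectNodes("//a");
if (collection != null)
{
    foreach (HtmlNode link in collection)
    {
        // falls back to an empty string when the anchor has no href
        string target = link.GetAttributeValue("href", string.Empty);
    }
}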
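
If you want the same link extraction as a reusable method, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The LinkExtractor class, the ExtractLinks method and the sample markup are hypothetical names made up for illustration. The XPath //a[@href] only selects anchors that actually have an href, and the null check covers the case where SelectNodes finds nothing.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns the href of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // The [@href] predicate filters out anchors without an href attribute.
        HtmlNodeCollection anchors = document.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes returns null when no node matches the expression.
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }

        return links;
    }

    public static void Main()
    {
        // Sample markup (an assumption for this sketch): one anchor with href, one without.
        string html = "<html><body>"
                    + "<a href=\"https://example.com/\">Example</a>"
                    + "<a name=\"top\">no href here</a>"
                    + "</body></html>";

        foreach (string link in ExtractLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

Because of the [@href] predicate the second anchor in the sample is skipped, and a document without any links gives you an empty list instead of a NullReferenceException.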

How to parse HTML nodes

Here is a simplified procedure for parsing HTML and saving it to a database. I hope this helps, or at least gives you an idea of how to solve your problem.

using System.Data.SqlClient;
using HtmlAgilityPack;

HtmlWeb h = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = h.Load("http://stackoverflow.com/questions/41183837/how-to-store-html-nodes-into-database");
HtmlNodeCollection tableNodes = doc.DocumentNode.SelectNodes("//table");
HtmlNodeCollection h1Nodes = doc.DocumentNode.SelectNodes("//h1");
HtmlNodeCollection pNodes = doc.DocumentNode.SelectNodes("//p");
//get other nodes here

//just an example...
//open one connection and reuse it for every node
using (SqlConnection conn = new SqlConnection("here goes connection string"))
{
    conn.Open();

    //SelectNodes returns null when nothing matches, so guard the loop
    if (pNodes != null)
    {
        foreach (var pNode in pNodes)
        {
            string id = pNode.Id;
            string content = pNode.InnerText;
            string tag = pNode.Name;

            //do other stuff here and then save to database

            using (SqlCommand cmd = new SqlCommand())
            {
                cmd.Connection = conn;
                cmd.CommandText = "INSERT INTO tblNodeCollection (Tag, Id, Content) VALUES (@tag, @id, @content)";
                cmd.Parameters.AddWithValue("@tag", tag);
                cmd.Parameters.AddWithValue("@id", id);
                cmd.Parameters.AddWithValue("@content", content);

                cmd.ExecuteNonQuery();
            }
        }
    }
}
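
One way to try out the parsing half without a database is to collect the nodes into plain objects first and do the INSERTs in a separate step. This is only a sketch under assumptions: the NodeRecord and NodeCollector names and the sample markup are hypothetical, and the properties simply mirror the Tag, Id and Content columns used above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container for one row of tblNodeCollection.
public sealed class NodeRecord
{
    public string Tag { get; set; }
    public string Id { get; set; }
    public string Content { get; set; }
}

public static class NodeCollector
{
    // Hypothetical helper: turns every node matching the XPath into a NodeRecord.
    public static List<NodeRecord> Collect(HtmlDocument doc, string xpath)
    {
        var records = new List<NodeRecord>();

        // SelectNodes returns null when no node matches the expression.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return records;
        }

        foreach (HtmlNode node in nodes)
        {
            records.Add(new NodeRecord
            {
                Tag = node.Name,
                Id = node.Id,
                Content = node.InnerText.Trim()
            });
        }

        return records;
    }

    public static void Main()
    {
        // Sample markup (an assumption for this sketch), loaded from a string instead of the web.
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><p id=\"intro\">First</p><p>Second</p></body></html>");

        foreach (NodeRecord record in Collect(doc, "//p"))
        {
            Console.WriteLine(record.Tag + " [" + record.Id + "] " + record.Content);
        }
    }
}
```

The list that comes back can then be fed to the parameterized INSERT shown above, one record per command.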

