Grab all text from html with Html Agility Pack
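var root = doc.DocumentNode;
var sb = new StringBuilder();
foreach (var node in root.DescendantNodesAndSelf())
{
    // Leaf nodes only, so each piece of text is collected once.
    if (!node.HasChildNodes)
    {
        string text = node.InnerText;
        // Skip whitespace-only nodes so no blank lines are appended.
        if (!string.IsNullOrWhiteSpace(text))
            sb.AppendLine(text.Trim());
    }
}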
This does what you need, but I am not sure it is the best way. DescendantNodesAndSelf visits every node in the document, so iterating over a narrower set of nodes may perform better.
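One narrower alternative is to select only the text nodes with the XPath //text(). The following is a minimal sketch rather than a benchmarked solution: the class and method names (TextExtractor, GetAllText) are hypothetical, skipping script and style content is an assumption about what "all text" should mean, and the null check assumes the default behaviour where SelectNodes returns null when nothing matches.

```csharp
using System.Text;
using HtmlAgilityPack;

public static class TextExtractor
{
    // Hypothetical helper: collects the trimmed text of every text node.
    public static string GetAllText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) by default
        // when the XPath matches nothing.
        var textNodes = doc.DocumentNode.SelectNodes("//text()");
        if (textNodes == null)
            return string.Empty;

        var sb = new StringBuilder();
        foreach (var node in textNodes)
        {
            // Assumption: script and style content is not wanted.
            var parentName = node.ParentNode?.Name;
            if (parentName == "script" || parentName == "style")
                continue;

            // InnerText keeps entities such as &amp; so decode them first.
            var text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (text.Length > 0)
                sb.AppendLine(text);
        }
        return sb.ToString();
    }
}
```

Calling TextExtractor.GetAllText(html) returns one line per non-empty text node. Whether this is actually faster than the loop above depends on the document, so measure before relying on it.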
HTML Agility Pack - Grab Text after a node
You can use the XPath following-sibling::text()[1] to get the text node located directly after each strong element. Here is a minimal but complete example:
var raw = @"<div>
<strong>Title</strong>: Mr<br>
<strong>First name</strong>: Fake<br>
<strong>Surname</strong>: Guy<br>
</div>";
var doc = new HtmlDocument();
doc.LoadHtml(raw);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//strong"))
{
    // First text node that follows the <strong> element among its siblings.
    var val = node.SelectSingleNode("following-sibling::text()[1]");
    // val is null when no text follows the element, hence the ?. operator.
    Console.WriteLine(node.InnerText + ", " + val?.InnerText);
}
Output:
Title, : Mr
First name, : Fake
Surname, : Guy
If needed, you can remove the leading ":" with simple string manipulation.
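As a rough illustration of that string manipulation, the sketch below trims the colon and the surrounding whitespace from a value such as ": Mr". The names LabelCleaner and StripSeparator are hypothetical, and the code assumes the separator is always a single leading colon.

```csharp
using System;

public static class LabelCleaner
{
    // Hypothetical helper: turns ": Mr" into "Mr".
    public static string StripSeparator(string rawValue)
    {
        if (rawValue == null)
            return string.Empty;

        // Trim outer whitespace, drop leading colons, then trim the
        // whitespace that followed the colon.
        return rawValue.Trim().TrimStart(':').Trim();
    }
}
```

In the loop above, this would be applied to the text node's InnerText before writing the line.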
Using Html Agility Pack to select all paragraphs that start with a certain text value
You can get the first 5 paragraphs where the inner text starts with "Version" like this:
var nodesParagraph = nodeRevHist
    .Elements("p")
    .Where(p => p.InnerText.Trim().StartsWith("Version"))
    .Take(5);
Working demo here: https://dotnetfiddle.net/uvwcUN
Get href tag inner text from html (html agility pack)
You're effectively just collecting the inner text of the nodes. Do this:
var texts = doc.DocumentNode
    .SelectNodes("//a[@href]")
    .Select(n => n.InnerText)
    .Distinct()
    .ToList();
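One thing to watch for: by default, SelectNodes returns null rather than an empty collection when no node matches, so the chained call above throws on a page without links. Below is a hedged sketch of a null-safe variant; the LinkText class and GetDistinctLinkTexts method are hypothetical names, and trimming the text is an extra step not in the original query.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkText
{
    // Hypothetical helper: distinct inner texts of all <a href="..."> elements.
    public static List<string> GetDistinctLinkTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        // No matching anchors: return an empty list instead of throwing.
        if (anchors == null)
            return new List<string>();

        return anchors
            .Select(n => n.InnerText.Trim())
            .Distinct()
            .ToList();
    }
}
```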
C# HtmlAgilityPack - how to extract specific text from a web page
You need to loop through all siblings after the first .heading-size-3 header until you reach the next .heading-size-3 header:
HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
html.LoadHtml(new WebClient().DownloadString("http://www.wowhead.com/quest=35151"));
var root = html.DocumentNode;
// First <h2 class="heading-size-3">, or null if the page has none.
var descriptionHeader = root.Descendants("h2")
    .FirstOrDefault(n => n.GetAttributeValue("class", "")
        .Equals("heading-size-3"));
// Walk the following siblings until the next node with the same class.
var current = descriptionHeader?.NextSibling;
var result = "";
while (current != null && !current.GetAttributeValue("class", "")
    .Equals("heading-size-3"))
{
    if (!string.IsNullOrEmpty(current.InnerText))
    {
        result += " " + current.InnerText;
    }
    current = current.NextSibling;
}
richTextBox1.Text = result;
At the end, result contains:
You have already constructed an impressive garrison in Frostfire. I believe I should defer this next choice to you.
One region of Gorgrond is rich in resources. A lumber mill could help us make the most of them.
Another region harbors hardened gladiators. A sparring arena would help persuade them to fight for our cause.
Either path will strengthen us as we seek to find and weaken the Iron Horde.
Which do you choose, Commander?
HtmlAgilityPack how to extract html between some tag
You can use the OuterHtml property of each <p> element to get the desired HTML:
string s = "<p>firt paragraph</p>some <br />text<p>another paragraph</p><span>some text between span</span><p>hellow word</p>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(s);
var nodes = doc.DocumentNode.SelectNodes("//p");
foreach (var item in nodes)
{
    Console.WriteLine(item.OuterHtml);
}
Output:
<p>firt paragraph</p>
<p>another paragraph</p>
<p>hellow word</p>
Or, if you mean to get everything between the first <p> and the last <p> elements, inclusive, you can use the following XPath:
var query = "//node()[preceding-sibling::p or self::p][following-sibling::p or self::p]";
The XPath grabs every node (element or text node) that satisfies two conditions: it either has a preceding sibling p or is itself a p element, and it either has a following sibling p or is itself a p element.
var nodes = doc.DocumentNode.SelectNodes(query);
foreach (var item in nodes)
{
    Console.WriteLine(item.OuterHtml);
}
Output:
<p>firt paragraph</p>
some
<br />
text
<p>another paragraph</p>
<span>some text between span</span>
<p>hellow word</p>