Running Scripts in HTMLagilitypack

Running Scripts in HtmlAgilityPack

You are getting what the server is returning - the same as a web browser. A web browser, of course, then runs the scripts. Html Agility Pack is an HTML parser only - it has no way to interpret the javascript or bind it to its internal representation of the document. If you wanted to run the script you would need a web browser. The perfect answer to your problem would be a complete "headless" web browser. That is something that incorporates an HTML parser, a javascript interpreter, and a model that simulates the browser DOM, all working together. Basically, that's a web browser, except without the rendering part of it. At this time there isn't such a thing that works entirely within the .NET environment.

Your best bet is to use a WebBrowser control and actually load and run the page in Internet Explorer under programmatic control. This won't be fast or pretty, but it will do what you need to do.

Also see my answer to a similar question: Load a DOM and Execute javascript, server side, with .Net which discusses the available technology in .NET to do this. Most of the pieces exist right now but just aren't quite there yet or haven't been integrated in the right way, unfortunately.

Calling javascript function from HtmlAgilityPack

If you trust the source, it looks to me like you'd be better off invoking the WebBrowser control. HtmlAgilityPack does not provide a scripting engine.

how to get javascript code too with the actual source with Html Agility Pack

To expand on @alecxe answer, you can use Selenium* to load your target page like a real browser would do, and then pass the result to HtmlAgilityPack for further processing :

using OpenQA.Selenium;

.....

IWebDriver driver = new PhantomJS.PhantomJSDriver();
driver.Navigate().GoToUrl(url);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(driver.PageSource);

alternatively, you can just run your query (XPath or CSS selector) using Selenium directly, for example :

var result = driver.FindElements(By.XPath("your query"));

//print HTML of the returned elements
foreach (var item in result)
{
Console.WriteLine(item.GetAttribute("outerHTML"));
}

*) Need to download Selenium first, as well as the driver i.e PhantomJS, Firefox, etc. Selenium can be installed to your project easily from NuGet.

htmlagilitypack - remove script and style?

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

doc.DocumentNode.Descendants()
.Where(n => n.Name == "script" || n.Name == "style")
.ToList()
.ForEach(n => n.Remove());


Related Topics



Leave a reply



Submit