How to Get Rendered HTML (Processed by JavaScript) in WebBrowser Control

Here is one solution I found for getting the rendered HTML (the DOM) after the page's JavaScript has run:

Place a WebBrowser control named webBrowser1 on the Form of class Form1.

[Form1.cs [Design]] (sample image of the form omitted)

Then for code use:

[Form1.cs]

using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

namespace WebBrowserTest
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            // Expose MyScript to page scripts as window.external.
            this.webBrowser1.ObjectForScripting = new MyScript();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            webBrowser1.Navigate("http://localhost:6489/Default.aspx");
        }

        // Must be wired to webBrowser1.DocumentCompleted (e.g. in the designer).
        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // Run script in the loaded page that calls back into MyScript.
            webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
        }

        // Objects assigned to ObjectForScripting must be COM-visible.
        [ComVisible(true)]
        public class MyScript
        {
            public void CallServerSideCode()
            {
                // The document as it stands after the page's scripts have run.
                var doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
            }
        }
    }
}

In Form1_Load, change the URL passed to webBrowser1.Navigate("http://localhost:6489/Default.aspx") to the page whose JavaScript-processed DOM you want to obtain.

You can access the modified DOM in the CallServerSideCode() method, for example:

doc.GetElementById("myDataTable");

Or you can access the rendered HTML like this:

var renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;

How to get HTML from WebBrowser control

Your samples refer to the WinForms WebBrowser control. For the WPF control, add a reference to Microsoft.mshtml (via the Add Reference dialog, using search) to your project.

Cast the Document property to HTMLDocument in order to access its methods and properties (as stated on MSDN).

See also my GitHub-Sample:

private void WebBrowser_Navigated(object sender, NavigationEventArgs e)
{
    var document = (HTMLDocument)_Browser.Document;
    _Html.Text = document.body.outerHTML;
}
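
As a rough sketch of the same idea (this is not taken from the GitHub sample): the WPF control also raises LoadCompleted once the document has loaded, and casting Document to mshtml's IHTMLDocument3 gives access to documentElement, so the whole page including the html element can be read rather than only the body. The class name WpfBrowserHtml and the onHtml callback are made up for illustration, and the code assumes the Microsoft.mshtml reference mentioned above.

```csharp
using System;
using System.Windows.Controls;
using System.Windows.Navigation;
using mshtml;

// Hypothetical helper, not part of the original answer.
public static class WpfBrowserHtml
{
    // Calls onHtml with the current markup each time a page finishes loading.
    public static void Attach(WebBrowser browser, Action<string> onHtml)
    {
        browser.LoadCompleted += (object sender, NavigationEventArgs e) =>
        {
            // WPF exposes Document as object; IHTMLDocument3 provides documentElement.
            var document = browser.Document as IHTMLDocument3;
            string html = document?.documentElement?.outerHTML;
            if (html != null) onHtml(html);
        };
    }
}
```

Usage would be something like WpfBrowserHtml.Attach(_Browser, html => _Html.Text = html); before navigating. Scripts that keep running after the load event can still change the DOM afterwards, so this only captures the state at that moment.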

Get dynamically generated (rendered) HTML from IE

I'm using Internet Explorer 9, but the process should be the same or very similar for IE 11:

  • Navigate to the webpage.
  • Press F12 to launch the developer tools (also available from the Tools menu).
  • Right-click the opening HTML tag.
  • Select "Copy outerHTML".

You should now have all the dynamic HTML in your clipboard, ready to paste wherever you like.

Get HTML Source after JavaScript manipulations

The trick is finding a way to notify the control when the JS has finished running. You might be able to do that by having the JS set a form element's value (isJSComplete) when it has completed, and polling for that value from the web browser control.

Use the following code to read the form value and check whether the page is ready (shown here in WinForms WebBrowser syntax):

MyBrowserControl.Document.GetElementById("isJSComplete").GetAttribute("value");

Use the following code to pull the HTML from the page:

MyBrowserControl.Document.GetElementsByTagName("html")[0].OuterHtml;

Better yet, here is an article showing how to wire up JS events to be handled by the WebBrowser control. You could just fire an event when the JS is done and have your code trap that event and then pull the HTML using the above approach.
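
The polling idea could be sketched roughly as follows for the WinForms control. This is only an illustration under some assumptions: the page sets the value of an element with id isJSComplete to "true" when its scripts are done, the class name RenderedHtmlPoller and the 250 ms interval are invented for the example, and the object is created on the UI thread.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: polls a WinForms WebBrowser until the page sets
// the value of the element with id "isJSComplete" to "true".
public sealed class RenderedHtmlPoller : IDisposable
{
    private readonly WebBrowser _browser;
    private readonly Action<string> _onReady;
    private readonly Timer _timer = new Timer { Interval = 250 }; // assumed poll interval

    public RenderedHtmlPoller(WebBrowser browser, Action<string> onReady)
    {
        _browser = browser;
        _onReady = onReady;
        _timer.Tick += OnTick;
    }

    public void Start() => _timer.Start();

    private void OnTick(object sender, EventArgs e)
    {
        HtmlDocument doc = _browser.Document;
        if (doc == null) return; // nothing loaded yet

        // Assumed convention: the page writes "true" into this element when done.
        HtmlElement flag = doc.GetElementById("isJSComplete");
        if (flag == null || flag.GetAttribute("value") != "true") return;

        _timer.Stop();
        HtmlElementCollection roots = doc.GetElementsByTagName("html");
        if (roots.Count > 0) _onReady(roots[0].OuterHtml);
    }

    public void Dispose() => _timer.Dispose();
}
```

After calling Navigate, something like new RenderedHtmlPoller(MyBrowserControl, html => { /* use html */ }).Start(); would begin polling. A real version would probably also want a timeout so it does not poll forever if the flag is never set.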

Get HTML from Frame using WebBrowser control - UnauthorizedAccessException

Thanks to Noseratio's comments, I managed to do that with the WebBrowser control. Here are the major points that might help others with similar questions:

1) Use the DocumentCompleted event. In the Navigated event, the body of the document is still null.

2) The following answer helped a lot: WebBrowserControl: UnauthorizedAccessException when accessing property of a Frame

3) I was not aware of IHTMLWindow2 and similar interfaces. For them to work correctly, I added references to the following COM libraries: Microsoft Internet Controls (SHDocVw) and Microsoft HTML Object Library (MSHTML).

4) I grabbed the HTML of the frame with the following code:

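void WebBrowserMain_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Only act once the final page has finished loading.
    if (e.Url.OriginalString == Constants.FINAL_URL)
    {
        try
        {
            var doc = (IHTMLDocument2) WebBrowserMain.Document.DomDocument;
            var frame = (IHTMLWindow2) doc.frames.item(0);
            // CrossFrameIE is not defined here; it appears to be the helper
            // from the answer referenced in point 2.
            var document = CrossFrameIE.GetDocumentFromWindow(frame);
            var html = document.body.outerHTML;

            var dataParser = new DataParser(html);
            //my logic here
        }
        catch (Exception)
        {
            // The original snippet was cut off here; handle or log as needed.
            throw;
        }
    }
}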

5) To work with the HTML, I used the HTML Agility Pack, which has pretty good XPath search.
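
To give an idea of what the XPath search in point 5 looks like, here is a minimal sketch. It is not my actual DataParser; the class name RenderedHtmlQueries, the method GetCellTexts, and the table id passed in are all illustrative, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper showing an XPath query over HTML grabbed from the control.
public static class RenderedHtmlQueries
{
    // Returns the trimmed text of every <td> inside the table with the given id.
    public static List<string> GetCellTexts(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = new List<string>();
        // tableId is assumed to be a plain identifier without quote characters.
        HtmlNodeCollection cells =
            doc.DocumentNode.SelectNodes("//table[@id='" + tableId + "']//td");
        if (cells == null) return texts; // SelectNodes returns null when nothing matches

        foreach (HtmlNode cell in cells)
        {
            // Decode entities such as &amp; before trimming whitespace.
            texts.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
        }
        return texts;
    }
}
```

Called as RenderedHtmlQueries.GetCellTexts(html, "myDataTable"), this would collect the cell texts of that table from the frame HTML obtained in point 4.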


