Getting HTML Body Content in WinForms WebBrowser After Body Onload Event Executes

Getting the final form of the HTML body with WebBrowser?

The problem was that when the DocumentCompleted event fires, the data has been completely received, but the page's onload handlers have not run yet. The solution is to wait until after DocumentCompleted has fired and the onload handlers have executed, and only then read the innerHTML.
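A minimal sketch of that idea follows. The form class, the `browser` field and the URL are placeholders that do not come from the original question, and it assumes (as the answer above does) that DocumentCompleted fires before window.onload.

```csharp
using System;
using System.Diagnostics;
using System.Windows.Forms;

public class BodyReaderForm : Form
{
    // Hypothetical control; the original question does not name its WebBrowser.
    private readonly WebBrowser browser = new WebBrowser { Dock = DockStyle.Fill };

    public BodyReaderForm()
    {
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate("http://www.example.com"); // placeholder URL
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Unsubscribe so that the onload handler is attached only once.
        browser.DocumentCompleted -= OnDocumentCompleted;

        // Read the body only after the page's own onload code has had a chance to run.
        browser.Document.Window.AttachEventHandler("onload", delegate
        {
            string html = browser.Document.Body.InnerHtml;
            Debug.WriteLine(html);
        });
    }
}
```

If the page changes its DOM later than onload (for example from a timer or an AJAX callback), this will still read the body too early, so treat it as a starting point only.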

WebBrowser behaviour issues

I'd recommend two things:

  • Don't run your code on the DocumentCompleted event; run it on the DOM window.onload event instead.
  • To make sure your web page behaves in the WebBrowser control the same way as it would in the full Internet Explorer browser, consider implementing Feature Control.
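For the Feature Control suggestion, the setting most often needed is FEATURE_BROWSER_EMULATION, which is a per-executable registry value. The helper below is only a sketch: the class and method names are made up for illustration, and 11000 (IE11 document mode) is an assumed target that you would adjust to the IE version actually installed.

```csharp
using System.Diagnostics;
using System.IO;
using Microsoft.Win32;

static class WebBrowserFeatureControl
{
    // Hypothetical helper; call it before the first WebBrowser control is created.
    public static void SetBrowserEmulation(int mode = 11000)
    {
        // Feature Control values are keyed by the host executable's file name.
        string exeName = Path.GetFileName(Process.GetCurrentProcess().MainModule.FileName);

        const string keyPath =
            @"Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION";

        // HKEY_CURRENT_USER does not require administrative rights.
        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(keyPath))
        {
            key.SetValue(exeName, mode, RegistryValueKind.DWord);
        }
    }
}
```

A typical place to call it would be at the top of Main, before Application.Run. Without such a value the control usually falls back to an IE7 compatibility mode, which is one common reason pages behave differently than in the standalone browser.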

There's one more suggestion, based on the structure of your code. Apparently, you perform a series of "navigate, then handle DocumentCompleted" steps. It might be more natural and easier to use async/await for this. Here's an example of doing this, with or without async/await. It illustrates how to handle onload, too:

async Task DoNavigationAsync()
{
    bool documentComplete = false;
    TaskCompletionSource<bool> onloadTcs = null;

    WebBrowserDocumentCompletedEventHandler handler = delegate
    {
        if (documentComplete)
            return; // attach to onload only once per each Document
        documentComplete = true;

        // now subscribe to DOM onload event
        this.wb.Document.Window.AttachEventHandler("onload", delegate
        {
            // each navigation has its own TaskCompletionSource
            if (onloadTcs.Task.IsCompleted)
                return; // this should not be happening

            // signal the completion of the page loading
            onloadTcs.SetResult(true);
        });
    };

    // register DocumentCompleted handler
    this.wb.DocumentCompleted += handler;

    // Navigate to http://www.example.com?i=1
    documentComplete = false;
    onloadTcs = new TaskCompletionSource<bool>();
    this.wb.Navigate("http://www.example.com?i=1");
    await onloadTcs.Task;
    // the document has been fully loaded, you can access DOM here
    MessageBox.Show(this.wb.Document.Url.ToString());

    // Navigate to http://example.com?i=2
    // could do the click() simulation instead

    documentComplete = false;
    onloadTcs = new TaskCompletionSource<bool>(); // new task for new navigation
    this.wb.Navigate("http://example.com?i=2");
    await onloadTcs.Task;
    // the document has been fully loaded, you can access DOM here
    MessageBox.Show(this.wb.Document.Url.ToString());

    // no more navigation, de-register DocumentCompleted handler
    this.wb.DocumentCompleted -= handler;
}

Here's the same code without the async/await pattern (for .NET 4.0):

Task DoNavigationAsync()
{
    // save the correct continuation context for Task.ContinueWith
    var continueContext = TaskScheduler.FromCurrentSynchronizationContext();

    bool documentComplete = false;
    TaskCompletionSource<bool> onloadTcs = null;

    WebBrowserDocumentCompletedEventHandler handler = delegate
    {
        if (documentComplete)
            return; // attach to onload only once per each Document
        documentComplete = true;

        // now subscribe to DOM onload event
        this.wb.Document.Window.AttachEventHandler("onload", delegate
        {
            // each navigation has its own TaskCompletionSource
            if (onloadTcs.Task.IsCompleted)
                return; // this should not be happening

            // signal the completion of the page loading
            onloadTcs.SetResult(true);
        });
    };

    // register DocumentCompleted handler
    this.wb.DocumentCompleted += handler;

    // Navigate to http://www.example.com?i=1
    documentComplete = false;
    onloadTcs = new TaskCompletionSource<bool>();
    this.wb.Navigate("http://www.example.com?i=1");

    return onloadTcs.Task.ContinueWith(t =>
    {
        // the document has been fully loaded, you can access DOM here
        MessageBox.Show(this.wb.Document.Url.ToString());

        // Navigate to http://example.com?i=2
        // could do the 'click()' simulation instead

        documentComplete = false;
        onloadTcs = new TaskCompletionSource<bool>(); // new task for new navigation
        this.wb.Navigate("http://example.com?i=2");

        // return the inner task so that Unwrap() below can track it
        return onloadTcs.Task.ContinueWith(delegate
        {
            // the document has been fully loaded, you can access DOM here
            MessageBox.Show(this.wb.Document.Url.ToString());

            // no more navigation, de-register DocumentCompleted handler
            this.wb.DocumentCompleted -= handler;
        }, continueContext);

    }, continueContext).Unwrap(); // complete only after the second page has loaded
}

Note that in both cases it is still a piece of asynchronous code which returns a Task object. Here's an example of how to handle the completion of such a task:

private void Form1_Load(object sender, EventArgs e)
{
    DoNavigationAsync().ContinueWith(_ => {
        MessageBox.Show("Navigation complete!");
    }, TaskScheduler.FromCurrentSynchronizationContext());
}

The benefit of using the TAP (Task-based Asynchronous Pattern) approach here is that DoNavigationAsync is a self-contained, independent method. It can be reused, and it doesn't interfere with the state of the parent object (in this case, the main form).
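To make the reuse point concrete, the per-navigation logic could be factored into a helper along these lines. This is only a sketch: the class and method names are invented for illustration, and like the code above it assumes that DocumentCompleted fires before window.onload.

```csharp
using System.Threading.Tasks;
using System.Windows.Forms;

static class WebBrowserExtensions
{
    // Hypothetical extension method: navigates and completes once window.onload has fired.
    public static Task NavigateAndWaitForOnloadAsync(this WebBrowser wb, string url)
    {
        var onloadTcs = new TaskCompletionSource<bool>();

        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // attach to onload only once per navigation
            wb.DocumentCompleted -= handler;

            // TrySetResult tolerates onload being raised more than once
            wb.Document.Window.AttachEventHandler("onload",
                (s, args) => onloadTcs.TrySetResult(true));
        };

        wb.DocumentCompleted += handler;
        wb.Navigate(url);
        return onloadTcs.Task;
    }
}
```

With it, each step of the async version shrinks to something like `await this.wb.NavigateAndWaitForOnloadAsync("http://www.example.com?i=1");`, and no state is shared between navigations.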

Changing the HTML in a WebBrowser before it is displayed to the user?

You could do the DOM manipulation inside the Navigated event:

webBrowser1.Navigated += (sender, e) =>
{
    ((WebBrowser)sender).Document.GetElementById("Text").InnerHtml = "Bye";
};

This will execute before any DOM ready or onload handlers in the document. So, for example, if you initially had the following HTML:

<html>
<head>
<title>Test</title>
</head>
<body onload="document.getElementById('Text').innerHTML = document.getElementById('Text').innerHTML + ' modified';">
<p id="Text">Hello</p>
</body>
</html>

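When you display this page in the WebBrowser, you will get "Bye modified": the Navigated handler replaces "Hello" with "Bye" first, and the page's own onload handler then appends " modified".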


