How to cancel Task await after a timeout period
Updated: the latest version of the WebBrowser-based console web scraper can be found on GitHub.
Updated: Adding a pool of WebBrowser objects for multiple parallel downloads.
Do you have an example of how to do this in a console app by any chance? Also I don't think webBrowser can be a class variable because I am running the whole thing in a parallel for each, iterating thousands of URLs
Below is an implementation of a more or less generic **WebBrowser-based web scraper**, which works as a console application. It's a consolidation of some of my previous WebBrowser-related efforts, including the code referenced in the question:
Capturing an image of the web page with opacity
Loading a page with dynamic AJAX content
Creating an STA message loop thread for WebBrowser
Loading a set of URLs, one after another
Printing a set of URLs with WebBrowser
Web page UI automation
A few points:
The reusable MessageLoopApartment class is used to start and run a WinForms STA thread with its own message pump. It can be used from a console application, as below. This class exposes a TPL task scheduler (FromCurrentSynchronizationContext) and a set of Task.Factory.StartNew wrappers to use this task scheduler.
This makes async/await a great tool for running WebBrowser navigation tasks on that separate STA thread. This way, a WebBrowser object gets created, navigated and destroyed on that thread. MessageLoopApartment is not tied to WebBrowser specifically, though.
It's important to enable HTML5 rendering using Browser Feature Control, as otherwise the WebBrowser object runs in IE7 emulation mode by default. That's what SetFeatureBrowserEmulation does below.
It may not always be possible to determine with 100% certainty when a web page has finished rendering. Some pages are quite complex and use continuous AJAX updates. Yet we can get quite close by handling the DocumentCompleted event first, then polling the page's current HTML snapshot for changes and checking the WebBrowser.IsBusy property. That's what NavigateAsync does below.
Time-out logic is layered on top of the above, in case the page rendering never ends (note CancellationTokenSource and CreateLinkedTokenSource).
using Microsoft.Win32;
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace Console_22239357
{
    class Program
    {
        // by Noseratio - https://stackoverflow.com/a/22262976/1768303

        // main logic
        static async Task ScrapeSitesAsync(string[] urls, CancellationToken token)
        {
            using (var apartment = new MessageLoopApartment())
            {
                // create WebBrowser inside MessageLoopApartment
                var webBrowser = apartment.Invoke(() => new WebBrowser());
                try
                {
                    foreach (var url in urls)
                    {
                        Console.WriteLine("URL:\n" + url);

                        // cancel in 30s or when the main token is signalled
                        var navigationCts = CancellationTokenSource.CreateLinkedTokenSource(token);
                        navigationCts.CancelAfter((int)TimeSpan.FromSeconds(30).TotalMilliseconds);
                        var navigationToken = navigationCts.Token;

                        // run the navigation task inside MessageLoopApartment
                        string html = await apartment.Run(() =>
                            webBrowser.NavigateAsync(url, navigationToken), navigationToken);

                        Console.WriteLine("HTML:\n" + html);
                    }
                }
                finally
                {
                    // dispose of WebBrowser inside MessageLoopApartment
                    apartment.Invoke(() => webBrowser.Dispose());
                }
            }
        }

        // entry point
        static void Main(string[] args)
        {
            try
            {
                WebBrowserExt.SetFeatureBrowserEmulation(); // enable HTML5

                var cts = new CancellationTokenSource((int)TimeSpan.FromMinutes(3).TotalMilliseconds);

                var task = ScrapeSitesAsync(
                    new[] { "http://example.com", "http://example.org", "http://example.net" },
                    cts.Token);

                task.Wait();

                Console.WriteLine("Press Enter to exit...");
                Console.ReadLine();
            }
            catch (Exception ex)
            {
                while (ex is AggregateException && ex.InnerException != null)
                    ex = ex.InnerException;
                Console.WriteLine(ex.Message);
                Environment.Exit(-1);
            }
        }
    }

    /// <summary>
    /// WebBrowserExt - WebBrowser extensions
    /// by Noseratio - https://stackoverflow.com/a/22262976/1768303
    /// </summary>
    public static class WebBrowserExt
    {
        const int POLL_DELAY = 500;

        // navigate and download
        public static async Task<string> NavigateAsync(this WebBrowser webBrowser, string url, CancellationToken token)
        {
            // navigate and await DocumentCompleted
            var tcs = new TaskCompletionSource<bool>();
            WebBrowserDocumentCompletedEventHandler handler = (s, arg) =>
                tcs.TrySetResult(true);

            using (token.Register(() => tcs.TrySetCanceled(), useSynchronizationContext: true))
            {
                webBrowser.DocumentCompleted += handler;
                try
                {
                    webBrowser.Navigate(url);
                    await tcs.Task; // wait for DocumentCompleted
                }
                finally
                {
                    webBrowser.DocumentCompleted -= handler;
                }
            }

            // get the root element
            var documentElement = webBrowser.Document.GetElementsByTagName("html")[0];

            // poll the current HTML for changes asynchronously
            var html = documentElement.OuterHtml;
            while (true)
            {
                // wait asynchronously, this will throw if cancellation requested
                await Task.Delay(POLL_DELAY, token);

                // continue polling if the WebBrowser is still busy
                if (webBrowser.IsBusy)
                    continue;

                var htmlNow = documentElement.OuterHtml;
                if (html == htmlNow)
                    break; // no changes detected, end the poll loop

                html = htmlNow;
            }

            // consider the page fully rendered
            token.ThrowIfCancellationRequested();
            return html;
        }

        // enable HTML5 (assuming we're running IE10+)
        // more info: https://stackoverflow.com/a/18333982/1768303
        public static void SetFeatureBrowserEmulation()
        {
            if (System.ComponentModel.LicenseManager.UsageMode != System.ComponentModel.LicenseUsageMode.Runtime)
                return;
            var appName = System.IO.Path.GetFileName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
            Registry.SetValue(@"HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION",
                appName, 10000, RegistryValueKind.DWord);
        }
    }

    /// <summary>
    /// MessageLoopApartment
    /// STA thread with message pump for serial execution of tasks
    /// by Noseratio - https://stackoverflow.com/a/22262976/1768303
    /// </summary>
    public class MessageLoopApartment : IDisposable
    {
        Thread _thread; // the STA thread

        TaskScheduler _taskScheduler; // the STA thread's task scheduler

        public TaskScheduler TaskScheduler { get { return _taskScheduler; } }

        /// <summary>MessageLoopApartment constructor</summary>
        public MessageLoopApartment()
        {
            var tcs = new TaskCompletionSource<TaskScheduler>();

            // start an STA thread and get a task scheduler
            _thread = new Thread(startArg =>
            {
                EventHandler idleHandler = null;

                idleHandler = (s, e) =>
                {
                    // handle Application.Idle just once
                    Application.Idle -= idleHandler;
                    // return the task scheduler
                    tcs.SetResult(TaskScheduler.FromCurrentSynchronizationContext());
                };

                // handle Application.Idle just once
                // to make sure we're inside the message loop
                // and SynchronizationContext has been correctly installed
                Application.Idle += idleHandler;
                Application.Run();
            });

            _thread.SetApartmentState(ApartmentState.STA);
            _thread.IsBackground = true;
            _thread.Start();
            _taskScheduler = tcs.Task.Result;
        }

        /// <summary>shutdown the STA thread</summary>
        public void Dispose()
        {
            if (_taskScheduler != null)
            {
                var taskScheduler = _taskScheduler;
                _taskScheduler = null;

                // execute Application.ExitThread() on the STA thread
                Task.Factory.StartNew(
                    () => Application.ExitThread(),
                    CancellationToken.None,
                    TaskCreationOptions.None,
                    taskScheduler).Wait();

                _thread.Join();
                _thread = null;
            }
        }

        /// <summary>Task.Factory.StartNew wrappers</summary>
        public void Invoke(Action action)
        {
            Task.Factory.StartNew(action,
                CancellationToken.None, TaskCreationOptions.None, _taskScheduler).Wait();
        }

        public TResult Invoke<TResult>(Func<TResult> action)
        {
            return Task.Factory.StartNew(action,
                CancellationToken.None, TaskCreationOptions.None, _taskScheduler).Result;
        }

        public Task Run(Action action, CancellationToken token)
        {
            return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler);
        }

        public Task<TResult> Run<TResult>(Func<TResult> action, CancellationToken token)
        {
            return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler);
        }

        public Task Run(Func<Task> action, CancellationToken token)
        {
            return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler).Unwrap();
        }

        public Task<TResult> Run<TResult>(Func<Task<TResult>> action, CancellationToken token)
        {
            return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler).Unwrap();
        }
    }
}
How to cancel async Task after a period of time
Cancellation is cooperative. You just need to pass a CancellationToken into your StartRotation method:
public async static Task InitAds(CancellationToken token)
{
    Debug.WriteLine("API: Loading Ad images");
    await Task.WhenAll(ads.Select(l => l.Value).Where(l => l != null).Select(l => l.StartRotation(token)));
}
And then call it as such:
var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
await InitAds(cts.Token);
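The answer does not show StartRotation itself, so here is a minimal sketch of what a cooperative implementation might look like. The Ad class, its Rotations counter and the 100 ms interval are all hypothetical stand-ins; the only point is that the method has to observe the token it is given, otherwise cancelling the source has no effect.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the ad object from the question.
public class Ad
{
    public int Rotations { get; private set; }

    // Rotates until the token is cancelled. Task.Delay observes the token and
    // throws an OperationCanceledException once cancellation is requested.
    public async Task StartRotation(CancellationToken token)
    {
        while (true)
        {
            token.ThrowIfCancellationRequested();
            Rotations++; // placeholder for the real rotation work
            await Task.Delay(TimeSpan.FromMilliseconds(100), token);
        }
    }
}

public static class RotationDemo
{
    public static async Task Main()
    {
        var ad = new Ad();
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(1)))
        {
            try
            {
                await ad.StartRotation(cts.Token);
            }
            catch (OperationCanceledException)
            {
                Console.WriteLine("Rotation cancelled after " + ad.Rotations + " rotations");
            }
        }
    }
}
```

Because Task.WhenAll propagates the cancellation, the caller of InitAds should expect an OperationCanceledException when the timeout fires.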
How to set timeout for a task, and then abort it
If you want to abort the task after 3 seconds, you need to pass the token to the function. If you use Task.Delay and pass it the token, it throws an exception on cancellation, which aborts the task.
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        Console.WriteLine($"The main thread is {Thread.CurrentThread.ManagedThreadId}");
        var cts = new CancellationTokenSource();
        Person p = new Person { Name = "Apple" };
        try
        {
            cts.CancelAfter(TimeSpan.FromSeconds(3)); // limited to 3 seconds
            await DoSth(p, cts.Token);
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message); // task was canceled
        }
        Console.WriteLine(cts.Token.IsCancellationRequested);
        await Task.Delay(3000);
        Console.ReadLine();
    }

    static async Task DoSth(Person p, CancellationToken ct)
    {
        p.Name = "Cat";
        await Task.Delay(5000, ct); // throws on cancellation, so the next line does not run if cancelled after 3s
        Console.WriteLine($"The async thread is {Thread.CurrentThread.ManagedThreadId}");
    }
}

public class Person
{
    public string Name { get; set; }
}
How can I cancel an asynchronous task after a given time and how can I restart a failed task?
Set a timeout in your logic so that you stop waiting for the task once it expires:
int timeout = 1000;
var task = SomeOperationAsync();
if (await Task.WhenAny(task, Task.Delay(timeout)) == task) {
    // task completed within timeout
} else {
    // timeout logic
}
Asynchronously wait for Task<T> to complete with timeout
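Also put the try/catch block in a loop, so that the operation is retried until an attempt limit is reached:
const int maxAttempts = 3;
const int timeout = 1000;
for (int attempt = 1; ; attempt++)
{
    try
    {
        var task = SomeOperationAsync();
        // await with a timeout and raise a timeout exception if it expires
        if (await Task.WhenAny(task, Task.Delay(timeout)) != task)
            throw new TimeoutException();
        await task; // propagate any exception from the task itself
        break;      // success, stop retrying
    }
    catch (TimeoutException)
    {
        // rethrow the caught exception after the last attempt
        if (attempt == maxAttempts)
            throw;
    }
}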
Asynchronously wait for Task<T> to complete with timeout
How about this:
int timeout = 1000;
var task = SomeOperationAsync();
if (await Task.WhenAny(task, Task.Delay(timeout)) == task) {
    // task completed within timeout
} else {
    // timeout logic
}
And here's a great blog post "Crafting a Task.TimeoutAfter Method" (from MS Parallel Library team) with more info on this sort of thing.
Addition: at the request of a comment on my answer, here is an expanded solution that includes cancellation handling. Note that passing cancellation to the task and the timer means that there are multiple ways cancellation can be experienced in your code, and you should be sure to test for and be confident you properly handle all of them. Don't leave to chance various combinations and hope your computer does the right thing at runtime.
int timeout = 1000;
var task = SomeOperationAsync(cancellationToken);
if (await Task.WhenAny(task, Task.Delay(timeout, cancellationToken)) == task)
{
    // Task completed within timeout.
    // Consider that the task may have faulted or been canceled.
    // We re-await the task so that any exceptions/cancellation is rethrown.
    await task;
}
else
{
    // timeout/cancellation logic
}
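The WhenAny pattern above can be wrapped into a reusable helper. The following is only a sketch: the TimeoutAfter name is borrowed from the blog post title, but the body is my own assumption and not the code from that post. It also cancels the timer once the task wins the race, so the delay does not linger.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskTimeoutExtensions
{
    // Hypothetical helper: returns the task's result, or throws
    // TimeoutException if the task has not completed within the timeout.
    public static async Task<T> TimeoutAfter<T>(this Task<T> task, TimeSpan timeout)
    {
        using (var delayCts = new CancellationTokenSource())
        {
            var delay = Task.Delay(timeout, delayCts.Token);
            var winner = await Task.WhenAny(task, delay).ConfigureAwait(false);
            if (winner != task)
                throw new TimeoutException("The operation timed out after " + timeout);

            delayCts.Cancel(); // the timer is no longer needed
            // Re-await so that a fault or cancellation of the task is rethrown.
            return await task.ConfigureAwait(false);
        }
    }
}
```

Usage would be something like `var result = await SomeOperationAsync().TimeoutAfter(TimeSpan.FromSeconds(1));`. Note that the timed-out operation itself keeps running; only the wait is abandoned. On .NET 6 and later, the built-in Task.WaitAsync(TimeSpan) covers the same need.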
How to cancel an async WebApi action after timeout?
I made a version in LINQPad with the 'C# Program' selection - it compiles and runs with output of 2 lines, showing both the time-out and success cases:
Timeout of 00:00:05 expired
Successfully got result of foo
Here's the snippet:
void Main()
{
    CallGetStringWithTimeout(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(10)).Wait();
    CallGetStringWithTimeout(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(0)).Wait();
}

public async Task CallGetStringWithTimeout(TimeSpan callTimeout, TimeSpan callAddedDelay)
{
    var myTask = GetStringAsync(callAddedDelay);
    await Task.WhenAny(Task.Delay(callTimeout), myTask);
    if (myTask.Status == TaskStatus.RanToCompletion)
    {
        Console.WriteLine("Successfully got result of {0}", await myTask);
    }
    else
    {
        Console.WriteLine("Timeout of {0} expired", callTimeout);
    }
}

public async Task<string> GetStringAsync(TimeSpan addedDelay)
{
    await Task.Delay(addedDelay);
    return "foo";
}
However, the 'normal' way is to use a CancellationTokenSource and specify your timeout as the constructor parameter. If you already have a CancellationTokenSource, you can call the CancelAfter method on it, which schedules the cancellation for the specified timeout.
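As a rough sketch of that approach, the snippet above could be reworked as follows. The CancellationTimeoutDemo and CallWithTimeout names, the extra token parameter on GetStringAsync, and the choice to return null on timeout are my assumptions for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationTimeoutDemo
{
    // Same shape as GetStringAsync above, but it observes a token.
    public static async Task<string> GetStringAsync(TimeSpan addedDelay, CancellationToken token)
    {
        await Task.Delay(addedDelay, token); // throws if the token is cancelled first
        return "foo";
    }

    // Returns the string, or null if the timeout expired first.
    public static async Task<string> CallWithTimeout(TimeSpan callTimeout, TimeSpan callAddedDelay)
    {
        // The timeout goes into the constructor. For an existing source,
        // cts.CancelAfter(callTimeout) schedules the same cancellation.
        using (var cts = new CancellationTokenSource(callTimeout))
        {
            try
            {
                return await GetStringAsync(callAddedDelay, cts.Token);
            }
            catch (OperationCanceledException)
            {
                return null; // the call was cancelled by the timeout
            }
        }
    }
}
```

Unlike the WhenAny version, this actually stops the underlying operation when the timeout expires, provided the operation passes the token down to whatever it awaits.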
C#: Async Task: Cancel
I don't quite understand the purpose of the code, particularly the if-statement:
if (await Task.WhenAny(_t1, Task.Delay(_timeout, timeoutCancellationTokenSource.Token)) == _t1)
Why pass in a _timeout to Task.Delay which is the same timeout as you created the timeoutCancellationTokenSource with?
If we ignore the first parameter you give Task.Delay, it also means you have two calls to Task.Delay with the same CancellationToken, and now you have a race condition where it's impossible to predict which Task.Delay will be cancelled first.
To show an example of how you can get the cancellation to work, you can do like this:
TimeSpan _timeout = TimeSpan.FromSeconds(1);
var timeoutCancellationTokenSource = new System.Threading.CancellationTokenSource(_timeout);
Task _t1 = asyncFunction("LINE 01", " ", " ", " ", 2000, timeoutCancellationTokenSource);
try
{
    await _t1;
}
catch (OperationCanceledException)
{
    Console.WriteLine("Operation was cancelled");
}
// Output: Operation was cancelled
If _timeout is longer than the 2000 milliseconds passed into asyncFunction, the task completes after 2 seconds without being cancelled.
See this fiddle for a test run.