How to Throttle Requests in a Web API

How to throttle requests in a Web Api?

You seem to be confusing action filters for an ASP.NET MVC controller with action filters for an ASP.NET Web API controller. Those are two completely different classes:

  • For ASP.NET MVC: System.Web.Mvc.ActionFilterAttribute -> that's what you got from the link
  • For ASP.NET Web API: System.Web.Http.Filters.ActionFilterAttribute -> that's what you need to implement

It appears that what you have shown is a Web API controller action (one that is declared inside a controller deriving from ApiController). So if you want to apply custom filters to it, they must derive from System.Web.Http.Filters.ActionFilterAttribute.

So let's go ahead and adapt the code for Web API:

public class ThrottleAttribute : ActionFilterAttribute
{
    /// <summary>
    /// A unique name for this Throttle.
    /// </summary>
    /// <remarks>
    /// We'll be inserting a Cache record based on this name and client IP, e.g. "Name-192.168.0.1"
    /// </remarks>
    public string Name { get; set; }

    /// <summary>
    /// The number of seconds clients must wait before executing this decorated route again.
    /// </summary>
    public int Seconds { get; set; }

    /// <summary>
    /// A text message that will be sent to the client upon throttling. You can include the token {n} to
    /// show this.Seconds in the message, e.g. "Wait {n} seconds before trying again".
    /// </summary>
    public string Message { get; set; }

    public override void OnActionExecuting(HttpActionContext actionContext)
    {
        var key = string.Concat(Name, "-", GetClientIp(actionContext.Request));
        var allowExecute = false;

        if (HttpRuntime.Cache[key] == null)
        {
            HttpRuntime.Cache.Add(key,
                true,                             // is this the smallest data we can have?
                null,                             // no dependencies
                DateTime.Now.AddSeconds(Seconds), // absolute expiration
                Cache.NoSlidingExpiration,
                CacheItemPriority.Low,
                null);                            // no callback

            allowExecute = true;
        }

        if (!allowExecute)
        {
            if (string.IsNullOrEmpty(Message))
            {
                Message = "You may only perform this action every {n} seconds.";
            }

            actionContext.Response = actionContext.Request.CreateResponse(
                HttpStatusCode.Conflict,
                Message.Replace("{n}", Seconds.ToString())
            );
        }
    }
}

where the GetClientIp method comes from this post.
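That post isn't reproduced here, but a commonly used implementation looks roughly like the sketch below. It covers the two usual hosting models for Web API (web-hosted under IIS and self-hosted); adapt it to your own hosting setup:

```csharp
// Sketch of a typical GetClientIp helper for ASP.NET Web API.
// Requires references to System.Web (web-hosted) and System.ServiceModel (self-hosted).
private string GetClientIp(HttpRequestMessage request)
{
    // Web-hosted (IIS / System.Web)
    if (request.Properties.ContainsKey("MS_HttpContext"))
    {
        return ((HttpContextWrapper)request.Properties["MS_HttpContext"]).Request.UserHostAddress;
    }

    // Self-hosted (WCF's RemoteEndpointMessageProperty)
    if (request.Properties.ContainsKey(RemoteEndpointMessageProperty.Name))
    {
        var endpoint = (RemoteEndpointMessageProperty)request.Properties[RemoteEndpointMessageProperty.Name];
        return endpoint.Address;
    }

    return null;
}
```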

Now you can use this attribute on your Web API controller action.
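For example, on a hypothetical ApiController action (the controller name and route are illustrative):

```csharp
public class ValuesController : ApiController
{
    // Clients may hit this action at most once every 5 seconds per IP.
    [Throttle(Name = "TestThrottle", Message = "You must wait {n} seconds before calling this endpoint again.", Seconds = 5)]
    public HttpResponseMessage Get()
    {
        return Request.CreateResponse(HttpStatusCode.OK, "TestThrottle executed");
    }
}
```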

Best way to implement request throttling in ASP.NET MVC?

Here's a generic version of what we've been using on Stack Overflow for the past year:

/// <summary>
/// Decorates any MVC route that needs to have client requests limited by time.
/// </summary>
/// <remarks>
/// Uses the current System.Web.Caching.Cache to store each client request to the decorated route.
/// </remarks>
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public class ThrottleAttribute : ActionFilterAttribute
{
    /// <summary>
    /// A unique name for this Throttle.
    /// </summary>
    /// <remarks>
    /// We'll be inserting a Cache record based on this name and client IP, e.g. "Name-192.168.0.1"
    /// </remarks>
    public string Name { get; set; }

    /// <summary>
    /// The number of seconds clients must wait before executing this decorated route again.
    /// </summary>
    public int Seconds { get; set; }

    /// <summary>
    /// A text message that will be sent to the client upon throttling. You can include the token {n} to
    /// show this.Seconds in the message, e.g. "Wait {n} seconds before trying again".
    /// </summary>
    public string Message { get; set; }

    public override void OnActionExecuting(ActionExecutingContext c)
    {
        var key = string.Concat(Name, "-", c.HttpContext.Request.UserHostAddress);
        var allowExecute = false;

        if (HttpRuntime.Cache[key] == null)
        {
            HttpRuntime.Cache.Add(key,
                true,                             // is this the smallest data we can have?
                null,                             // no dependencies
                DateTime.Now.AddSeconds(Seconds), // absolute expiration
                Cache.NoSlidingExpiration,
                CacheItemPriority.Low,
                null);                            // no callback

            allowExecute = true;
        }

        if (!allowExecute)
        {
            if (String.IsNullOrEmpty(Message))
                Message = "You may only perform this action every {n} seconds.";

            c.Result = new ContentResult { Content = Message.Replace("{n}", Seconds.ToString()) };
            // see 409 - http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
            c.HttpContext.Response.StatusCode = (int)HttpStatusCode.Conflict;
        }
    }
}

Sample usage:

[Throttle(Name = "TestThrottle", Message = "You must wait {n} seconds before accessing this url again.", Seconds = 5)]
public ActionResult TestThrottle()
{
    return Content("TestThrottle executed");
}

The ASP.NET Cache works like a champ here: by using it, you get automatic clean-up of your throttle entries. And even with our growing traffic, we haven't seen this become an issue on the server.

Feel free to give feedback on this method; when we make Stack Overflow better, you get your Ewok fix even faster :)

How to limit concurrent external API calls in .Net Core Web API?

The simplest solution I'm aware of for limiting concurrent access to a piece of code is a SemaphoreSlim object, used to implement a throttling mechanism.

You can consider the approach shown below, which you should adapt to your current scenario (the following code is simplistic and only meant to show the general idea):

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class Program
{
    private static async Task DoSomethingAsync()
    {
        // this is the code for which you want to limit the concurrent execution
    }

    // this is meant to guarantee at most 5 concurrent executions of the code in DoSomethingAsync
    private static readonly SemaphoreSlim _semaphore = new SemaphoreSlim(5);

    // here we execute 100 calls to DoSomethingAsync, ensuring that at most 5 run concurrently
    public static async Task Main(string[] args)
    {
        var tasks = new List<Task>();

        for (int i = 0; i < 100; i++)
        {
            tasks.Add(ThrottledDoSomethingAsync());
        }

        await Task.WhenAll(tasks);
    }

    private static async Task ThrottledDoSomethingAsync()
    {
        await _semaphore.WaitAsync();

        try
        {
            await DoSomethingAsync();
        }
        finally
        {
            _semaphore.Release();
        }
    }
}

Here you can find the documentation for the SemaphoreSlim class.

If you want something like a ForEachAsync method, you can consider reading my own question on the subject.

If you are looking for an elegant way to use a SemaphoreSlim as a throttling mechanism for your service, consider defining an interface for the service itself and using the decorator pattern. In the decorator you implement the throttling logic with the SemaphoreSlim as shown above, leaving the service logic simple and untouched in the core implementation. This is not strictly related to your question; it's just a tip for writing down the actual implementation of your HTTP service. The core idea of the SemaphoreSlim used as a throttling mechanism is the one shown in the code above.
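As a sketch of that decorator idea (the interface and class names here are illustrative, not part of your code):

```csharp
using System.Threading;
using System.Threading.Tasks;

public interface IApiService
{
    Task<string> GetDataAsync(string endpoint);
}

// Core implementation: plain service logic, no throttling concerns.
public class ApiService : IApiService
{
    public Task<string> GetDataAsync(string endpoint)
    {
        // ... the actual HTTP call would go here ...
        return Task.FromResult(string.Empty);
    }
}

// Decorator: wraps any IApiService and adds the SemaphoreSlim throttling around it.
public class ThrottledApiService : IApiService
{
    private readonly IApiService _inner;
    private readonly SemaphoreSlim _semaphore;

    public ThrottledApiService(IApiService inner, int maxConcurrency)
    {
        _inner = inner;
        _semaphore = new SemaphoreSlim(maxConcurrency);
    }

    public async Task<string> GetDataAsync(string endpoint)
    {
        await _semaphore.WaitAsync();
        try
        {
            return await _inner.GetDataAsync(endpoint);
        }
        finally
        {
            _semaphore.Release();
        }
    }
}
```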

The bare minimum to adapt your code is as follows:

public sealed class HttpService
{
    // this must be static in order to be shared between different instances
    // this code is based on a max of 25 concurrent requests to the API
    // both GET and POST requests are taken into account (they are globally capped to a maximum of 25 concurrent requests to the API)
    private static readonly SemaphoreSlim _semaphore = new SemaphoreSlim(25);

    public HttpClient GetHttpClient()
    {
        HttpClient client = new HttpClient
        {
            BaseAddress = new Uri(APIServer),
        };
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        client.DefaultRequestHeaders.Add("Authorization", "Bearer " + Access_Token);
        return client;
    }

    private HttpClient Client;

    public async Task<Object> Get(string apiEndpoint)
    {
        Client = GetHttpClient();
        HttpResponseMessage httpResponseMessage = await this.ExecuteGetRequest(apiEndpoint);
        if (httpResponseMessage.IsSuccessStatusCode)
        {
            Object response = await httpResponseMessage.Content.ReadAsStringAsync();
            return response;
        }

        // need to track failed calls (e.g. 429 Too Many Requests); returning the numeric
        // status code ensures that every code path returns a value
        return (int)httpResponseMessage.StatusCode;
    }

    private async Task<HttpResponseMessage> ExecuteGetRequest(string url)
    {
        await _semaphore.WaitAsync();

        try
        {
            return await this.Client.GetAsync(url);
        }
        finally
        {
            _semaphore.Release();
        }
    }

    public async Task<Object> Post(string apiEndpoint, Object request)
    {
        Client = GetHttpClient();
        HttpResponseMessage httpResponseMessage = await this.ExecutePostRequest(apiEndpoint, request);
        if (httpResponseMessage.IsSuccessStatusCode)
        {
            return await httpResponseMessage.Content.ReadAsAsync<Object>();
        }

        // need to track failed calls here too; again, every code path returns a value
        return (int)httpResponseMessage.StatusCode;
    }

    private async Task<HttpResponseMessage> ExecutePostRequest(string url, Object request)
    {
        await _semaphore.WaitAsync();

        try
        {
            return await this.Client.PostAsJsonAsync(url, request);
        }
        finally
        {
            _semaphore.Release();
        }
    }
}

IMPORTANT NOTE: the code you posted creates a brand-new HttpClient instance each time you need to perform an HTTP request to your API. This is problematic for reasons that go beyond the scope of your question. I strongly suggest you read this article and this one too.
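A minimal sketch of that fix, assuming (as in the code above) that APIServer and Access_Token are available and that the default headers never vary per request, is to create the client once and reuse it:

```csharp
public sealed class HttpService
{
    private static readonly SemaphoreSlim _semaphore = new SemaphoreSlim(25);

    // One HttpClient, shared for the lifetime of the application.
    private static readonly HttpClient Client = CreateHttpClient();

    private static HttpClient CreateHttpClient()
    {
        var client = new HttpClient
        {
            BaseAddress = new Uri(APIServer),
        };
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        client.DefaultRequestHeaders.Add("Authorization", "Bearer " + Access_Token);
        return client;
    }

    // Get, Post, ExecuteGetRequest and ExecutePostRequest then use the shared Client
    // directly, with no per-call GetHttpClient() allocation.
}
```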

Block API requests for 5 mins if API rate limit exceeds using WebApiThrottle - C# Web API

For the time being, I'm using this fork of WebApiThrottle and adding the DLL manually to the solution. @adamriyadi has implemented this feature in the fork; I'm waiting for it to make its way into the NuGet package.


Update: Later, I went with my own implementation of API rate limiting with a blocking period, using HttpRuntime.Cache, so there is no need to add any additional library.

public class ThrottleAttribute : ActionFilterAttribute
{
    // requests allowed per time window
    private int _API_RATEQUOTA = 60;

    // per x minute value
    private int _API_TIMELIMIT = 1;

    // blocking period (in minutes) once the quota is exceeded
    private int _API_BLOCKDURATION = 5;

    private readonly object syncLock = new object();

    public override void OnActionExecuting(HttpActionContext actionContext)
    {
        // Extract access_token or id or ip address to uniquely identify an API call
        var access_token = AuthHelper.GetAuthToken(actionContext.Request);

        if (access_token != null)
        {
            string throttleBaseKey = GetThrottleBaseKey(access_token);
            string throttleCounterKey = GetThrottleCounterKey(access_token);

            lock (syncLock)
            {
                // add a listener for the new api request count
                if (HttpRuntime.Cache[throttleBaseKey] == null)
                {
                    // add api unique key.. this cache entry will expire after _API_TIMELIMIT
                    HttpRuntime.Cache.Add(throttleBaseKey,
                        DateTime.UtcNow,
                        null,
                        DateTime.Now.AddMinutes(_API_TIMELIMIT),
                        Cache.NoSlidingExpiration,
                        CacheItemPriority.High,
                        null);

                    // add count as value for that api.. this cache entry will expire after _API_TIMELIMIT
                    HttpRuntime.Cache.Add(throttleCounterKey,
                        1,
                        null,
                        DateTime.Now.AddMinutes(_API_TIMELIMIT),
                        Cache.NoSlidingExpiration,
                        CacheItemPriority.High,
                        null);
                }
                else
                {
                    // a listener exists for the api request count
                    var current_requests = (int)HttpRuntime.Cache[throttleCounterKey];

                    if (current_requests < _API_RATEQUOTA)
                    {
                        // increase api count
                        HttpRuntime.Cache.Insert(throttleCounterKey,
                            current_requests + 1,
                            null,
                            DateTime.Now.AddMinutes(_API_TIMELIMIT),
                            Cache.NoSlidingExpiration,
                            CacheItemPriority.High,
                            null);
                    }
                    else
                    {
                        // hit rate limit: block for another _API_BLOCKDURATION minutes
                        HttpRuntime.Cache.Insert(throttleBaseKey,
                            DateTime.UtcNow,
                            null,
                            DateTime.Now.AddMinutes(_API_BLOCKDURATION),
                            Cache.NoSlidingExpiration,
                            CacheItemPriority.High,
                            null);

                        HttpRuntime.Cache.Insert(throttleCounterKey,
                            current_requests + 1,
                            null,
                            DateTime.Now.AddMinutes(_API_BLOCKDURATION),
                            Cache.NoSlidingExpiration,
                            CacheItemPriority.High,
                            null);

                        Forbidden(actionContext);
                    }
                }
            }
        }
        else
        {
            BadRequest(actionContext);
        }

        base.OnActionExecuting(actionContext);
    }

    private string GetThrottleBaseKey(string app_id)
    {
        return Identifier.THROTTLE_BASE_IDENTIFIER + app_id;
    }

    private string GetThrottleCounterKey(string app_id)
    {
        return Identifier.THROTTLE_COUNTER_IDENTIFIER + app_id;
    }

    private void BadRequest(HttpActionContext actionContext)
    {
        actionContext.Response = actionContext.Request.CreateResponse(HttpStatusCode.BadRequest);
    }

    private void Forbidden(HttpActionContext actionContext)
    {
        actionContext.Response = actionContext.Request.CreateResponse(HttpStatusCode.Forbidden, "Application Rate Limit Exceeded");
    }
}

public static class Identifier
{
    public static readonly string THROTTLE_BASE_IDENTIFIER = "LA_THROTTLE_BASE_";
    public static readonly string THROTTLE_COUNTER_IDENTIFIER = "LA_THROTTLE_COUNT_";
}

Now decorate the required API actions with [Throttle].
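For example (a hypothetical controller; adapt to your own routes):

```csharp
public class OrdersController : ApiController
{
    // Rate-limited per access token: 60 requests per minute,
    // then blocked for 5 minutes once the quota is exceeded.
    [Throttle]
    public HttpResponseMessage Get()
    {
        return Request.CreateResponse(HttpStatusCode.OK, "orders");
    }
}
```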

How to Throttle all outgoing asynchronous calls to HttpClient across multiple threads in .net Core API project

Conceptual questions

  • SemaphoreSlim is thread-safe so there are no thread-safety or locking concerns about using it as a parallelism throttle across multiple threads.
  • HttpMessageHandlers are indeed an outbound middleware mechanism to intercept calls placed through HttpClient. So they are an ideal way to apply parallelism-throttling to Http calls using SemaphoreSlim.

Simple implementation

So a ThrottlingDelegatingHandler might look like this:

public class ThrottlingDelegatingHandler : DelegatingHandler
{
    private readonly SemaphoreSlim _throttler;

    public ThrottlingDelegatingHandler(SemaphoreSlim throttler)
    {
        _throttler = throttler ?? throw new ArgumentNullException(nameof(throttler));
    }

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request == null) throw new ArgumentNullException(nameof(request));

        await _throttler.WaitAsync(cancellationToken);
        try
        {
            return await base.SendAsync(request, cancellationToken);
        }
        finally
        {
            _throttler.Release();
        }
    }
}

Create and maintain an instance as a singleton:

int maxParallelism = 10;
var throttle = new ThrottlingDelegatingHandler(new SemaphoreSlim(maxParallelism))
{
    // A DelegatingHandler used directly (outside HttpClientFactory) needs an inner
    // handler to pass requests on to, otherwise the first send throws.
    InnerHandler = new HttpClientHandler()
};

Apply that DelegatingHandler to all instances of HttpClient through which you want to parallel-throttle calls:

HttpClient throttledClient = new HttpClient(throttle);

That HttpClient does not need to be a singleton: only the throttle instance does.

I've omitted the .NET Core DI code for brevity, but you would register the singleton ThrottlingDelegatingHandler instance with .NET Core's container, obtain that singleton by DI at point-of-use, and use it in the HttpClients you construct as shown above.
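A minimal sketch of that wiring (assuming a standard Startup class; DownstreamCaller is an illustrative consumer, not part of your code):

```csharp
// In Startup.ConfigureServices:
public void ConfigureServices(IServiceCollection services)
{
    // One shared throttle for the whole application.
    var throttle = new ThrottlingDelegatingHandler(new SemaphoreSlim(10))
    {
        // Required when the handler is used directly rather than via HttpClientFactory.
        InnerHandler = new HttpClientHandler()
    };
    services.AddSingleton(throttle);
}

// At point-of-use, inject the singleton and build clients around it.
public class DownstreamCaller
{
    private readonly HttpClient _client;

    public DownstreamCaller(ThrottlingDelegatingHandler throttle)
    {
        // disposeHandler: false keeps the shared handler alive if this client is disposed.
        _client = new HttpClient(throttle, disposeHandler: false);
    }
}
```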

But:

Better implementation: Using HttpClientFactory (.NET Core 2.1+)

The above still leaves open the question of how you are going to manage HttpClient lifetimes:

  • Singleton (app-scoped) HttpClients do not pick up DNS updates. Your app will be ignorant of DNS updates unless you kill and restart it (perhaps undesirable).
  • A frequently-create-and-dispose pattern, using (HttpClient client = new HttpClient()) { ... }, on the other hand, can cause socket exhaustion.

One of the design goals of HttpClientFactory was to manage the lifecycles of HttpClient instances and their delegating handlers, to avoid these problems.

In .NET Core 2.1, you could use HttpClientFactory to wire it all up in ConfigureServices(IServiceCollection services) in the Startup class, like this:

int maxParallelism = 10;
services.AddSingleton<ThrottlingDelegatingHandler>(new ThrottlingDelegatingHandler(new SemaphoreSlim(maxParallelism)));

services.AddHttpClient("MyThrottledClient")
    .AddHttpMessageHandler<ThrottlingDelegatingHandler>();

("MyThrottledClient" here is a named-client approach just to keep this example short; typed clients avoid string-naming.)

At point-of-use, obtain an IHttpClientFactory by DI (reference), then call

var client = _clientFactory.CreateClient("MyThrottledClient");

to obtain an HttpClient instance pre-configured with the singleton ThrottlingDelegatingHandler.

All calls through an HttpClient instance obtained in this manner will be throttled (in common, across the app) to the originally configured int maxParallelism.

And HttpClientFactory magically deals with all the HttpClient lifetime issues.
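As a sketch of the typed-client alternative mentioned above (MyThrottledApiClient is an illustrative name):

```csharp
public class MyThrottledApiClient
{
    private readonly HttpClient _client;

    // HttpClientFactory injects a pre-configured HttpClient here.
    public MyThrottledApiClient(HttpClient client)
    {
        _client = client;
    }

    public Task<string> GetAsync(string url) => _client.GetStringAsync(url);
}

// In Startup.ConfigureServices — no string name needed:
services.AddHttpClient<MyThrottledApiClient>()
    .AddHttpMessageHandler<ThrottlingDelegatingHandler>();
```

Consumers then take MyThrottledApiClient by constructor injection instead of calling CreateClient with a magic string.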

Even better implementation: Using Polly with IHttpClientFactory to get all this 'out-of-the-box'

Polly is deeply integrated with IHttpClientFactory, and Polly also provides the Bulkhead policy, which works as a parallelism throttle via an identical SemaphoreSlim mechanism.

So, as an alternative to hand-rolling a ThrottlingDelegatingHandler, you can also just use Polly Bulkhead policy with IHttpClientFactory out of the box. In your Startup class, simply:

int maxParallelism = 10;
var throttler = Policy.BulkheadAsync<HttpResponseMessage>(maxParallelism, Int32.MaxValue);

services.AddHttpClient("MyThrottledClient")
    .AddPolicyHandler(throttler);

Obtain the pre-configured HttpClient instance from HttpClientFactory as earlier. As before, all calls through such a "MyThrottledClient" HttpClient instance will be parallel-throttled to the configured maxParallelism.

The Polly Bulkhead policy additionally offers the ability to configure how many operations you want to allow simultaneously to 'queue' for an execution slot in the main semaphore. So, for instance:

var throttler = Policy.BulkheadAsync<HttpResponseMessage>(10, 100);

when configured as above into an HttpClient, would allow 10 parallel HTTP calls, with up to 100 further calls 'queuing' for an execution slot. This can offer extra resilience for high-throughput systems, by preventing a faulting downstream system from causing an excessive resource bulge of queued calls upstream.

To use the Polly options with HttpClientFactory, pull in the Microsoft.Extensions.Http.Polly and Polly NuGet packages.

References: Polly deep doco on Polly and IHttpClientFactory; Bulkhead policy.



Addendum re Tasks

The question uses Task.Run(...) and mentions:

a .net core web api that consumes an external api

and:

with tasks being continuously added instead of a pre-defined list of tasks.

If your .NET Core web API consumes the external API only once per incoming request it handles, and you adopt the approaches discussed in the rest of this answer, then offloading the downstream HTTP call to a new Task with Task.Run(...) is unnecessary: it only adds overhead from extra Task instances and thread switching. .NET Core already runs incoming requests on multiple thread-pool threads.


