Make Several Requests to an API That Can Only Handle 20 Requests a Minute

Make several requests to an API that can only handle 20 requests a minute

You could send one block of 20 requests every minute, or space them out to one request every 3 seconds (the latter is probably preferred by the API owners).

function rateLimitedRequests(array, chunkSize) {
  var delay = 3000 * chunkSize;
  var remaining = array.length;
  var promises = [];
  var addPromises = function(newPromises) {
    Array.prototype.push.apply(promises, newPromises);
    remaining -= newPromises.length;
    if (remaining === 0) {
      Promise.all(promises).then((data) => {
        // ... do your thing
      });
    }
  };
  (function request() {
    addPromises(array.splice(0, chunkSize).map(apiFetch));
    if (array.length) {
      setTimeout(request, delay);
    }
  })();
}

To make one request every 3 seconds:

rateLimitedRequests(bigArray, 1);

Or 20 every minute:

rateLimitedRequests(bigArray, 20);

If you prefer to use _.chunk and _.throttle (rather than _.debounce1):

function rateLimitedRequests(array, chunkSize) {
  var delay = 3000 * chunkSize;
  var remaining = array.length;
  var promises = [];
  var addPromises = function(newPromises) {
    Array.prototype.push.apply(promises, newPromises);
    remaining -= newPromises.length;
    if (remaining === 0) {
      Promise.all(promises).then((data) => {
        // ... do your thing
      });
    }
  };
  var chunks = _.chunk(array, chunkSize);
  var throttledFn = _.throttle(function() {
    addPromises(chunks.pop().map(apiFetch));
  }, delay, {leading: true});
  for (var i = 0; i < chunks.length; i++) {
    throttledFn();
  }
}

1 You probably want _.throttle, since it executes each function call after a delay, whereas _.debounce groups multiple calls into one call. See this article linked from the docs.

Debounce: Think of it as "grouping multiple events in one". Imagine that you go home, enter the elevator, the doors are closing... and suddenly your neighbor appears in the hall and tries to jump into the elevator. Be polite and open the doors for him: you are debouncing the elevator departure. Consider that the same situation can happen again with a third person, and so on... probably delaying the departure several minutes.

Throttle: Think of it as a valve; it regulates the flow of executions. We can determine the maximum number of times a function can be called in a certain time. So in the elevator analogy: you are polite enough to let people in for 10 seconds, but once that delay passes, you must go!
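To see the difference in code (a minimal sketch using lodash; the 3-second wait and the log messages are just illustrative values, not from the original answer):

const _ = require('lodash');

// Throttle: runs on the first call, then at most once per 3 seconds,
// no matter how often it is invoked.
const throttledLog = _.throttle(() => console.log('scrolling...'), 3000);

// Debounce: runs only once, 3 seconds after the *last* call,
// so a rapid burst of calls collapses into a single execution.
const debouncedLog = _.debounce(() => console.log('stopped scrolling'), 3000);

// Simulate a burst of events:
for (let i = 0; i < 10; i++) {
  throttledLog();
  debouncedLog();
}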

Make multiple GET requests with a limit of 10 requests/minute in NodeJS

You can do this using the Promise.all method.

For example:

const [ response1, response2, response3 ] = await Promise.all([
  axios.get(URL_HERE),
  axios.get(URL_HERE),
  axios.get(URL_HERE)
]);

console.log(response1);
console.log(response2);
console.log(response3);

For scenario 2, which was discussed in the comment section:

// Note: the array already contains promises, so the underlying requests have
// been started; to truly rate-limit the requests themselves, pass functions
// that create the requests and call them chunk by chunk instead.
function rateLimiter(array = [], chunkSize = 10) {

  const delay = 10 * 1000;

  return new Promise((resolve, reject) => {
    let results = [];
    const callback = (iteration = 0) => {
      // slice takes (start, end) indices, not (start, length)
      const process = array.slice(iteration * chunkSize, (iteration + 1) * chunkSize);
      if (!process.length) return resolve(results);

      Promise.all(process).then(responses => {
        results = [ ...results, ...responses ];
        const processLength = array.slice((iteration + 1) * chunkSize).length;

        if (!processLength) return resolve(results);
        setTimeout(() => {
          callback(iteration + 1);
        }, delay);
      });
    };

    callback();

  });
}

let results = await rateLimiter([ axios.get('URL_HERE'), axios.get('URL_HERE') ], 20);

Postman: How to make multiple requests at the same time

I guess there's no such feature in Postman to run concurrent tests.

If I were you, I would consider Apache JMeter, which is used exactly for such scenarios.

Regarding Postman, the only thing that could more or less meet your needs is the Postman Runner.
There you can specify the details:

  • number of iterations,
  • upload CSV file with data for different test runs, etc.

The runs won't be concurrent, only consecutive.

Hope that helps. But do consider JMeter (you'll love it).

Multiple Api requests in nodejs gives Timedout error

Whatever target you're sending these 4000 simultaneous requests to apparently cannot handle that many requests in flight at the same time and still respond to each request in a timely fashion. This is not surprising.

The usual solution to something like this is to limit how many simultaneous requests you send to the target to something smaller, like 10 or 20 (usually determined with testing). You send 10 requests, then each time a request finishes, you send the next one in line. This way, you never overtax the target server.

You can hand-code the logic to distribute requests as described above, or you can use a pre-built function such as Bluebird's Promise.map() or a function like mapConcurrent(), and you can see a discussion of this general issue here.
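As a rough sketch of that pattern (not the exact mapConcurrent() referenced above; the helper name, the limit of 10, and the usage line are illustrative assumptions):

function mapConcurrent(items, maxConcurrent, fn) {
  // Runs fn(item, index) over all items, keeping at most maxConcurrent
  // promises in flight; each time one settles, the next item is started.
  return new Promise((resolve, reject) => {
    const results = new Array(items.length);
    let nextIndex = 0;
    let inFlight = 0;
    let finished = 0;
    if (!items.length) return resolve(results);

    function runNext() {
      while (inFlight < maxConcurrent && nextIndex < items.length) {
        const i = nextIndex++;
        inFlight++;
        Promise.resolve(fn(items[i], i)).then(result => {
          results[i] = result;
          inFlight--;
          finished++;
          if (finished === items.length) {
            resolve(results);
          } else {
            runNext();
          }
        }, reject);
      }
    }
    runNext();
  });
}

// Hypothetical usage: at most 10 requests in flight at any moment.
// mapConcurrent(urls, 10, url => axios.get(url)).then(responses => { /* ... */ });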

Is there standard way of making multiple API calls combined into one HTTP request?

TL;DR

  • Neither REST nor HTTP is ideal for batch operations.
  • Caching, which is one of REST's constraints (mandatory, not optional), usually gets in the way of batch processing in some form.
  • It might be beneficial not to expose the data to update or remove in batch as their own resources, but as data elements within a single resource, like a data table in an HTML page. Here, updating or removing all or part of the entries should be straightforward.
  • If the system in general is write-intensive, it is probably better to think of other solutions, such as exposing the DB directly to those clients to spare a further level of indirection and complexity.
  • Utilizing caching can take a lot of workload off the server and even spare unnecessary connections.

To start with, neither REST nor HTTP is ideal for batch operations. As Jim Webber pointed out, the application domain of HTTP is the transfer of documents over the Web. This is what HTTP does and this is what it is good at. However, any business rules we derive are just a side effect of that document management, and we have to come up with solutions to turn these document-management side effects into something useful.

As REST is just a generalization of the concepts used in the browsable Web, it is no wonder that the same concepts that apply to Web development also apply to REST development in some form. Thereby a question like how something should be done in REST usually revolves around answering how it should be done on the Web.

As mentioned before, HTTP isn't ideal in terms of batch processing actions. Sure, a GET request may retrieve multiple results, though in reality you obtain one response containing links to further resources. The creation of resources has, according to the HTTP specification, to be indicated with a Location header that points to the newly created resource. POST is defined as an all-purpose method that allows tasks to be performed according to server-specific semantics, so you could basically use it to create multiple resources at once. However, the HTTP spec clearly lacks support for indicating the creation of multiple resources at once, as the Location header may only appear once per response and may only contain a single URI. So how can a server indicate the creation of multiple resources to the client?

A further indication that HTTP isn't ideal for batch processing is that a URI must reference a single resource. That resource may change over time, but the URI can't ever point to multiple resources at once. The URI itself is, more or less, used as a key by caches, which store a cacheable response representation for that URI. As a URI may only ever reference one single resource, a cache will also only ever store the representation of one resource for that URI. A cache will invalidate a stored representation for a URI if an unsafe operation is performed on that URI. In the case of a DELETE operation, which is by nature unsafe, the representation for the URI the DELETE is performed on will be removed. If you now "redirect" the DELETE operation to remove multiple backing resources at once, how should a cache take notice of that? It only operates on the URI invoked. Hence, even when you delete multiple resources in one go via DELETE, a cache might still serve clients outdated information, as it simply hasn't taken notice of the removal yet and its freshness value would still indicate a fresh-enough state. Unless you disable caching by default, which somewhat violates one of REST's constraints, or reduce the period a representation is considered fresh to a very low value, clients will probably be served outdated information. You could of course perform an unsafe operation on each of these URIs to "clear" the cache, though in that case you could just as well have invoked the DELETE operation on each resource you wanted to batch-delete in the first place.

It gets a bit easier, though, if the batch of data you want to remove is not explicitly captured via its own resources but as data of a single resource. Think of a data table on a Web page with certain form elements, such as a checkbox you can tick to mark an entry as a delete candidate; after pressing the submit button, the selected entries are sent to the server, which performs the removal of these items. Here only the state of one resource is updated, and thus a simple POST, PUT or even PATCH operation can be performed on that resource's URI. This also goes well with caching as outlined before, as only one resource has to be altered, and the unsafe operation on that URI automatically invalidates any stored representation for it.
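A minimal sketch of that idea from the client side, assuming axios and a hypothetical /orders/outstanding collection resource (the URL, payload shape, and field names are illustrative, not part of the original answer):

const axios = require('axios');

// Update the state of ONE resource (the table of outstanding orders)
// instead of issuing a DELETE per entry, so only that single URI has to
// be invalidated by caches.
async function removeSelectedEntries(selectedIds) {
  await axios.post('https://api.example.com/orders/outstanding', {
    remove: selectedIds  // the entries the user ticked for deletion
  });
}

// removeSelectedEntries([3, 17, 42]);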

The above-mentioned usage of form elements to mark certain entries for removal depends, however, on the media type issued. In the case of HTML, its forms section specifies the available components and their affordances. An affordance is the knowledge of what you can and should do with certain objects: a button or link wants to be pushed, a text field may expect numeric or alphanumeric input which may further be length-limited, and so on. Other media types, such as hal-forms, halform or ion, attempt to provide form representations and components for a JSON-based notation; however, support for such media types is still quite limited.

As one of your concerns is the number of client connections to your service, I assume you have a write-intensive scenario, as in read-intensive cases caching would probably take away a good chunk of the load from your server. For example, the BBC once reported that they could reduce the load on their servers drastically just by introducing a one-minute caching interval for recently requested resources. This mainly affected their start page and the linked articles, as people clicked on the latest news more often than on old news. Receiving a couple of thousand, if not hundreds of thousands of, requests per minute, they could, as mentioned before, significantly reduce the number of requests actually reaching the server and therefore take a huge load off their servers.

Write-intensive use cases, however, can't benefit from caching as much as read-intensive ones, as the cache would be invalidated quite often and the actual request forwarded to the server for processing. If the API is more or less used to perform CRUD operations, as so many "REST" APIs do in reality, it is questionable whether it wouldn't be preferable to expose the database directly to the clients. Almost all modern database vendors ship with sophisticated user-rights management options and allow you to create views that can be exposed to certain users. The "REST API" on top of it basically just adds a further level of indirection and complexity in such a case. By exposing the DB directly, performing batch updates or deletions shouldn't be an issue at all, as support for such operations should already be built into the DB layer through the respective query languages.

In regards to the number of connections clients create: HTTP from 1.0 on allows the reuse of connections via the Connection: keep-alive header directive. In HTTP/1.1, persistent connections are used by default unless explicitly requested to close via the respective Connection: close header directive. HTTP/2 introduced full-duplex connections that allow many channels, and therefore many requests, to reuse the same connection at the same time. This is more or less a fix for the connection limitation suggested in RFC 2616, which plenty of Web developers worked around by using CDNs and similar tricks. Currently most implementations use a maximum limit of 100 channels, and therefore simultaneous downloads, via a single connection, AFAIK.
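On the client side in Node, connection reuse can be encouraged explicitly (a minimal sketch; axios' httpAgent/httpsAgent options and Node's http.Agent keepAlive flag exist as shown, but the socket limits are just example values):

const http = require('http');
const https = require('https');
const axios = require('axios');

// Reuse TCP connections across requests instead of opening a new one each time.
const client = axios.create({
  httpAgent: new http.Agent({ keepAlive: true, maxSockets: 20 }),
  httpsAgent: new https.Agent({ keepAlive: true, maxSockets: 20 })
});

// client.get('https://api.example.com/resource');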

Usually opening and closing a connection takes a bit of time and server resources, and the more open connections a server has to deal with, the more the system may suffer. Open connections with hardly any traffic, though, aren't a big issue for most servers. While connection creation was usually considered the costly part, through the usage of persistent connections that factor has now moved towards the number of requests issued, hence the desire to send out batch requests, which HTTP is not really made for. Again, as mentioned throughout the post, through the smart utilization of caching plenty of requests may never reach the server at all, if possible. This is probably one of the best optimization strategies for reducing the number of simultaneous requests. Probably the best advice in such a case is to take a look at which kinds of resources are requested frequently, which requests take up a lot of processing capacity, and which ones can easily be answered by utilizing caching options.

Fastest way to make a million POST requests to a cloud function?

A lot of this has been covered before in prior questions/answers, but none that I found is a pure duplicate of what you're asking so I'll reference some that have come before and add some explanation. First the ones that have come before:

How to make millions of parallel http requests from nodejs app

How to fire off 1,000,000 requests

Is there a limit to how many promises can or should run concurrently when making requests

In Node js. How many simultaneous requests can I send with the "request" package

What is the limit of sending concurrent ajax requests with node.js?

How to loop many http requests with axios in node.js

Handling large number of outbound HTTP requests

Promise.all consumes all my RAM

Properly batch nested promises in Node

How can I handle a file of 30,000 urls without memory leaks?

First off, you can send a lot of parallel outbound requests. You do not have to wait for a prior response before sending the next one.

Second, you have resource limits on both client and server, and ultimately you will have to test your local configuration and your target server to find out where those resource limits are and then write your code to stay within them. There is no way to reliably send a request and then immediately kill the socket because you don't care about the response. If your socket gets queued by the target server (because you've already overwhelmed it), then killing the socket may drop it from the target server's queue before it gets processed by the target server.

Your local configuration will be limited by how many simultaneous sockets you can have open and how much memory you have (as each outbound request takes some amount of memory to keep track of).

The target server will be limited by its own resources. It may have protections built in to limit how many posts/sec it can receive from one particular source (rate limiting). It may have overall protections on how many incoming requests it can handle at once. Typically servers protect themselves from overload by configuring things so that once an incoming request queue gets to a certain level, they just immediately hang up on new requests. The idea is to provide some level of protection of service and just deflect new requests when they come in too fast.

If this isn't your target server and there isn't any documentation about what its limits are supposed to be, then you will just have to test how many simultaneous requests you can have "in flight" at the same time. If they implement rate limiting from a given source, then it's not uncommon that this might be a fairly low number such as 5. If there's no rate limiting, then you're really just trying to figure out what their http server can handle without causing it to drop connections in defense of service.

Once you figure out (with testing) how many simultaneous requests in flight the target server can comfortably handle, you will have to structure your code to deliver that. Usually, you would take an approach like the one shown in this mapConcurrent() function, where you code things so that only N requests are in flight at the same time, where N is a number you figured out experimentally by testing the target server.

Relevant pieces of helper code:

mapConcurrent(array, maxConcurrent, fn)

rateLimitMap(array, requestsPerSec, maxInFlight, fn)

runN(fn, limit, cnt, options)

pMap(array, fn, limit)

And, if you want a pre-made library, the async library contains a bunch of control flow helpers like these.
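For instance, with the async library's mapLimit (a minimal sketch; the URL list and the limit of 10 are assumptions for illustration):

const async = require('async');
const axios = require('axios');

const urls = ['https://api.example.com/a', 'https://api.example.com/b']; // placeholder URLs

// Process the URLs with at most 10 requests in flight at any time.
async.mapLimit(urls, 10, async (url) => {
  const res = await axios.get(url);
  return res.data;
}, (err, results) => {
  if (err) throw err;
  console.log(`collected ${results.length} responses`);
});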

Data structure to manage a maximum number of requests per minute for an api

If I understand your problem correctly, you can use a BlockingQueue with a ScheduledExecutorService as follows.

BlockingQueues have the method put, which will only add the given element to the queue if there is available space; otherwise the method call will wait (until there is free space). They also have the method take, which will only remove an element from the queue if there are any elements at all; otherwise the method call will wait (until there is at least one element to take).

Specifically, you can use a LinkedBlockingQueue or an ArrayBlockingQueue, which can be given a fixed number of elements to hold at any given time. This fixed size means that you can submit with put as many requests as you like, but you will only take requests and process them, say, once every second (so as to make 60 requests per minute, for example).

To instantiate a LinkedBlockingQueue with fixed size, just use the corresponding constructor (which accepts the size as the argument). LinkedBlockingQueue will take elements in FIFO order according to its documentation.

To instantiate an ArrayBlockingQueue with fixed size, use the constructor which accepts the size but also the boolean flag named fair. If this flag is true then the queue will take elements also in FIFO order.

Then you can have a ScheduledExecutorService (instead of waiting inside a loop), to which you can submit a single Runnable which will take from the queue, communicate with the external API and then wait for the required delay between communications.

A simple demonstration of the above follows:

import java.util.Objects;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {

    public static class RequestSubmitter implements Runnable {
        private final BlockingQueue<Request> q;

        public RequestSubmitter(final BlockingQueue<Request> q) {
            this.q = Objects.requireNonNull(q);
        }

        @Override
        public void run() {
            try {
                q.put(new Request()); //Will block until available capacity.
            }
            catch (final InterruptedException ix) {
                System.err.println("Interrupted!"); //Not expected to happen under normal use.
            }
        }
    }

    public static class Request {
        public void make() {
            try {
                //Let's simulate the communication with the external API:
                TimeUnit.MILLISECONDS.sleep((long) (Math.random() * 100));
            }
            catch (final InterruptedException ix) {
                //Let's say here we failed to communicate with the external API...
            }
        }
    }

    public static class RequestImplementor implements Runnable {
        private final BlockingQueue<Request> q;

        public RequestImplementor(final BlockingQueue<Request> q) {
            this.q = Objects.requireNonNull(q);
        }

        @Override
        public void run() {
            try {
                q.take().make(); //Will block until there is at least one element to take.
                System.out.println("Request made.");
            }
            catch (final InterruptedException ix) {
                //Here the 'taking' from the 'q' is interrupted.
            }
        }
    }

    public static void main(final String[] args) throws InterruptedException {

        /*The following initialization parameters specify that we
        can communicate with the external API 60 times per 1 minute.*/
        final int maxRequestsPerTime = 60;
        final TimeUnit timeUnit = TimeUnit.MINUTES;
        final long timeAmount = 1;

        final BlockingQueue<Request> q = new ArrayBlockingQueue<>(maxRequestsPerTime, true);
        //final BlockingQueue<Request> q = new LinkedBlockingQueue<>(maxRequestsPerTime);

        //Submit some RequestSubmitters to the pool...
        final ExecutorService pool = Executors.newFixedThreadPool(100);
        for (int i = 0; i < 50_000; ++i)
            pool.submit(new RequestSubmitter(q));

        System.out.println("Serving...");

        //Find out the period between communications with the external API:
        final long delayMicroseconds = TimeUnit.MICROSECONDS.convert(timeAmount, timeUnit) / maxRequestsPerTime;
        //We could do the same with NANOSECONDS for more accuracy, but that would be overkill I think.

        //The most important line probably:
        Executors.newSingleThreadScheduledExecutor().scheduleWithFixedDelay(new RequestImplementor(q), 0L, delayMicroseconds, TimeUnit.MICROSECONDS);
    }
}

Note that I used scheduleWithFixedDelay and not scheduleAtFixedRate. You can see in their documentation that the first one waits for the given delay between the end of one run of the submitted Runnable and the start of the next, while the second one does not wait and just resubmits the Runnable every period time units. But we don't know how long the communication with the external API takes, so what if, for example, we scheduleAtFixedRate with a period of once every minute, but the request takes more than a minute to complete?... Then a new request would be submitted while the first one is not yet finished. That is why I used scheduleWithFixedDelay instead of scheduleAtFixedRate. But there is more: I used a single-thread scheduled executor service. Does that mean that if the first call is not finished, a second one cannot be started?... Well, it seems, if you take a look at the implementation of Executors#newSingleThreadScheduledExecutor(), that a second call may occur, because a single-thread core pool size does not mean that the pool is of fixed size.

Another reason that I used scheduleWithFixedDelay is because of underflow of requests. For example what about the queue being empty? Then the scheduling should also wait and not submit the Runnable again.

On the other hand, if we use scheduleWithFixedDelay with, say, a delay of one second between runs, and more than 60 requests are submitted in a single minute, then this will surely make our throughput to the external API drop, because with scheduleWithFixedDelay we can only guarantee that at most 60 requests will be made to the external API. It can be fewer than that, but we don't want it to be; we would like to reach the limit every single time. If that's not a concern to you, then you can use the above implementation already.

But let's say you do care about reaching as close to the limit as possible every time; in that case, as far as I know, you can do this with a custom scheduler, which would be a less clean solution than the first, but more time-accurate.

Bottom line: with the above implementation, you need to make sure that the communication with the external API to serve the requests is as fast as possible.

Finally, I should warn you that I couldn't find out what happens if the BlockingQueue implementations I suggested are not putting in FIFO order. I mean, what if 2 requests arrive at almost the same time while the queue is full? They will both wait, but will the first one which arrived get put first, or will the second one? I don't know. If you don't care about some requests being made to the external API out of order, then don't worry and use the code up to this point. If you do care, however, and you are able to put, for example, a serial number on each request, then you can use a PriorityQueue after the BlockingQueue, or even experiment with PriorityBlockingQueue (which is unfortunately unbounded). That would complicate things even more, so I didn't post relevant code with the PriorityQueue. At least I did my best and I hope I shed some good ideas. I am not saying this post is a complete solution to all your problems, but it is some considerations to start with.

How to loop many http requests with axios in node.js

If you're just trying to get all the results faster, you can request them in parallel and know when they are all done with Promise.all():

async function getAllusers() {
  let allUsersData = await usersDao.getAllusers();

  await Promise.all(allUsersData.map((userData, index) => {
    let body = new URLSearchParams({ip: userData.ip});
    return axios.post("http://myAPI", body).then((res) => {
      allUsersData[index].countryCode = res.data.countryCode;
    });
  }));
  return allUsersData;
}

Note, I would not recommend doing it this way if the allUsersData array is large (like more than 20 long), because you'll be raining a lot of requests on the target server and it may either impede its performance, or you may get rate limited or even refused service. In that case, you'd need to send N requests at a time (like perhaps 5) using code like this pMap() here or mapConcurrent() here.
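Applied to the example above, that would look roughly like this (a sketch that reuses a mapConcurrent-style helper such as the one sketched earlier on this page; the limit of 5 is just an example value):

async function getAllusersLimited() {
  const allUsersData = await usersDao.getAllusers();

  // At most 5 lookups in flight at once instead of one per user all at once.
  await mapConcurrent(allUsersData, 5, (userData) => {
    const body = new URLSearchParams({ ip: userData.ip });
    return axios.post("http://myAPI", body).then((res) => {
      userData.countryCode = res.data.countryCode;
    });
  });
  return allUsersData;
}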

How to post multiple Axios requests at the same time?

There are three ways you can achieve your goal.

  1. For simultaneous requests with Axios, you can use axios.all():

     axios.all([
       axios.post(`/my-url`, {
         myVar: 'myValue'
       }),
       axios.post(`/my-url2`, {
         myVar: 'myValue'
       })
     ])
     .then(axios.spread((data1, data2) => {
       // output of req.
       console.log('data1', data1, 'data2', data2)
     }));
  2. You can use Promise.allSettled(). It returns a promise that resolves after all of the given promises have either fulfilled or rejected (a sketch follows after this list).

  3. You can use Promise.all(), but it has the drawback that if any one request fails, the whole call rejects and you end up in the catch block.

But the best option here is the first one.
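A minimal sketch of the Promise.allSettled() variant mentioned in point 2 (the URLs and payloads are the same placeholders as above):

const requests = [
  axios.post(`/my-url`, { myVar: 'myValue' }),
  axios.post(`/my-url2`, { myVar: 'myValue' })
];

// Unlike Promise.all, one failed request does not reject the whole batch;
// each entry reports its own status.
Promise.allSettled(requests).then((results) => {
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      console.log(`request ${i} succeeded`, result.value.data);
    } else {
      console.log(`request ${i} failed`, result.reason.message);
    }
  });
});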


