How Many Concurrent Requests Does a Single Flask Process Receive

How many concurrent requests does a single Flask process receive?

When running the development server (which is what you get by calling app.run()), you get a single synchronous process, which means at most one request is processed at a time.
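
As a minimal sketch (module and route names are placeholders), this is the kind of app that statement refers to; newer Flask versions also accept a threaded argument on app.run() that changes this behaviour:

# app.py - minimal sketch, names are placeholders
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "hello"

if __name__ == "__main__":
    app.run()                  # one synchronous process, one request at a time
    # app.run(threaded=True)   # dev server spawns a thread per request instead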

By sticking Gunicorn in front of it in its default configuration and simply increasing the number of --workers, what you get is essentially a number of processes (managed by Gunicorn) that each behave like the app.run() development server. 4 workers == 4 concurrent requests. This is because Gunicorn uses its included sync worker type by default.
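
For example, with the default sync worker (your_project:app is a placeholder for your module and application object, and the bind address just mirrors the examples further down):

gunicorn -w 4 -b 0.0.0.0:5000 your_project:app

This starts 4 worker processes, each handling one request at a time.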

It is important to note that Gunicorn also includes asynchronous workers, namely eventlet and gevent (and also tornado, but that's best used with the Tornado framework, it seems). By specifying one of these async workers with the --worker-class flag, what you get is Gunicorn managing a number of async processes, each of which manages its own concurrency. These processes don't use threads, but coroutines. Basically, within each process, still only one thing can be happening at a time (one thread), but coroutines can be 'paused' while they are waiting on external work to finish (think database queries or network I/O).

This means, if you're using one of Gunicorn's async workers, each worker can handle many more than a single request at a time. Just how many is best depends on the nature of your app, its environment, the hardware it runs on, etc. More details can be found on Gunicorn's design page and in the notes on how gevent works on its intro page.
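
For example, assuming gevent is installed (pip install gevent), an async worker is selected like this, with your_project:app again being a placeholder:

gunicorn -w 4 -k gevent your_project:app

-k is the short form of --worker-class, so this starts 4 processes that each handle many concurrent requests via coroutines.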

How to make Flask handle 25k requests per second like Express.js

You can use multiple threads or gevent to increase Gunicorn's concurrency.

Option 1: multiple threads

e.g.:

gunicorn -w 4 --threads 100 -b 0.0.0.0:5000 your_project:app

--threads 100 means 100 threads per process.

-w 4 means 4 processes, so -w 4 --threads 100 can handle up to 400 requests at a time.

Option 2: gevent worker

e.g.:

pip install gevent
gunicorn -w 4 -k gevent --worker-connections 1000 -b 0.0.0.0:5000 your_project:app

-k gevent --worker-connections 1000 means 1000 coroutines per gevent worker process.

-w 4 means 4 processes, so -w 4 -k gevent --worker-connections 1000 can handle up to 4000 requests at a time.
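
As a rough illustration of the kind of handler that benefits from this (a minimal sketch; the route and the one-second wait are made up, with the wait standing in for a database or network call):

# minimal sketch: an I/O-bound view; route name and timing are illustrative
import time

from flask import Flask

app = Flask(__name__)

@app.route("/slow")
def slow():
    # Stands in for a database query or an external API call.
    # Under the gevent worker, which monkey-patches blocking calls
    # like time.sleep, this wait is cooperative, so the same process
    # can serve other requests in the meantime.
    time.sleep(1)
    return "done"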

For more information, you can refer to my blog post: https://easydevguide.com/posts/gunicorn_concurrency

How does Flask handle simultaneous requests?

Flask doesn't. Parallel request handling is the task of the underlying WSGI web server, which sends the requests to Flask for handling.

Flask's built-in development server, which is invoked with Flask.run(), runs with threads by default.

In production, you'd use one of the WSGI containers or other deployment options, and you control parallelism there. Gunicorn, for example, has the -w command-line argument, which controls the number of worker processes, and -k, which controls how those workers handle requests (synchronous processes, threads, or coroutine-based workers such as gevent, eventlet, or tornado, among others).

Flask App can't handle more than 6 requests with a single browser

It's the browser: its persistent TCP connections to a single server are limited to 6.


How to solve this

Firefox can be configured from within about:config; filter on network.http to see the various settings. network.http.max-persistent-connections-per-server is the one to change.


But as it's the browser's limitation, I may use a different approach, like running the same server on a different port.
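
For example (the ports are illustrative, and this assumes FLASK_APP already points at your application), you could start a second copy of the same app on another port and spread requests across both, on the assumption that the browser applies its connection limit per host and port:

flask run --port 5000
flask run --port 5001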

How to process several HTTP requests with Flask

Let me clear up the confusion for you.

When you are developing locally with Flask, you use the built-in server, which is single-threaded, meaning it only processes one request at a time. This is one of the reasons you shouldn't simply set FLASK_ENV=production and run the built-in server in a production environment; it isn't built to handle those workloads. Once you change FLASK_ENV to production and run it, you'll find a warning in the terminal.

Now, coming to how to run Flask in a production environment: CPUs, cores, threads, and other stuff.

To run Flask in a production environment, you need a proper application server that can run your Flask application. This is where Gunicorn comes in: it is compatible with Flask and one of the most popular ways of running it.

In Gunicorn, you have several knobs for tuning an optimal setup based on the specs of your server.
You can configure the following:

  1. Worker class - the type of worker to use
  2. Number of workers
  3. Number of threads per worker

The way you calculate the maximum number of concurrent requests is as follows, taking a 4-core server as an example:

As per the documentation of gunicorn, the optimal number of workers is suggested as (2 * num_of_cores) + 1 which in this case becomes (2*4)+1 = 9

Now, the suggested range for the number of threads is likewise 2 to 4 x $(num_of_cores), which in this case comes out to at most 4*4 = 16

So now you have 9 workers with 16 threads each. Each thread can handle one request at a time, so you can have 9*16 = 144 concurrent requests.
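
Put together, the corresponding Gunicorn invocation would look something like this (the bind address and your_project:app are placeholders):

gunicorn -w 9 --threads 16 -b 0.0.0.0:8000 your_project:app

Setting --threads to more than 1 makes Gunicorn use its gthread worker class.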

Similarly, you can do the calculation for Waitress. I prefer using Gunicorn, so you'll need to check the Waitress docs for its configuration.

Now coming to Web Servers

Until now, what you have configured is an application server to run Flask. This works, but you shouldn't expose an application server directly to the internet. Instead, it's always suggested to deploy Flask behind a reverse proxy like Nginx. Nginx acts as a full-fledged web server capable of handling real-world workloads.
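
In such a setup you would typically bind the application server to a local address only (the addresses here are illustrative) and let Nginx handle public traffic:

gunicorn -w 9 --threads 16 -b 127.0.0.1:8000 your_project:app

Nginx then listens on ports 80/443 and uses proxy_pass to forward requests to 127.0.0.1:8000.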

So, in a gist, you could use a combination from the list below as per your requirements:

Flask + application server + web server, where the application server is one of Gunicorn, uWSGI, Gevent, Twisted Web, Waitress, etc., and the web server is one of Nginx, Apache, Traefik, Caddy, etc.


