Java – Spring WebFlux differences when Netty vs Tomcat is used under the hood


I am learning Spring WebFlux and I've read the following series of articles (first, second, third).

In the third article I came across the following text:

Remember the same application code runs on Tomcat, Jetty or Netty.
Currently, the Tomcat and Jetty support is provided on top of Servlet
3.1 asynchronous processing, so it is limited to one request per thread. When the same code runs on the Netty server platform that
constraint is lifted, and the server can dispatch requests
sympathetically to the web client. As long as the client doesn’t
block, everyone is happy. Performance metrics for the netty server and
client probably show similar characteristics, but the Netty server is
not restricted to processing a single request per thread, so it
doesn’t use a large thread pool and we might expect to see some
differences in resource utilization. We will come back to that later
in another article in this series.

First of all, I don't see a newer article in the series, although this one was written in 2016. It is clear to me that Tomcat has 200 threads by default for handling requests, and that one thread handles one request at a time, but I don't understand the phrase "it is limited to one request per thread". What does it mean?

Also, I would like to know how Netty works in that specific case (I want to understand the difference from Tomcat). Can it handle 2 requests per thread?

Best Solution

Currently there are two basic concepts for handling parallel access to a web server, each with its own advantages and disadvantages:

  1. Blocking
  2. Non-Blocking

Blocking Web-Servers

The first concept, the blocking, multi-threaded server, has a finite number of threads in a pool. Every request is assigned to a specific thread, and that thread stays assigned until the request has been fully served. This is basically how the checkout queues in a supermarket work: one customer at a time per lane, with several lanes possibly open in parallel. In most circumstances a request in a web server is CPU-idle for the majority of the time it is being processed, because it has to wait for I/O: read from the socket, write to the DB (which is basically I/O as well), read the result, and write to the socket. Additionally, creating and using a large number of threads is slow (context switching) and requires a lot of memory. Therefore this concept often does not use the available hardware resources very efficiently, and it has a hard limit on how many clients can be served in parallel. This property is exploited in so-called starvation attacks, e.g. Slowloris, an attack in which a single client can usually DoS a big multi-threaded web server with little effort.


  • (+) simpler code
  • (-) hard limit of parallel clients
  • (-) requires more memory
  • (-) inefficient use of hardware for usual web-server work
  • (-) easy to DOS

Most "conventional" web servers work that way, e.g. older Tomcat, Apache web server, and everything Servlet older than 3 or 3.1.
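To make the "one request per thread" limit concrete, here is a minimal sketch in plain Java. It is not Tomcat's actual code: the class name `BlockingPoolDemo`, the method `maxInFlight`, and the use of `Thread.sleep` as a stand-in for blocking I/O are all assumptions made for illustration. It runs simulated requests on a fixed thread pool and records how many were being processed at the same moment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of any server.
class BlockingPoolDemo {

    // Runs `requests` simulated requests on a fixed pool and returns the
    // highest number of requests that were in flight at the same time.
    static int maxInFlight(int poolSize, int requests, long ioMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                // The worker thread now belongs to this request until it is done.
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(ioMillis); // stand-in for blocking I/O (socket, DB)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight requests: " + maxInFlight(2, 6, 50));
    }
}
```

However many requests are submitted, the peak can never exceed the pool size, because each waiting request keeps its thread occupied. The remaining requests simply queue up, like customers behind a blocked checkout lane.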

Non-Blocking Web-Servers

In contrast, a non-blocking web server can serve multiple clients with only a single thread. That is because it uses the kernel's non-blocking I/O features. These are kernel calls that return immediately and notify the caller when something can be read or written, leaving the CPU free to do other work in the meantime. Reusing our supermarket metaphor: when a cashier needs the supervisor to solve a problem, the cashier does not wait and block the whole lane, but starts checking out the next customer until the supervisor arrives and solves the first customer's problem.

This is often done with an event loop, or with higher-level abstractions such as green threads or fibers. In essence, a single such thread can't really process anything in parallel (although you can of course run multiple non-blocking threads), but these servers are able to serve thousands of clients concurrently, because memory consumption does not scale as drastically as with the multi-threaded concept (read: there is no hard limit on the maximum number of parallel clients). There is also far less thread context switching. The downside is that non-blocking code is often more complex to read and write (e.g. callback hell) and doesn't perform well in situations where a request does a lot of CPU-expensive work.


  • (-) more complex code
  • (-) performance worse with cpu intensive tasks
  • (+) uses resources much more efficiently as web server
  • (+) many more parallel clients with no hard-limit (except max memory)

Most modern "fast" web servers and frameworks use non-blocking concepts: Netty, Vert.x, WebFlux, nginx, Servlet 3.1+, Node, Go web servers.
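The event-loop idea can be sketched in plain Java as well. This is only a simulation, not Netty's implementation: the class name `EventLoopDemo` and the method `maxInFlight` are hypothetical, and the I/O wait is faked with a timer callback on a single-threaded `ScheduledExecutorService` rather than real non-blocking sockets. The point is that the one thread registers a callback and moves on instead of waiting.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; a timer stands in for non-blocking I/O.
class EventLoopDemo {

    // Handles `requests` simulated requests on ONE thread and returns the
    // highest number of requests that were in flight at the same time.
    static int maxInFlight(int requests, long ioMillis) {
        // A single thread plays the role of the event loop.
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            loop.execute(() -> {
                // Request arrives: note it, then register a callback for the
                // "I/O finished" event instead of blocking the thread.
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                loop.schedule(() -> {
                    inFlight.decrementAndGet(); // response written, request done
                    done.countDown();
                }, ioMillis, TimeUnit.MILLISECONDS);
            });
        }
        try {
            done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            loop.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight requests: " + maxInFlight(5, 200));
    }
}
```

Here a single thread has several requests in flight at once, which is the sense in which a non-blocking server can handle "2 requests per thread" and more. If one of the callbacks did long CPU-bound work or called something blocking, every other request on that thread would stall behind it.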

As a side note, looking at this benchmark page you will see that most of the fastest web servers are usually non-blocking ones.
