TLDR: How can my load balancer efficiently and transparently forward an incoming connection to a container/VM in Linux?
Problem: For educational purposes, and maybe to write a patch for liburing in case some APIs are missing, I would like to learn how to implement a load balancer capable of scaling a target service from zero to hero. LB and target services are on the same physical node.
I would like for this approach to be:
- Efficient: as little memory copying as possible, as little CPU utilization as possible
- Transparent: the target service should not understand what's happening
I looked at systemd socket activation, but it seems it can only scale from 0 to 1 and does not handle further scaling. Also, the socket hand-off code felt a bit hard to follow, but maybe I'm just a noob.
Current status: After playing a bit I managed to do this either efficiently or transparently, but not both. I would like to do both.
The load balancer process is written in Rust and uses io_uring.
Efficient approach:
- LB binds to a socket and fires a multishot accept
- On client connection the LB performs some business logic to decide which container should handle the incoming request
- If the service is scaled to zero, it fires up the first container
- If the service is overloaded, it fires up more instances
- The LB passes the socket file descriptor to the container via sendmsg
- The container receives the FD and fires a multishot receive to handle incoming data
This approach is VERY efficient (no memory copying, very little CPU usage), but the receiving process needs to be aware of what's happening in order to receive and correctly handle the socket FD.
If I want to run, say, an arbitrary Node.js container, this approach won't work.
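For reference, the fd hand-off step in the efficient approach boils down to sendmsg(2) with an SCM_RIGHTS control message over a Unix socket. Below is a minimal sketch using plain blocking syscalls rather than io_uring; the hand-rolled msghdr/cmsghdr layouts and constants assume 64-bit Linux (real code would use the libc or nix crate, or io_uring's IORING_OP_SENDMSG):

```rust
use std::io;
use std::os::fd::RawFd;

#[repr(C)]
struct IoVec {
    iov_base: *mut u8,
    iov_len: usize,
}

#[repr(C)]
struct MsgHdr {
    msg_name: *mut u8,
    msg_namelen: u32,
    msg_iov: *mut IoVec,
    msg_iovlen: usize,
    msg_control: *mut u8,
    msg_controllen: usize,
    msg_flags: i32,
}

extern "C" {
    fn sendmsg(fd: i32, msg: *const MsgHdr, flags: i32) -> isize;
    fn recvmsg(fd: i32, msg: *mut MsgHdr, flags: i32) -> isize;
}

const SOL_SOCKET: i32 = 1;
const SCM_RIGHTS: i32 = 1;
const CMSG_HDR: usize = 16;                // sizeof(struct cmsghdr), 64-bit Linux
const CMSG_LEN_FD: usize = CMSG_HDR + 4;   // CMSG_LEN(sizeof(int))
const CMSG_SPACE_FD: usize = CMSG_HDR + 8; // CMSG_SPACE(sizeof(int))

/// Send `fd` over the Unix socket `chan` as SCM_RIGHTS ancillary data.
fn send_fd(chan: RawFd, fd: RawFd) -> io::Result<()> {
    let mut byte = [0u8; 1]; // at least one real byte must accompany the cmsg
    let mut iov = IoVec { iov_base: byte.as_mut_ptr(), iov_len: 1 };
    let mut cbuf = [0u64; CMSG_SPACE_FD / 8]; // 8-byte-aligned control buffer
    unsafe {
        let p = cbuf.as_mut_ptr() as *mut u8;
        *(p as *mut usize) = CMSG_LEN_FD;      // cmsg_len
        *(p.add(8) as *mut i32) = SOL_SOCKET;  // cmsg_level
        *(p.add(12) as *mut i32) = SCM_RIGHTS; // cmsg_type
        *(p.add(CMSG_HDR) as *mut RawFd) = fd; // payload: the fd itself
        let msg = MsgHdr {
            msg_name: std::ptr::null_mut(),
            msg_namelen: 0,
            msg_iov: &mut iov,
            msg_iovlen: 1,
            msg_control: p,
            msg_controllen: CMSG_SPACE_FD,
            msg_flags: 0,
        };
        if sendmsg(chan, &msg, 0) < 0 {
            return Err(io::Error::last_os_error());
        }
    }
    Ok(())
}

/// Receive one fd from `chan`; the kernel installs a fresh descriptor
/// in this process that refers to the same open socket.
fn recv_fd(chan: RawFd) -> io::Result<RawFd> {
    let mut byte = [0u8; 1];
    let mut iov = IoVec { iov_base: byte.as_mut_ptr(), iov_len: 1 };
    let mut cbuf = [0u64; CMSG_SPACE_FD / 8];
    let mut msg = MsgHdr {
        msg_name: std::ptr::null_mut(),
        msg_namelen: 0,
        msg_iov: &mut iov,
        msg_iovlen: 1,
        msg_control: cbuf.as_mut_ptr() as *mut u8,
        msg_controllen: CMSG_SPACE_FD,
        msg_flags: 0,
    };
    unsafe {
        if recvmsg(chan, &mut msg, 0) < 0 {
            return Err(io::Error::last_os_error());
        }
        let p = cbuf.as_ptr() as *const u8;
        if *(p.add(12) as *const i32) != SCM_RIGHTS {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "no fd received"));
        }
        Ok(*(p.add(CMSG_HDR) as *const RawFd))
    }
}
```

The received descriptor is a full duplicate of the original socket, which is why this costs nothing per byte; the whole mechanism only works if the container side cooperates, which is exactly the transparency problem above.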
Transparent approach:
- LB binds to a socket and fires a multishot accept
- On client connection the LB performs some business logic to decide which container should handle the incoming request
- If the service is scaled to zero, it fires up the first container
- If the service is overloaded, it fires up more instances
- The LB connects to the container and fires a multishot receive
- Incoming data is sent to the container via a zero-copy send
This approach is less efficient because:
- The container copies the data once (but that copy happens in the efficient case too)
- We double the number of active connections: for each connection between client and LB there is a connection between LB and service
The advantage of this approach is that the target service is not aware of what's happening.
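Stripped of io_uring, the transparent approach is just a classic userspace proxy. A minimal blocking sketch using only std (the backend address is a placeholder that would come from the balancing logic):

```rust
use std::io;
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Accept clients on `listener` and proxy each one to `backend`,
/// an address like "127.0.0.1:9000" chosen by the balancing logic.
fn run_proxy(listener: TcpListener, backend: String) -> io::Result<()> {
    for client in listener.incoming() {
        let client = client?;
        let backend = backend.clone();
        thread::spawn(move || -> io::Result<()> {
            // The backend only ever sees an ordinary inbound connection.
            let upstream = TcpStream::connect(&backend)?;
            let (mut cr, mut cw) = (client.try_clone()?, client);
            let (mut ur, mut uw) = (upstream.try_clone()?, upstream);
            // One thread per direction; every byte crosses userspace once,
            // which is exactly the extra copy described above.
            let t = thread::spawn(move || io::copy(&mut cr, &mut uw));
            let _ = io::copy(&mut ur, &mut cw);
            let _ = t.join();
            Ok(())
        });
    }
    Ok(())
}
```

The two `io::copy` loops are the part an io_uring implementation would replace with multishot receives and (zero-copy) sends; the doubled connection count is inherent to the design, not to the I/O API.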
Questions:
- What can I use to efficiently forward the connection from the LB to the container? Some kind of pipe?
- Is there a way to make the container think there is a new accept event even though the connection was already accepted and without opening a new connection between the LB and the container?
- If the connection is TCP, can I exploit the fact that both the LB and the container are on the same physical node and use some kind of lightweight protocol? For example, I could use Unix domain sockets, but then the target app would have to be aware of this, breaking transparency.
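On the first question: splice(2) is one answer to "some kind of pipe". The LB still keeps both connections, but payload bytes move from the client socket into a kernel pipe and from the pipe into the backend socket without ever entering the LB's address space; io_uring exposes the same operation as IORING_OP_SPLICE. A rough sketch with raw syscall declarations (real code would use the libc or nix crate and keep a pipe per connection instead of creating one per call):

```rust
use std::io;
use std::os::fd::RawFd;

extern "C" {
    fn pipe2(fds: *mut i32, flags: i32) -> i32;
    fn close(fd: i32) -> i32;
    fn splice(fd_in: i32, off_in: *mut i64, fd_out: i32,
              off_out: *mut i64, len: usize, flags: u32) -> isize;
}

const SPLICE_F_MOVE: u32 = 1;

/// Move up to `len` already-buffered bytes from socket `from` to socket `to`
/// through a scratch kernel pipe, without copying them into userspace.
fn splice_once(from: RawFd, to: RawFd, len: usize) -> io::Result<usize> {
    let mut fds = [0i32; 2];
    if unsafe { pipe2(fds.as_mut_ptr(), 0) } < 0 {
        return Err(io::Error::last_os_error());
    }
    let (r, w) = (fds[0], fds[1]);
    let null: *mut i64 = std::ptr::null_mut();
    // Socket -> pipe: pages are moved into the pipe, not copied.
    let n = unsafe { splice(from, null, w, null, len, SPLICE_F_MOVE) };
    if n < 0 {
        let err = io::Error::last_os_error();
        unsafe { close(r); close(w); }
        return Err(err);
    }
    // Pipe -> socket.
    let m = unsafe { splice(r, null, to, null, n as usize, SPLICE_F_MOVE) };
    let err = io::Error::last_os_error();
    unsafe { close(r); close(w); }
    if m < 0 {
        return Err(err);
    }
    Ok(m as usize)
}
```

This keeps the doubled connection count of the transparent approach but removes the per-byte userspace copy, so it sits between the two designs on the efficiency axis. On the second question, I'm not aware of a way to fake an accept event on an already-connected listener short of fd passing, which is the efficient approach again.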