
So at work we have a few systems, each running on its own hardware, and the hardware is identical across all of them. If one of the machines fails, one of the running systems will take over and start the system that failed, doing the work of both. All of them are Linux on Intel CPUs.

How would it be possible to have an OS start another OS and run together with it? It is not just running a program, it is running the entire thing as well.

I have heard of things like Windows letting another OS use one of the CPU cores for parallel computing. But I am curious how one system could immediately take over for another. It was super cool, and unfortunately my co-workers don't understand how it is done either.

  • You say these systems are "All Linux" ... then talk about Windows for no reason. What is the issue? You and your co-workers don't understand concepts like high-availability clusters? Who maintains them? Commented Jan 7, 2024 at 7:31
  • "It is not just running a program, it is running the entire thing as well." How do you know it's not just running a program? What is the "thing" you refer to? And what exactly does it "take over"? Does it take over some service (like HTTP, for instance)? Because it doesn't necessarily need to run the service under its dedicated OS. Another machine could take over the service under its own OS. Also, the service might run inside a container, which makes it even easier to migrate. So as the answer states, you can run "OS under OS" using VMs, but it's not necessarily what happens in your case. Commented Jan 7, 2024 at 12:59

1 Answer


How would it be possible to have an OS start another OS and run together with it? It is not just running a program, it is running the entire thing as well.

The thing you're describing is called a virtual machine. x86 processors of the last 15 years have been heavily optimized to run such virtualized systems with little performance loss, so that in any data center it is the default that the machine you're administering boots your OS but isn't actually your hardware, just your VM.

So, this is all very mature technology!

Virtualization capabilities are part of the Linux kernel (look up what KVM is), and there are numerous frontends for actually running virtual machines. There are philosophical differences, and the solutions come in sizes that you might or might not need: from a KVM-supported VM running in a manually started QEMU, through running such VMs in a more programmable manner using libvirt/virsh/virt-manager, to not thinking of your machine as having a regular main Linux at all, with all the work always being done in VMs (more the way that Xen works), to managing racks full of servers that do nothing but run multiple VMs, which can be shifted between machines while they're running (that's where you typically find things like Proxmox and VMware's ESXi).
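If you want to see whether a given Linux box could run KVM-accelerated VMs at all, here is a minimal Python sketch. It relies on two things I'm sure of: Linux lists the CPU flags `vmx` (Intel VT-x) or `svm` (AMD-V) in `/proc/cpuinfo`, and a loaded KVM module exposes the device node `/dev/kvm`. The function names are made up for this example, and a negative result inside a VM may only mean that nested virtualization is switched off.

```python
import os
from typing import Set

# CPU flags Linux reports in /proc/cpuinfo for hardware virtualization:
# "vmx" is Intel VT-x, "svm" is AMD-V.
VIRT_FLAGS = {"vmx", "svm"}


def virtualization_flags(cpuinfo_text: str) -> Set[str]:
    """Return the hardware-virtualization flags found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= VIRT_FLAGS & set(value.split())
    return found


def host_can_use_kvm(cpuinfo_path: str = "/proc/cpuinfo",
                     kvm_device: str = "/dev/kvm") -> bool:
    """True if the CPU advertises VT-x/AMD-V and the KVM device node exists."""
    try:
        with open(cpuinfo_path, encoding="utf-8") as f:
            flags = virtualization_flags(f.read())
    except OSError:
        return False  # not Linux, or /proc is not readable
    return bool(flags) and os.path.exists(kvm_device)
```

Calling `host_can_use_kvm()` on one of your servers tells you only that the building blocks are present, not which of the frontends above your colleagues actually use.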

There's really no one size that fits all; it depends on what you need:

If one of the machines fails, one of the running systems will take over

The easy part is running VMs on multiple machines, as ready-to-jump-in reserves. The harder part, and the one that will depend on what these machines are actually doing, application-wise, is figuring out how the so-called failover to these will happen.

For something like a static-file HTTP server, that might be as easy as automatically changing an entry in the reverse proxy, or assigning the IP address of the failed system to the backup system.
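To make the reverse-proxy variant concrete, here is a rough Python sketch of the decision such a proxy has to make: walk an ordered list of backends and use the first one that still answers. The hostnames, the `pick_backend` name and the plain TCP connect as health check are all assumptions for illustration; a real proxy has its own configuration and health-check mechanisms.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Backend = Tuple[str, int]  # (host, port)


def tcp_alive(backend: Backend, timeout: float = 1.0) -> bool:
    """Health check: can we open a TCP connection to the backend?"""
    try:
        with socket.create_connection(backend, timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(backends: Sequence[Backend],
                 is_alive: Callable[[Backend], bool] = tcp_alive
                 ) -> Optional[Backend]:
    """Return the first backend that passes the health check, else None."""
    for backend in backends:
        if is_alive(backend):
            return backend
    return None


# Hypothetical setup: the primary is listed first, the backup second.
BACKENDS = [("primary.example", 80), ("backup.example", 80)]
```

As long as the primary answers, `pick_backend(BACKENDS)` returns it; once it stops answering, the same call returns the backup, which is all that "changing an entry" amounts to in this toy version.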

For something more complex, with transactions that might change data in a database, you might need to continuously replicate all database changes that the active server makes, so that the backup is ready when things fail over.
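The idea behind that kind of replication can be shown with a toy model in Python: the active server records every change in an ordered log, and the standby replays whatever part of the log it has not yet seen. This is only a sketch of the principle with invented class names; it is not how any particular database implements replication, and it ignores networking, durability and conflicts entirely.

```python
class Primary:
    """Active server: applies writes locally and records them in a change log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Standby:
    """Backup server: replays the primary's log to stay up to date."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def catch_up(self, log):
        # Replay only the entries that arrived since the last catch-up.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)
```

Any write the standby has not replayed at the moment the primary dies is lost to it, which is why the replication has to run continuously rather than every now and then.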

For other things, like redundant storage cluster access servers, you might need specialized hardware (like hard drive controllers that can be accessed from two different computers at the same time). This is very much the exception.

In general, to be able to react when something fails, you need a mechanism for detecting the failure, and this again varies a lot by the application space.
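One common shape for such a mechanism is a heartbeat with a timeout: every node reports in regularly, and a node that has been silent for too long is treated as failed. Below is a minimal Python sketch of that idea. The class name, the node names and the timeout value are my own assumptions, and I have no idea whether your systems use anything like it.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Treats a node as failed when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}  # node name -> time of last heartbeat

    def beat(self, node: str, now: Optional[float] = None) -> None:
        """Record a heartbeat from `node` (uses the monotonic clock by default)."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def failed_nodes(self, now: Optional[float] = None) -> List[str]:
        """Return the sorted names of nodes that have been silent for too long."""
        now = time.monotonic() if now is None else now
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout)
```

The timeout is the real design decision here: too short and a brief network hiccup triggers an unnecessary takeover, too long and the service stays down while everybody waits.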

So, seeing that this happened at your workplace: ask the IT people who set up the system if you want to know how your system specifically works.

  • term missing from this otherwise excellent answer: "hot failover" Commented Jan 7, 2024 at 17:23
