Does Linux copy the entire binary into memory before it starts executing? I'm curious about this and wonder whether it could work some other way. I mean, if the binary were 100 MB (which seems unlikely), could I run it while it is still being copied into memory? Would that be possible?

Or could you tell me how to observe the way it runs? Which tools do I need?

3 Comments

  • The OS normally deals with this in a sane manner. Commented Dec 14, 2011 at 15:16
  • 100 MB binary impossible? I have here a 6 MB binary compiled from a mere 500 LOC of heavily templated and Boosted C++. I bet I could bump that to 100 MB without too much trouble. Commented Dec 14, 2011 at 15:21
  • Nothing is impossible! Just get in touch with a merry Eclipse developer; they will teach you all the tricks. Commented Dec 14, 2011 at 15:51

3 Answers

The theoretical model for an application-level programmer makes it appear that this is so. In point of fact, the normal startup process (at least in Linux 1.x, I believe 2.x and 3.x are optimized but similar) is:

  • The kernel creates a process context (more-or-less, virtual machine)
  • Into that process context, it defines a virtual memory mapping that maps a range of virtual addresses onto the contents of your executable file
  • Assuming that you're dynamically linked (the default/usual), the ld.so program (e.g. /lib/ld-linux.so.2) defined in your program's headers sets up memory mapping for shared libraries
  • The kernel does a jmp into the startup routine of your program (for a C program, that's the _start routine supplied by the C runtime, which eventually calls main). Since it has only set up the mapping, and not actually loaded any pages(*), this causes a Page Fault from the CPU's Memory Management Unit, which is an interrupt (exception, signal) to the kernel.
  • The kernel's Page Fault handler loads some section of your program, including the part that caused the page fault, into RAM.
  • As your program runs, if it accesses a virtual address that doesn't have RAM backing it up right now, Page Faults will occur and cause the kernel to suspend the program briefly, load the page from disc, and then return control to the program. This all happens "between instructions" and is normally undetectable (though the sketch just after this list makes the fault counters visible from user space).
  • As you use malloc/new, the kernel creates read-write pages of RAM (with no backing file on disc) and adds them to your virtual address space.
  • If you throw a Page Fault by trying to access a memory location that isn't set up in the virtual memory mappings, you get a Segmentation Violation Signal (SIGSEGV), which is normally fatal.
  • As the system runs out of physical RAM, pages of RAM get removed; if they are read-only copies of something already on disc (like an executable, or a shared object file), they just get de-allocated and are reloaded from their source; if they're read-write (like memory you "created" using malloc), they get written out to the page file (= swap file = swap partition = on-disc virtual memory). Accessing these "freed" pages causes another Page Fault, and they're re-loaded.
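
To watch that demand paging happen from user space, here is a minimal sketch (my own illustration, not part of the original answer; assumes Linux/glibc): it maps a file read-only, touches one byte per page, and reads the kernel's per-process fault counters via getrusage() before and after. If the file is already in the page cache, the faults show up as minor rather than major.

/* demandpage.c - a sketch: observe page faults as mapped pages are touched.
 * Assumes Linux/glibc. Build: gcc -o demandpage demandpage.c
 * Run: ./demandpage some-large-file
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/resource.h>

static void report(const char *when)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("%-12s minor faults: %ld, major faults: %ld\n",
           when, ru.ru_minflt, ru.ru_majflt);
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror(argv[1]);
        return 1;
    }

    /* The mapping only sets up page-table entries; nothing is read yet. */
    unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    report("after mmap:");

    /* The first touch of each page raises a fault; the kernel loads it then. */
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; i += page)
        sum += p[i];
    report("after touch:");

    munmap(p, st.st_size);
    close(fd);
    return 0;
}

Run it on a large file you haven't read recently and the "after touch" counters jump by roughly one fault per page.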

Generally, though, until your process is bigger than available RAM — and data is almost always significantly larger than the executable — you can safely pretend that you're alone in the world and none of this demand paging stuff is happening.

So: effectively, the kernel already is running your program while it's being loaded (and might never even load some pages, if you never jump into that code / refer to that data).
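
If you want evidence that some pages "might never even load", mincore() reports one residency byte per page of a mapping. Here is a sketch under the same assumptions (Linux/glibc; the file name is yours to supply):

/* resident.c - a sketch: count how many pages of a mapping are in RAM.
 * Assumes Linux/glibc. Build: gcc -o resident resident.c
 * Run: ./resident some-file   (try a file you have not read recently)
 */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror(argv[1]);
        return 1;
    }
    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (st.st_size + page - 1) / page;

    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    unsigned char vec[npages];          /* one residency byte per page */
    if (mincore(map, st.st_size, vec) < 0) {
        perror("mincore");
        return 1;
    }
    size_t resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;         /* bit 0 set = page is in RAM */
    printf("%zu of %zu pages resident, with no access yet\n", resident, npages);
    return 0;
}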

If your startup is particularly sluggish, you could look at the prelink system to optimize shared library loads. This reduces the amount of work that ld.so has to do at startup (between the exec of your program and main getting called, as well as when you first call library routines).
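
If you want to measure that startup work before reaching for prelink, glibc's dynamic linker will report its own relocation statistics when asked (./yourprog below is a placeholder for your own binary):

LD_DEBUG=statistics ./yourprog

(LD_DEBUG=help lists the other tracing options.)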

Sometimes, linking statically can improve performance of a program, but at a major expense of RAM — since your libraries aren't shared, you're duplicating "your libc" in addition to the shared libc that every other program is using, for example. That's generally only useful in embedded systems where your program is running more-or-less alone on the machine.
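
To see that trade-off concretely, build the same program both ways and compare (assuming gcc and some hello.c of your own):

gcc -o hello hello.c
gcc -static -o hello-static hello.c
ls -l hello hello-static

The static binary is typically dramatically larger, and its private copy of the libraries can't be shared with other programs.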

(*) In point of fact, the kernel is a bit smarter, and will generally preload some pages to reduce the number of page faults, but the theory is the same, regardless of the optimizations.


5 Comments

I get the feeling you've forgotten the relocator, which needs to run over an executable block loaded into memory (en.wikipedia.org/wiki/Relocation_%28computer_science%29). As far as I know, this happens once, during the loading of the code, and I'm not sure the page-fault handler would simply run this relocator algorithm while the program is already running. So in my opinion an executable block has to be loaded at least once before execution starts. But I'm not a 'linux-loader' expert ...
I believe that's all handled by ld.so on Linux, but I don't tend to meddle in the affairs of wizards :-) … There's also a lot of black magic involved in thunk operations and such, that happen the first time a library routine is invoked …
See em386.blogspot.com/2006/10/… for some discussion of how the relocation works.
That reinforces my vague understanding that the lazy linking process happens as the code is running … the kernel can actually mmap the binary and start executing it before any of the symbols are actually linked through the PLT. It might be “fun” to trace the actual execution of '(kernel dynamic-linker user-program libc) through the startup process on a VM, for definitions of “fun” that are boring but make for good research papers.
@BRPocock and others: Could you please indicate where the loader (or relocating loader) comes into the picture here? When does the kernel begin creating a process context from the executable?

No, it only loads the necessary pages into memory. This is demand paging.

I don't know of a tool which can really show that in real time, but you can have a look at /proc/xxx/maps, where xxx is the PID of your process.
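
For example, if your process's PID were 1234 (a placeholder), you could dump the mappings once, or poll them while the program runs:

cat /proc/1234/maps
watch -n1 cat /proc/1234/maps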

1 Comment

In Gnome, the System Monitor will show the memory maps. Applications / System Tools / System Monitor → Processes tab → right-click on process → Memory Maps. The window often appears empty when it opens, because the scroll bars are positioned awkwardly, so check them if you think you got a blank display :-)

While you ask a valid question, I don't think it's something you need to worry about. First off, a binary of 100 MB is not impossible. Second, the system loader will load the pages it needs from the ELF (Executable and Linkable Format) file into memory, and perform various relocations, etc., to make it work, if necessary. It will also load all of its requisite shared library dependencies in the same way. However, this is not an incredibly time-consuming process, nor one that really needs to be optimized. Arguably, any "optimization" would add significant overhead to make sure it never uses something that hasn't been loaded yet, and would possibly be less efficient.
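
If you'd like to see which parts of an ELF binary the loader will map, the program headers list the PT_LOAD segments; readelf (part of binutils) prints them for any binary, for example:

readelf -l /bin/ls

The LOAD lines show the file offsets and the virtual addresses they end up mapped at.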

If you're curious what gets mapped, as fge says, you can check /proc/pid/maps. If you'd like to see how a program loads, you can try running a program with strace, like:

strace ls

It's quite verbose, but it should give you some idea of the mmap() calls, etc.
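
To trim that output down to just the mapping traffic, strace can filter by syscall name (on some 32-bit systems the call shows up as mmap2 rather than mmap):

strace -e trace=mmap ls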
