Does Linux copy the entire binary into memory before it starts executing? I'm curious about this and wonder whether it could work some other way. I mean, if the binary were 100 MB (which seems unlikely), could I run it while it is still being copied into memory? Would that be possible?

Or could you tell me how to observe the way it runs? Which tools do I need?

3 Comments

  • The OS normally deals with this in a sane manner. Commented Dec 14, 2011 at 15:16
  • 100 MB binary impossible? I have here a 6 MB binary compiled from a mere 500 LOC of heavily templated and Boosted C++. I bet I could bump that to 100 MB without too much trouble. Commented Dec 14, 2011 at 15:21
  • Nothing is impossible! Just get in touch with a merry Eclipse developer; they will teach you all the tricks. Commented Dec 14, 2011 at 15:51

3 Answers

The theoretical model for an application-level programmer makes it appear that this is so. In point of fact, the normal startup process (at least in Linux 1.x, I believe 2.x and 3.x are optimized but similar) is:

  • The kernel creates a process context (more-or-less, virtual machine)
  • Into that process context, it defines a virtual memory mapping that maps a range of virtual addresses onto the contents of your executable file
  • Assuming that you're dynamically linked (the default/usual), the ld.so program (e.g. /lib/ld-linux.so.2) defined in your program's headers sets up memory mapping for shared libraries
  • The kernel does a jmp into the startup routine of your program (for a C program, that's the _start routine supplied by the C runtime, which eventually calls main). Since it has only set up the mapping, and not actually loaded any pages(*), this causes a Page Fault from the CPU's Memory Management Unit, which is an interrupt (exception, signal) to the kernel.
  • The kernel's Page Fault handler loads some section of your program, including the part that caused the page fault, into RAM.
  • As your program runs, if it accesses a virtual address that doesn't have RAM backing it up right now, Page Faults will occur and cause the kernel to suspend the program briefly, load the page from disc, and then return control to the program. This all happens "between instructions" and is normally undetectable (though the sketch just after this list makes the fault counters visible from user space).
  • As you use malloc/new, the kernel creates read-write pages of RAM (with no backing file on disc) and adds them to your virtual address space.
  • If you throw a Page Fault by trying to access a memory location that isn't set up in the virtual memory mappings, you get a Segmentation Violation Signal (SIGSEGV), which is normally fatal.
  • As the system runs out of physical RAM, pages of RAM get removed; if they are read-only copies of something already on disc (like an executable, or a shared object file), they just get de-allocated and are reloaded from their source; if they're read-write (like memory you "created" using malloc), they get written out to the page file (= swap file = swap partition = on-disc virtual memory). Accessing these "freed" pages causes another Page Fault, and they're re-loaded.
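
To watch that demand paging happen from user space, here is a minimal sketch (my own illustration, not part of the original answer; assumes Linux/glibc): it maps a file read-only, touches one byte per page, and reads the kernel's per-process fault counters via getrusage() before and after. If the file is already in the page cache, the faults show up as minor rather than major.

/* demandpage.c - a sketch: observe page faults as mapped pages are touched.
 * Assumes Linux/glibc. Build: gcc -o demandpage demandpage.c
 * Run: ./demandpage some-large-file
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/resource.h>

static void report(const char *when)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("%-12s minor faults: %ld, major faults: %ld\n",
           when, ru.ru_minflt, ru.ru_majflt);
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror(argv[1]);
        return 1;
    }

    /* The mapping only sets up page-table entries; nothing is read yet. */
    unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    report("after mmap:");

    /* The first touch of each page raises a fault; the kernel loads it then. */
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; i += page)
        sum += p[i];
    report("after touch:");

    munmap(p, st.st_size);
    close(fd);
    return 0;
}

Run it on a large file you haven't read recently and the "after touch" counters jump by roughly one fault per page.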

Generally, though, until your process is bigger than available RAM — and data is almost always significantly larger than the executable — you can safely pretend that you're alone in the world and none of this demand paging stuff is happening.

So: effectively, the kernel already is running your program while it's being loaded (and might never even load some pages, if you never jump into that code / refer to that data).
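
If you want evidence that some pages "might never even load", mincore() reports one residency byte per page of a mapping. Here is a sketch under the same assumptions (Linux/glibc; the file name is yours to supply):

/* resident.c - a sketch: count how many pages of a mapping are in RAM.
 * Assumes Linux/glibc. Build: gcc -o resident resident.c
 * Run: ./resident some-file   (try a file you have not read recently)
 */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror(argv[1]);
        return 1;
    }
    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (st.st_size + page - 1) / page;

    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    unsigned char vec[npages];          /* one residency byte per page */
    if (mincore(map, st.st_size, vec) < 0) {
        perror("mincore");
        return 1;
    }
    size_t resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;         /* bit 0 set = page is in RAM */
    printf("%zu of %zu pages resident, with no access yet\n", resident, npages);
    return 0;
}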

If your startup is particularly sluggish, you could look at the prelink system to optimize shared library loads. This reduces the amount of work that ld.so has to do at startup (between the exec of your program and main getting called, as well as when you first call library routines).
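
If you want to measure that startup work before reaching for prelink, glibc's dynamic linker will report its own relocation statistics when asked (./yourprog below is a placeholder for your own binary):

LD_DEBUG=statistics ./yourprog

(LD_DEBUG=help lists the other tracing options.)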

Sometimes, linking statically can improve performance of a program, but at a major expense of RAM — since your libraries aren't shared, you're duplicating "your libc" in addition to the shared libc that every other program is using, for example. That's generally only useful in embedded systems where your program is running more-or-less alone on the machine.
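
To see that trade-off concretely, build the same program both ways and compare (assuming gcc and some hello.c of your own):

gcc -o hello hello.c
gcc -static -o hello-static hello.c
ls -l hello hello-static

The static binary is typically dramatically larger, and its private copy of the libraries can't be shared with other programs.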

(*) In point of fact, the kernel is a bit smarter, and will generally preload some pages to reduce the number of page faults, but the theory is the same, regardless of the optimizations.


5 Comments

I get the feeling you've forgotten the relocator, which needs to run over an executable block loaded into memory (en.wikipedia.org/wiki/Relocation_%28computer_science%29). As far as I know, this happens once, during the loading of the code, and I'm not sure the page-fault handler would simply run this relocator algorithm while the program is already running. So in my opinion an executable block has to be loaded at least once before execution starts. But I'm not a 'linux-loader' expert ...
I believe that's all handled by ld.so on Linux, but I don't tend to meddle in the affairs of wizards :-) … There's also a lot of black magic involved in thunk operations and such, that happen the first time a library routine is invoked …
See em386.blogspot.com/2006/10/… for some discussion of how the relocation works.
That reinforces my vague understanding that the lazy linking process happens as the code is running … the kernel can actually mmap the binary and start executing it before any of the symbols are actually linked through the PLT. It might be “fun” to trace the actual execution of '(kernel dynamic-linker user-program libc) through the startup process on a VM, for definitions of “fun” that are boring but make for good research papers.
@BRPocock and others: Could you please indicate where the loader (or relocating loader) comes into the picture here? When does the kernel begin creating a process context from the executable?

No, it only loads the necessary pages into memory. This is demand paging.

I don't know of a tool which can really show that in real time, but you can have a look at /proc/xxx/maps, where xxx is the PID of your process.
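
For example, if your process's PID were 1234 (a placeholder), you could dump the mappings once, or poll them while the program runs:

cat /proc/1234/maps
watch -n1 cat /proc/1234/maps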

1 Comment

In Gnome, the System Monitor will show the memory maps. Applications / System Tools / System Monitor → Processes tab → right-click on process → Memory Maps. The window often appears empty when it opens, because the scroll bars are positioned awkwardly, so check them if you think you got a blank display :-)

While you ask a valid question, I don't think it's something you need to worry about. First off, a binary of 100 MB is not impossible. Second, the system loader will load the pages it needs from the ELF (Executable and Linkable Format) file into memory, and perform various relocations, etc., to make it work, if necessary. It will also load all of its requisite shared library dependencies in the same way. However, this is not an incredibly time-consuming process, nor one that really needs to be optimized. Arguably, any "optimization" would add significant overhead to make sure it never uses something that hasn't been loaded yet, and would possibly be less efficient.
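
If you'd like to see which parts of an ELF binary the loader will map, the program headers list the PT_LOAD segments; readelf (part of binutils) prints them for any binary, for example:

readelf -l /bin/ls

The LOAD lines show the file offsets and the virtual addresses they end up mapped at.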

If you're curious what gets mapped, as fge says, you can check /proc/pid/maps. If you'd like to see how a program loads, you can try running a program with strace, like:

strace ls

It's quite verbose, but it should give you some idea of the mmap() calls, etc.
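
To trim that output down to just the mapping traffic, strace can filter by syscall name (on some 32-bit systems the call shows up as mmap2 rather than mmap):

strace -e trace=mmap ls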
