Reverse engineered code seems to have no point

Question

I am reverse engineering a program and keep coming across code that doesn't appear to make sense.

For example

push 0x3C2D1C06
mov [esp], ebx
mov [esp], edx
mov dword ptr [esp], 0x7D0B46E7

Is there logic behind this, or is this something that comes from compiler generation? It occurs to me that this would net the same as

push 0x7D0B46E7
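
The claimed equivalence can be checked with a small model. This is only a sketch: `run`, the `store_top` pseudo-op (standing in for `mov [esp], x`), and the register values are illustrative assumptions, and the model tracks nothing but the stack contents.

```python
def run(ops, regs):
    """Model only the stack effect of `push x` and `mov [esp], x`."""
    stack = []  # the last element is the dword at [esp]
    for op, operand in ops:
        # An operand is either a register name or an immediate value.
        value = regs.get(operand, operand)
        if op == "push":
            stack.append(value)
        elif op == "store_top":  # mov [esp], value
            stack[-1] = value
        else:
            raise ValueError(op)
    return stack


# Arbitrary placeholder register contents; any values give the same result.
regs = {"ebx": 0x11111111, "edx": 0x22222222}

obfuscated = [
    ("push", 0x3C2D1C06),
    ("store_top", "ebx"),       # dead store, overwritten below
    ("store_top", "edx"),       # dead store, overwritten below
    ("store_top", 0x7D0B46E7),  # the only store that survives
]
simple = [("push", 0x7D0B46E7)]

assert run(obfuscated, regs) == run(simple, regs)
```

Only the last store to `[esp]` is observable, so the first three writes are dead as far as the stack is concerned.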

Another example is

pusha
push ecx
pop ecx
pusha
popa
popa
push eax
push edx
rdtsc
pusha
popa
pop edx
pop eax

I don't understand all of the push/pop combinations.
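
One way to see what this second sequence nets out to is to run it through a toy interpreter. The sketch below rests on simplifying assumptions: `run`, `GPRS`, and the fake timestamp value are made up for illustration, `esp` is represented only by the stack depth, `pusha` is stored as a single frame, and flags and the memory left behind below `esp` are ignored.

```python
GPRS = ["eax", "ecx", "edx", "ebx", "ebp", "esi", "edi"]  # esp = stack depth


def run(program, regs, tsc=0x0123456789ABCDEF):
    """Execute a tiny subset of x86 on a copy of `regs`; return (regs, stack)."""
    regs = dict(regs)
    stack = []
    for insn in program:
        op, *args = insn.split()
        if op == "push":
            stack.append(regs[args[0]])
        elif op == "pop":
            regs[args[0]] = stack.pop()
        elif op == "pusha":
            # Simplification: one frame holding every modeled register.
            stack.append(tuple(regs[r] for r in GPRS))
        elif op == "popa":
            regs.update(zip(GPRS, stack.pop()))
        elif op == "rdtsc":
            # rdtsc writes the counter to edx:eax (value here is made up).
            regs["edx"], regs["eax"] = tsc >> 32, tsc & 0xFFFFFFFF
        else:
            raise ValueError(insn)
    return regs, stack


program = [
    "pusha", "push ecx", "pop ecx", "pusha", "popa", "popa",
    "push eax", "push edx", "rdtsc", "pusha", "popa",
    "pop edx", "pop eax",
]

# Arbitrary distinct starting values for each register.
initial = {name: 0x1000 + i for i, name in enumerate(GPRS)}
final, stack = run(program, initial)

assert final == initial  # every register is back to its starting value
assert stack == []       # and the stack is balanced
```

In this model the `rdtsc` result is discarded by the final `pop edx` / `pop eax`, so the whole sequence leaves the registers and stack depth unchanged.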

EDIT:

I am really intrigued now about the root of this. I have to believe that it comes from one of a few sources:

1.) As mentioned, it is possible that this was done to hinder reverse engineering

2.) Some form of optimization that is over my head (I have a hard time seeing this one)

3.) Some form of compiler action, possibly linking multi-platform libraries or something

I am convinced that the code is disassembled correctly because, odd as the sequences may be, they never corrupt the stack etc. I am curious about some of the sequences, such as

push edi
push eax
push eax
mov eax, 0x577D25CD
mov [esp], eax
push eax
pop edi
and edi, 0x3F7BEBDC
shl edi, 0x3
xor edi, 0x6E778451
sub edi, 0x0D5BE8A31
mov ebx, edi
push [esp]
pop edi

To start with

push edi
push eax
push eax
mov eax, 0x577D25CD
mov [esp], eax
pop eax
pop edi

I believe is the same as writing

push edi
push 0x577D25CD
pop edi

OR even

push edi
mov edi, 0x577D25CD

But even more confusing is what follows,

and edi(0x577D25CD), 0x3F7BEBDC -> edi=0x177921CC

shl edi(0x177921CC), 3 -> edi=0xBBC90E60

xor edi(0xBBC90E60), 0x6E778451 -> edi=0xD5BE8A31

sub edi(0x0D5BE8A31), 0x0D5BE8A31 -> edi=0x0

mov ebx, edi

So that could be replaced with

mov ebx, 0x0 OR xor ebx, ebx
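
The arithmetic in that chain can be double-checked mechanically. This is a minimal sketch that assumes edi really does hold 0x577D25CD on entry, as argued above; `fold_chain` and `MASK` are illustrative names, and the mask emulates 32-bit register wraparound.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits, like a real register


def fold_chain(edi=0x577D25CD):
    """Return the value of edi after each instruction in the chain."""
    steps = []
    edi &= 0x3F7BEBDC                 # and edi, 0x3F7BEBDC
    steps.append(edi)
    edi = (edi << 3) & MASK           # shl edi, 0x3
    steps.append(edi)
    edi ^= 0x6E778451                 # xor edi, 0x6E778451
    steps.append(edi)
    edi = (edi - 0xD5BE8A31) & MASK   # sub edi, 0x0D5BE8A31
    steps.append(edi)
    return steps


steps = fold_chain()
assert steps == [0x177921CC, 0xBBC90E60, 0xD5BE8A31, 0x0]
ebx = steps[-1]  # mov ebx, edi  ->  ebx = 0
```

The constants are evidently chosen so that the final `sub` cancels exactly, which is why the whole chain folds to zeroing ebx.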

So this leaves me with a few new questions. Does this point more firmly to a form of reverse engineering protection? Are there tools that do this? Could there be any other logical reason to execute a stream like this? Is it fair to assume that a compiler would not perform operations that set values below ESP? I have not found any indication that the values below ESP are used.

A couple of last points: I have started distilling the code down to just its effective result, and so far everything works as before and I've removed over 200 bytes of code. This further convinces me that I'm interpreting the code correctly. I also find it hard to believe that incorrectly disassembled code would produce a sequence of xor, and, shl operations that comes out to exactly 0x00.

EDIT 2:

I can say with certainty that what I am interpreting is code. I focused on one section (I posted a portion of it in my first edit), and when I sorted through its 86 asm instructions I found the result was simply this:

start:
 test ebx, ebx
 jz end
 add [eax], ecx
 add eax, 0x04
 sub ebx, 0x1
 jmp start
end:
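
Read as higher-level code, the loop above adds ecx to each of ebx consecutive 32-bit values starting at the address in eax. A rough Python equivalent follows, with the caveat that `add_to_dwords`, the sample table, and the delta are all invented for illustration and memory is modeled as a little-endian `bytearray`.

```python
import struct


def add_to_dwords(memory, start, count, delta):
    """Add `delta` to `count` consecutive little-endian dwords, in place."""
    offset = start                   # eax
    while count != 0:                # test ebx, ebx / jz end
        (value,) = struct.unpack_from("<I", memory, offset)
        # add [eax], ecx (wraps at 32 bits like the real instruction)
        struct.pack_into("<I", memory, offset, (value + delta) & 0xFFFFFFFF)
        offset += 4                  # add eax, 0x04
        count -= 1                   # sub ebx, 0x1
    return memory


# Hypothetical table of three entries and a hypothetical delta.
table = bytearray(struct.pack("<3I", 0x1000, 0x2040, 0xFFFFFFFF))
add_to_dwords(table, 0, 3, 0x10000000)
result = struct.unpack("<3I", table)

assert result == (0x10001000, 0x10002040, 0x0FFFFFFF)
```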

The rest was, as mentioned before, "clobbering" the stack with odd combinations of pushes, pops, xchgs, and arithmetic on esp. Sometimes there was something like sub esp, 0x4 followed by mov [esp], eax, which is the same as push eax. In the end, the section I worked through was offsetting an array of addresses by the Image_Base, a form of relocation I guess. When I peek at the relocated addresses, they point to code segments and what look like the starts of procedures. I have a hard time seeing someone writing the thousands of lines of code that appear to do this, which makes me wonder...

Does anyone know of a tool that does this? Has someone created a program that takes code and obscures it with nonsense to prevent reverse engineering?

It's almost certain that you're disassembling either from a starting address that isn't the start of an instruction in the code or that you're disassembling data that isn't code at all. Without more information on how you got the examples, it's impossible to say.
– Gene, Commented Jan 30, 2018 at 1:49
MSVC/MSVC++ (32-bit) allows people to write inline assembly using the __asm directive. That timing code almost looks valid, but I don't understand the reason for all the successive pushes and pops. There is no reason to believe that the MSVC code generator actually generated these sequences of bytes; a human must have. Even the most unoptimized MSVC output would not contain pushes and pops like that unless the programmer coded them that way himself. Programs written in assembly can also link against the MSVCRT runtime.
– Michael Petch, Commented Jan 30, 2018 at 1:52
@Gene This is definitely executing code. I am loading the program in IDA and stepping through these instructions; I see them execute and see the registers and stack change as expected. If I continue execution, the application runs as expected.
– MDK, Commented Jan 30, 2018 at 1:52
I agree with Michael, the 2nd sequence is definitely not compiler output, and is a NOP (other than clobbering memory below the stack), because it overwrites both outputs of rdtsc. I'm pretty sure there's no way (other than inline asm) to get MSVC to emit that push / overwrite twice / overwrite with another constant, even with contrived code using a volatile function arg or something. If it was just repeated dead stores, volatile could explain it, but the push rules out just normal assignments to a variable. MSVC (like other x86 compilers) reserves space for locals with sub esp, imm.
– Peter Cordes, Commented Jan 30, 2018 at 2:26
Have you considered that it could be obfuscation meant to prevent reverse engineering?
– user3185968, Commented Jan 30, 2018 at 7:11

Peter Cordes · Accepted Answer · 2018-01-31 19:03:52Z

The most plausible source of these instruction-sequences is that they're hand-crafted as obfuscation, possibly in inline-asm.

As I commented earlier, if that's the case then this code is working exactly as intended: it's confused / puzzled you and caused you to spend significant time on it instead of just reverse-engineering the actual program logic.

The fact that none of these sequences crash or mess up the stack is a clear sign that it's definitely not accidental execution of data as code.

Your test of actually setting breakpoints in this code to see that it was actually executed was useful here, too. It's not uncommon for disassembly output to include some data stored in code sections. But seeing that it is executed, we know it's not just data.

2.) Some form of optimization that is over my head (I have a hard time seeing this one)

You are right to be skeptical of this idea, but yes, you always need to consider that possibility.

Sometimes sub-optimal code exists because someone thought it would be fast, at least on some specific CPU. But that's not even plausible here. I can't believe anyone would think these long convoluted sequences would be faster on any x86 CPU than xor ebx,ebx, push imm32, or mov r32, imm32. Those instructions are all fast on their own, and the sequence that's equivalent to push imm32 actually uses a push imm32 + some dead stores, so it's clearly not trying to avoid push imm32 for some hypothetical CPU where that instruction is slow.

Especially in the NOP case, zero instructions is always better. If you need padding, Intel and AMD publish recommended long-NOP sequences that have minimal overhead. pusha / popa are slow on everything, and it's pretty obvious they do a lot of work, so it doesn't make sense that someone would intentionally use them as padding.

Other than obfuscation, the other plausible purpose is intentional delay. That's sort of a performance reason, but if timing is part of the correctness of your program, fixed sequences like this are not a reliable way to achieve it across a range of different x86 microarchitectures!

But I could believe that some / all of these were a bad implementation of "waste several cycles here". Especially pusha / popa takes a lot of time in few instruction bytes.

Does anyone know of a tool that does this? Has someone created a program that takes code and obscures it with nonsense to prevent reverse engineering?

Good question. Inserting crazy NOPs or turning push imm32 or xor-zeroing into obfuscated sequences is something a tool could maybe do automatically. But usually you'd want to avoid that inside any loops that are actually important for overall performance, so IDK.

I am 100% convinced now that this is obfuscated code. The only thing I am having a hard time believing is that it was all hand-crafted: there are literally thousands of bytes of code that follow long, complex NOPs. I figure the best way to reverse engineer it is to first distill out the garbage. The section I just finished had close to 300 bytes of stack mangling, and the net result was to subtract a fixed offset from EIP (obtained with a call rel 0); the resulting math calculated the base of the DLL at run time. It's also surprising to me that this obfuscation is so pervasive.
