
I'm struggling to figure out how a certain block of assembly would behave. Given the following bytes on the heap:

004B0000 73 6D 67 66 74 smgft

and the following assembly:

77A701B8 xor eax, eax
77A701BA mov ecx, 4
77A701BF lea edi, DWORD PTR DS:[ecx+4B0000]
77A701C5 xor DWORD PTR DS:[edi], ecx
77A701C7 loopd short ntdll.77A701BF

The problem is to provide the value of the five bytes on the heap, in ASCII, after the instructions have executed. Here is what I understand so far:

xor eax, eax ; 0 out eax

mov ecx, 4 ; set ecx 4

lea edi, dword ptr ds:[ecx+4b0000] ; this loads into EDI whatever is stored at ecx+4b0000, so 4b0004. I'm not sure what this would grab. I'm not even sure what 4b0000 would get, since it's 5 bytes. mgft, or smgf? I think smgf? And how does the +4h affect this? Makes it 736D676678?

xor dword ptr ds:[edi], ecx ; So this will xor 4h with the newly loaded dword at edi, but what does it do with it in the loopd?

loopd short ntdll.77A701BF ; So this is a "loop while equal" but I'm not sure what that translates to with a xor above it. And does it decrement ecx? But then it jumps back to the lea line.


1 Answer


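The lea edi, dword ptr ds:[ecx+4b0000] instruction computes the address ecx+0x004b0000 and stores that address in EDI; it doesn't access memory at all. The loopd instruction behaves like "ecx = ecx - 1; if(ecx != 0) goto ntdll.77A701BF". It always decrements ECX and does not look at the flags set by the xor, so it is not a "loop while equal".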
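To make the loop's behaviour concrete, here is a minimal Python sketch that simulates it on the five bytes from the question. The function name `run_loop` and the three zero padding bytes after the string are assumptions for illustration only: the dword XOR at 0x004B0004 also touches the three bytes that follow, whose real contents are unknown. They are XORed with zero, so they stay unchanged whatever they are.

```python
import struct

BASE = 0x004B0000  # heap address from the question


def run_loop(initial: bytes) -> bytes:
    # Assumption: three zero bytes follow the string, so the dword
    # access at BASE+4 stays inside the simulated memory.
    mem = bytearray(initial) + bytearray(3)
    ecx = 4                                    # mov ecx, 4
    while True:
        edi = ecx + BASE                       # lea: address arithmetic only
        off = edi - BASE
        (dword,) = struct.unpack_from("<I", mem, off)
        struct.pack_into("<I", mem, off, dword ^ ecx)  # xor dword ptr [edi], ecx
        ecx -= 1                               # loopd: decrement ecx ...
        if ecx == 0:                           # ... and fall through at zero
            break
    return bytes(mem[:len(initial)])


result = run_loop(bytes.fromhex("736D676674"))
print(result.decode("ascii"))  # prints "sleep"
```

Each iteration only changes the lowest byte of the dword it reads, because the upper three bytes of ECX are zero.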

Note that this loop can be unrolled, so that it becomes:

    xor eax, eax

    lea edi, DWORD PTR DS:[4+0x004B0000]
    xor DWORD PTR DS:[edi], 0x00000004

    lea edi, DWORD PTR DS:[3+0x004B0000]
    xor DWORD PTR DS:[edi], 0x00000003

    lea edi, DWORD PTR DS:[2+0x004B0000]
    xor DWORD PTR DS:[edi], 0x00000002

    lea edi, DWORD PTR DS:[1+0x004B0000]
    xor DWORD PTR DS:[edi], 0x00000001

    xor ecx,ecx

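The upper three bytes of each XOR operand are zero, so only the lowest byte at each address actually changes. This means it can be optimised further into byte-sized XORs: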

    xor BYTE PTR DS:[0x004B0004], 0x04
    xor BYTE PTR DS:[0x004B0003], 0x03
    xor BYTE PTR DS:[0x004B0002], 0x02
    xor BYTE PTR DS:[0x004B0001], 0x01

    xor eax, eax       ;May be unnecessary if value unused by later code
    mov edi,0x004B0001 ;May be unnecessary if value unused by later code
    xor ecx, ecx       ;May be unnecessary if value unused by later code

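Which can be optimised a little more by combining the XORs into one dword (x86 is little-endian, so the low byte 0x01 lands at 0x004B0001 and the high byte 0x04 at 0x004B0004):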

    xor DWORD PTR DS:[0x004B0001], 0x04030201

    xor eax, eax       ;May be unnecessary if value unused by later code
    mov edi,0x004B0001 ;May be unnecessary if value unused by later code
    xor ecx, ecx       ;May be unnecessary if value unused by later code

Note: Yes, this is a misaligned XOR, but likely faster than multiple smaller aligned XORs on modern CPUs as it doesn't cross a cache line boundary.
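As a quick sanity check on the combined form, this Python sketch applies the four byte XORs and the single little-endian dword XOR to copies of the bytes from the question and compares the results. The variable names are illustrative only.

```python
import struct

heap = bytes.fromhex("736D676674")  # bytes at 0x004B0000 from the question

# Four byte-sized XORs at offsets 4, 3, 2 and 1.
by_bytes = bytearray(heap)
for offset in (4, 3, 2, 1):
    by_bytes[offset] ^= offset

# One little-endian dword XOR at offset 1 with 0x04030201.
by_dword = bytearray(heap)
(value,) = struct.unpack_from("<I", by_dword, 1)
struct.pack_into("<I", by_dword, 1, value ^ 0x04030201)

print(by_bytes == by_dword, by_dword.decode("ascii"))  # prints "True sleep"
```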

Essentially, the entire loop can be reduced to a single instruction.


2 Comments

Even if it crossed a cache line boundary, it would probably be faster than 4 memory-destination byte XORs on most CPUs for most kinds of surrounding code. (memory-destination ALU insns always decode to multiple uops on Intel CPUs). If it crossed a page boundary (on pre-Skylake), then separate XORs would win for latency, but still not for throughput unless that slow-to-retire instruction stalled out-of-order execution later on.
Which means the contents of memory will turn into 004B0000 73 6C 65 65 70 sleep
