I'm currently working on an obfuscator for assembled x86 code (working directly on the raw bytes).

To do that I first need to build a simple parser that "understands" the bytes. I'm using a database that I'm building myself, mostly with the help of this website: https://defuse.ca/online-x86-assembler.htm

Now my question: some bytes can be interpreted in two ways, for example (Intel syntax):

1. f3 00 00                repz add BYTE PTR [eax],al
2. f3                      repz

My idea was to loop through the bytes and handle each instruction on its own, but when I reach the byte 0xf3 I have two ways of interpreting it.

I know there are working x86 disassemblers out there. How do they know which case applies?

1 Answer

Prefixes, including the repz prefix, are not meaningful without a subsequent instruction. The subsequent instruction may incorporate the prefix (repz nop is pause) or change its meaning (repz is xrelease if used before some interlocked instructions), or the prefix may simply be invalid for it.

The decoding is always unambiguous, otherwise the CPU could not execute instructions. It can be ambiguous only if you don't know the exact byte offset at which to begin decoding (because x86 instructions have variable length).
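As a rough illustration of the first point, here is a minimal Python sketch that classifies an 0xf3 byte by looking at the byte that follows it. The helper name `describe_f3` and the prefix table are assumptions made for this example, not part of any real disassembler. The sketch covers only legacy prefixes in 32-bit code and does not decode the instruction itself.

```python
# Legacy prefix bytes (32-bit mode; REX prefixes are not handled in this sketch).
LEGACY_PREFIXES = frozenset({
    0xF0,                                # lock
    0xF2,                                # repnz / repne
    0xF3,                                # repz / rep
    0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65,  # segment overrides
    0x66,                                # operand-size override
    0x67,                                # address-size override
})


def describe_f3(code: bytes, offset: int = 0) -> str:
    """Hypothetical helper: say how the F3 byte at `offset` should be read."""
    if code[offset] != 0xF3:
        raise ValueError("expected an F3 byte at the given offset")
    nxt = offset + 1
    if nxt >= len(code):
        # A prefix with nothing after it is not a complete instruction.
        return "incomplete: prefix with no instruction after it"
    if code[nxt] == 0x90:
        # F3 90 is absorbed into a different instruction.
        return "pause"
    if code[nxt] in LEGACY_PREFIXES:
        return "prefix followed by more prefixes"
    return "prefix applied to opcode 0x%02x" % code[nxt]


print(describe_f3(bytes.fromhex("f30000")))
print(describe_f3(bytes.fromhex("f390")))
print(describe_f3(bytes.fromhex("f3")))
```

For the bytes from the question, `f3 00 00` is read as a prefix attached to the `add` opcode 0x00, while a lone `f3` at the end of the buffer is simply not a complete instruction.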

9 Comments

decoding is always unambiguous - or at least, any given CPU will pick one way of decoding. Intel's manual says it's "illegal" to have multiple REX prefixes on one instruction, but their Skylake CPUs for example will take the last one (like with other repeated prefixes), not #UD fault. There is AFAIK no Intel documentation that says this is what will happen. But yes, they're still REX prefixes, so unambiguous in that sense.
Finally found the Q&A where I'd tested repeated REX prefixes: Segmentation fault when using DB (define byte) inside a function
@PeterCordes Just clarifying: when parsing the subsequent instructions, all you need to do is look for the prefix bytes? To get all the bytes for an instruction, you simply go from the prefix to the next prefix - 1?
@HappyJerry: Yeah, any number of prefixes can be part of one instruction. The first non-prefix byte is the opcode. (There's a length limit of 15 bytes per instruction, so #UD if you don't get to the end of an instruction before then, even if you've seen opcode + modrm which tell you how many more bytes of disp32 and/or imm32 there are.)
@PeterCordes So the delimiter for an instruction may look like: current_position == prefix AND current_position - 1 != prefix. Once these conditions are met, could I assume that I've reached the end of an instruction?
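
The prefix handling described in the comments above can be sketched roughly as follows, in Python. The names `split_prefixes` and `MAX_INSTRUCTION_LENGTH` are made up for this example, and only legacy prefixes are handled (no REX). The function finds where the opcode starts; it does not find where the instruction ends.

```python
MAX_INSTRUCTION_LENGTH = 15  # architectural limit mentioned in the comments

# Legacy prefix bytes (REX prefixes are not handled in this sketch).
LEGACY_PREFIXES = frozenset({
    0xF0, 0xF2, 0xF3,                    # lock, repnz, repz
    0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65,  # segment overrides
    0x66, 0x67,                          # operand-size / address-size overrides
})


def split_prefixes(code: bytes, offset: int = 0):
    """Hypothetical helper: return (prefix_bytes, opcode_offset)."""
    pos = offset
    while pos < len(code) and code[pos] in LEGACY_PREFIXES:
        pos += 1
        # 15 prefix bytes leave no room for an opcode within the limit.
        if pos - offset >= MAX_INSTRUCTION_LENGTH:
            raise ValueError("instruction would exceed 15 bytes")
    if pos >= len(code):
        raise ValueError("ran out of bytes before reaching an opcode")
    # The first non-prefix byte is the opcode.
    return code[offset:pos], pos


prefixes, opcode_at = split_prefixes(bytes.fromhex("f30000"))
print(prefixes.hex(), opcode_at)
```

Scanning for the next prefix byte is not a reliable way to find the end of an instruction, because the same byte values can appear as data: in `b8 f3 00 00 00` (mov eax, 0xf3) the 0xf3 is part of the immediate. The end is determined by the opcode, the ModRM byte, and any displacement or immediate they imply.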