x86 decoding instruction opcode byte

Question

I'm creating an x86 decoder and I'm struggling on understanding and finding an efficient way to calculate the mnemonic of an instruction.

I know that the opcode 6 MSBs are the opcode bits, but I can't find anywhere that use those 6 bits in a mnemonic table. The only mnemonic table I find is for the whole opcode byte itself and not just the 6 MSBs.

I wanted to ask what are some efficient ways I can go on decoding the mnemonics encoded in the opcode byte, and if there're any table references using the 6 MSBs and not the whole opcode byte.

The bottom 2 bits are also part of the opcode... For example the jcc instructions have different mnemonics for every opcode value from 0x70 to 0x7F. In fact, sometimes the /r field from the ModR/M byte is part of the opcode, too. (e.g. shl vs. shr). — Peter Cordes
– Peter Cordes, Commented Sep 15, 2017 at 3:36
The problem with modern x86 machine code is that there isn't an efficient / simple way to decode it. For example, rep nop actually decodes as pause, or rep bsf decodes as tzcnt (if BMI1 is supported, otherwise it decodes as bsf). So you have to check for mandatory prefixes of other instructions. — Peter Cordes
– Peter Cordes, Commented Sep 15, 2017 at 3:39
@PeterCordes One of the resources I was using is c-jump.com/CIS77/CPU/x86/X77_0050_add_opcode.htm I know there're exceptions as to when not the only 6 MSBs of the opcode byte represent the mnemonic but for a regular instruction it seems this way according to what they're saying. I'm asking on how for these regular cases could I use these 6 MSBs to determine the mnemonic like they did in their example. — Jorayen
– Jorayen, Commented Sep 15, 2017 at 9:06
You mean like const char *mnemonic = table[(uint8_t)opcode>>2]; in C? You do it like that. Although really you probably want a 256-entry table of structs, where one of the members is an enum of what kind of instruction it is (or a function pointer to a function that will decode the rest of the bytes). — Peter Cordes
– Peter Cordes, Commented Sep 15, 2017 at 9:09
@PeterCordes Yea but when looking at a mnemonic table online I couldn't find a way to decide which mnemonic is really used. For example looking at this table sparksandflames.com/files/x86InstructionChart.html using const char *mnemonic = table[(uint8_t)opcode>>2]; when opcode is 0x6 (Push), it could be easily mistaken for 0x5 (Add) if I were to rsh 2 bytes — Jorayen
– Jorayen, Commented Sep 15, 2017 at 9:15

Peter Cordes · Accepted Answer · 2017-09-15 11:27:41Z

1

But isn't there an efficient way to store a table for the mnemonics without duplicates?

This has become an algorithms and data structures question.

As you point out, many of the opcode table entries (at least for the table without a 0f escape byte: http://sparksandflames.com/files/x86InstructionChart.html) do repeat in groups of 4 or 2, i.e. with the same 6 or 7-bit prefix selecting the same mnemonic.

Obviously a 256-entry table of structs is simple, but duplicates things. It's very fast and easy to use, since it's probably still small enough not to cache-miss very often. (Especially since the common entries will stay hot in cache; x86 code uses the same opcodes a lot.)

You can trade simplicity / performance for space.

You could have a 64-entry table of structs where one member is a pointer to a secondary table to be indexed with the low 2 bits. If the pointer is NULL, it means the instruction follows the pattern of add / and / xor / etc. where the low 2 bits tell you 8 bit vs. whatever the operand-size is and direction (r/m,reg or reg,r/m).

Your struct would also need entries for turning into other instructions when certain prefixes are present (e.g. rep nop is pause). Also, AVX VEX prefixes use what used to be an invalid encoding of another instruction. x86 is pretty crazy to decode if you want to do a complete job for all the current extensions.

Actually, it might be simplest (and also efficient) to just use a table of function pointers. Or a struct with a const char* mnemonic and a int (*decode)(const char*mnemonic, const char *insn_bytes, unsigned prefix_bitmap) function, so lots of opcodes can point to the same decode-function but still get different mnemonics. Sometimes the function will ignore the passed mnemonic, but other times that's all it needs. You'd have a common function for decoding addressing modes that many of the decode functions would call.

This is fairly similar to how you might implement an x86 emulator that interprets, instead of doing dynamic recompilation. A common decode loop and then dispatching through function pointers.

An even more complicated data structure you might use is a radix trie aka prefix tree. See also https://en.wikipedia.org/wiki/Trie#Bitwise_tries.

This is getting into silly season, because the density is so high that a lookup table makes much more sense. (There are very few undefined opcode).

edited Sep 15, 2017 at 11:27

answered Sep 15, 2017 at 10:00

Peter Cordes

377k50 gold badges742 silver badges1k bronze badges

Sign up to request clarification or add additional context in comments.

9 Comments

Jorayen Over a year ago

So if I aim for performance (speed wise) would you say just storing a 256 entry table would be the best choice? Also why would I need a struct? I thought maybe creating an enum of all mnemonics and then create an the 256 entry table as an index table to this enum, what's your thoughts?

Peter Cordes Over a year ago

Yeah, I'd guess that a 256 entry table would perform the best. Extra branching to choose extra decode steps is unlikely to be worth it.

Peter Cordes Over a year ago

@Jorayen: Unless you have a separate table for other special cases, you'd use a struct to hold the mnemonic and tell you how to decode the remaining bytes into operands. (e.g. jcc/call vs. add vs. mul r/m32 vs. imul r,r/m32,imm32 vs. other special cases). And for special cases where 3 more opcode bits come from the /r field in ModR/M, to indicate that (e.g. with a pointer to another table). Having one struct where you use most of the members on every access gives good spatial locality, so it caches well.

Jorayen Over a year ago

Yea I wanted to ask you about the extra branching thing, since not all opcode seems to follow a similar pattern, since there're the alternate encoding for instruction using the accumulator register for example or pushing/popping segments from/to the stack, such as the 0x04 opcode which is an add *AL*, imm8 although according to the normal decoding steps by decoding 0x04 one would expect that this is an add operation to a memory location since the direction flag is set to 0, and not an imm to reg add operation since the MSB isn't set to 1. What are your thoughts on dealing with these

Peter Cordes Over a year ago

@Jorayen: Well there's another pattern for immediate operand instructions, using the /r field in modr/m to pack them into only a few opcodes. I'd use a 256-entry table indexed by opcode byte (and another for opcodes that follow a 0f escape byte). For a given opcode, first check if prefixes make it into another instruction. Then check if a function-pointer is non-NULL. If so, call it (with the instruction bytes, mnemonic and struct* as args, taking instruction length as return value). If not, check a flag in the struct for the /r = opcode bits pattern.

|

Collectives™ on Stack Overflow

x86 decoding instruction opcode byte

1 Answer 1

9 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

9 Comments

Your Answer

Sign up or log in

Post as a guest

Related