I have a serious problem I really want to solve. I am trying to implement void operator divide(uint32_t dividend[4],int32_t divisor,uint32_t "ient[4],int32_t &remainder). My algorithm is:
sign_ext_int32 dividend[3] ; result placed into edx:eax
idiv divisor ; sign div edx:eax by divisor. Result into edx=remainder, eax=quotient
mov quotient[3], eax
;
mov eax, dividend[2]
idiv divisor ; throws overflow exception if edx=0, eax=0xffff_ffff
mov quotient[2], eax
add quotient[3], edx
;
mov eax, dividend[1]
idiv divisor
mov quotient[1], eax
add quotient[2], edx
sign_ext_int32 edx
adc quotient[3], edx ; add any carry from last add instruction to quotient[3]
;
mov eax, dividend[0]
idiv divisor
mov quotient[0], eax
add quotient[1], edx
sign_ext_int32 edx
adc quotient[2], edx
adc quotient[3], edx
This algorithm works on paper. However since x86 idiv instruction doesn't use a qword register to store the quotient; any quotient not fitting into eax will trigger overflow exception. Example:
edx=0,
eax=0xffff_ffff
idiv
ecx=0xffff_ffff
causes overflow exception
I have a work around; but it would require three more registers. Also this only works if a simple idiv would result in an overflow.
; edx contains last remainder
mov eax, dividend[2]
movzx reg1, ax ; save loword of edx:eax
;
shrd eax, edx, 16 ; make edx:eax smaller
shr edx, 16
;
idiv divisor
;
shl edx, 16 ; bring remainder to original size
shl eax, 16 ; bring quotient to original size
mov reg2, edx ; save remainder
mov reg3, eax ; save quotient
;
mov edx, 0
mov eax, reg1
idiv divisor ; divide loword saved
;
add eax, reg3 ; add quotients together to get total quotient
add edx, reg2 ; add remainders together to get total remainder
;
mov quotient[2], eax
; edx remainder ready for next iteration
As you can see this requires alot of instructions and registers. Anybody has a solution for doing x86 signed division on operands that can lead to overflow?
dividendanddivisor, swapping them if necessary, doing an unsigned division and flipping the sign of the results at the end, if necessary?idivgoes all the way back to the 16-bit 8086. Now try to estimate the cost of having a32/16=32divide instruction on a microprocessor designed in 1978 with 29000 transistors. Note that it would need to have three output registers; I don't believe there is any other instruction with more than two.edx) is larger than abs(divisor)/2 and add or subtract divisor in that case, but also increment the quotient at the right place. But that sounds like much more overhead than handling sign and values separately. I don't think Intel considered extended-precision math with signed integers a common use case.