1

I have a serious problem I really want to solve. I am trying to implement void operator divide(uint32_t dividend[4],int32_t divisor,uint32_t &quotient[4],int32_t &remainder). My algorithm is:

sign_ext_int32 dividend[3] ; result placed into edx:eax
idiv divisor ; sign div edx:eax by divisor. Result into edx=remainder, eax=quotient
mov quotient[3], eax
;
mov eax, dividend[2]
idiv divisor ; throws overflow exception if edx=0, eax=0xffff_ffff
mov quotient[2], eax
add quotient[3], edx
;
mov eax, dividend[1]
idiv divisor 
mov quotient[1], eax
add quotient[2], edx
sign_ext_int32 edx
adc quotient[3], edx ; add any carry from last add instruction to quotient[3]
;
mov eax, dividend[0]
idiv divisor
mov quotient[0], eax
add quotient[1], edx
sign_ext_int32 edx
adc quotient[2], edx
adc quotient[3], edx

This algorithm works on paper. However since x86 idiv instruction doesn't use a qword register to store the quotient; any quotient not fitting into eax will trigger overflow exception. Example:

edx=0, 
eax=0xffff_ffff 
   idiv 
ecx=0xffff_ffff 
   causes overflow exception

I have a work around; but it would require three more registers. Also this only works if a simple idiv would result in an overflow.

; edx contains last remainder 
mov eax, dividend[2]
movzx reg1, ax ; save loword of edx:eax
;
shrd eax, edx, 16 ; make edx:eax smaller
shr edx, 16
;
idiv divisor
;
shl edx, 16 ; bring remainder to original size
shl eax, 16 ; bring quotient to original size
mov reg2, edx ; save remainder
mov reg3, eax ; save quotient
;
mov edx, 0
mov eax, reg1
idiv divisor ; divide loword saved
; 
add eax, reg3 ; add quotients together to get total quotient
add edx, reg2 ; add remainders together to get total remainder
;
mov quotient[2], eax
; edx remainder ready for next iteration 

As you can see this requires alot of instructions and registers. Anybody has a solution for doing x86 signed division on operands that can lead to overflow?

14
  • 1
    How about checking the sign of dividend and divisor, swapping them if necessary, doing an unsigned division and flipping the sign of the results at the end, if necessary? Commented May 15 at 11:31
  • 1
    Not all CPU instructions take the same time ... If you want to do a faster long division, you should compute the inverse of the divisor once to replace the actual divisions by multiplications, as these are way faster, even on modern CPUs (check for throughput and latency at uops.info/table.html). Doing a correct integer inverse is not completely trivial, but a solved problem (I don't have a link at hand at the moment). Commented May 16 at 8:42
  • 1
    @user13947194: I think you're discovering one of the reasons why people often do extended-precision arithmetic as sign-magnitude rather than two's-complement. (Another obvious one is it makes negation O(1) instead of O(n).) Commented May 16 at 21:56
  • 2
    @user13947194: Before complaining too much about Intel, let's note that the design of idiv goes all the way back to the 16-bit 8086. Now try to estimate the cost of having a 32/16=32 divide instruction on a microprocessor designed in 1978 with 29000 transistors. Note that it would need to have three output registers; I don't believe there is any other instruction with more than two. Commented May 17 at 23:47
  • 1
    You could check if abs(edx) is larger than abs(divisor)/2 and add or subtract divisor in that case, but also increment the quotient at the right place. But that sounds like much more overhead than handling sign and values separately. I don't think Intel considered extended-precision math with signed integers a common use case. Commented May 19 at 11:40

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.