Changes

Programming:Integer Multiplication

2,820 bytes added, 6 March
/* Fast 8bit * 8bit Unsigned with only 512 bytes of tables */ added improved version by calc84maniac
[[User:Litwr|Litwr]] 10:25, 24 October 2015 (CEST)
  == Fast 8bit * 8bit Unsigned with only 512 bytes of tables (improved version) == '''Input''': A = Multiplier, L = Multiplicant '''Output:''' DE = ''Product'' '''CPC Cycles''': 124 = 31 usec '''Size''': 27 bytes of code and 512 bytes for the tables '''by calc84maniac''' Hey, so someone brought up the wiki's 512-byte table routine in the context of Game Boy development, and I thought I'd take a crack at seeing if the speed and size could be made closer to the 1KB table routine. I actually had a good bit of success, by making the following observations:<br>* The x >= y constraint isn't necessary if negative values for floor((x - y) / 2) have their square calculated properly, so no need to swap inputs.<br>* Going from floor((x + y) / 2) to floor((x - y) / 2) simply requires a subtraction of y, so no need to divide by 2 twice or save the input value of x.<br>* To square a negative number z, it's sufficient to calculate z ^ 2 = (z + 256) ^ 2 - z * 512 - 65536, meaning a conditional offset of -(z * 512) in the subtracted square, or z * 512 in the final result.<br>* When conditionally adding y to the result when x + y is odd, the carry can be propagated into the low bit of the 8-bit register holding the upper half of z * 512.<br>* Both of these conditional adds can be turned into branchless adds of either the value or zero.<br><br>This results in the following implementation (which I adjusted to take the same inputs and outputs as the 1KB routine for a fair comparison), which comes out to 27 bytes and a constant 31 usec:<br> <pre>; Input: A = x, L = y; Output: DE = x * y ; C = floor((x + y) / 2) add a,l ; 1 rra ; 2 ld c,a ; 3 ; E = y if x + y is odd, else 0 sbc a,a ; 4 and l ; 5 ld e,a ; 6 ; L = floor((x - y) / 2) ld a,c ; 7 sub l ; 8 ld l,a ; 9 ; D = L if negative, else 0 sbc a,a ; 10 and l ; 11 ld d,a ; 12 ; Calculate low half of product ld h,high(sqrlo) ; 14 ld b,h ; 15 ld a,(bc) ; 17 add a,e ; 18 rl d ; 20, multiplies D by 2 and propagates carry sub (hl) ; 22 ld e,a ; 23 ; Calculate high half of product inc h ; 24, loads high(sqrhi) ld b,h ; 25 ld a,(bc) ; 27 sbc a,(hl) ; 29 add a,d ; 30 ld d,a ; 31</pre> == 16bit * 16bit Unsigned -> 32 bits result == '''Input:''' BC = ''Multiplier'', DE = ''Multiplicand'' '''Output:''' DE,HL = ''Product''<pre>ld hl,0ld a,16muluw add hl,hl rl e rl d jr nc,muluw_cont add hl,bc jr nc,muluw_cont inc demuluw_cont dec a jr nz,muluw</pre> == 16bit * 16bit Unsigned -> 24 bits result ==
'''Input:''' BC = ''Multiplier'', DE = ''Multiplicand''