Changes

Jump to: navigation, search

Programming:Integer Multiplication

2,515 bytes added, 6 March
/* Fast 8bit * 8bit Unsigned with only 512 bytes of tables */ added improved version by calc84maniac
[[User:Litwr|Litwr]] 10:25, 24 October 2015 (CEST)
 
 
 
== Fast 8bit * 8bit Unsigned with only 512 bytes of tables (improved version) ==
 
'''Input''': A = Multiplier, L = Multiplicant
 
'''Output:''' DE = ''Product''
 
'''CPC Cycles''': 124 = 31 usec
 
'''Size''': 27 bytes of code and 512 bytes for the tables
 
'''by calc84maniac'''
 
Hey, so someone brought up the wiki's 512-byte table routine in the context of Game Boy development, and I thought I'd take a crack at seeing if the speed and size could be made closer to the 1KB table routine. I actually had a good bit of success, by making the following observations:<br>
* The x >= y constraint isn't necessary if negative values for floor((x - y) / 2) have their square calculated properly, so no need to swap inputs.<br>
* Going from floor((x + y) / 2) to floor((x - y) / 2) simply requires a subtraction of y, so no need to divide by 2 twice or save the input value of x.<br>
* To square a negative number z, it's sufficient to calculate z ^ 2 = (z + 256) ^ 2 - z * 512 - 65536, meaning a conditional offset of -(z * 512) in the subtracted square, or z * 512 in the final result.<br>
* When conditionally adding y to the result when x + y is odd, the carry can be propagated into the low bit of the 8-bit register holding the upper half of z * 512.<br>
* Both of these conditional adds can be turned into branchless adds of either the value or zero.<br>
<br>
This results in the following implementation (which I adjusted to take the same inputs and outputs as the 1KB routine for a fair comparison), which comes out to 27 bytes and a constant 31 usec:<br>
 
<pre>
; Input: A = x, L = y
; Output: DE = x * y
; C = floor((x + y) / 2)
add a,l ; 1
rra ; 2
ld c,a ; 3
; E = y if x + y is odd, else 0
sbc a,a ; 4
and l ; 5
ld e,a ; 6
; L = floor((x - y) / 2)
ld a,c ; 7
sub l ; 8
ld l,a ; 9
; D = L if negative, else 0
sbc a,a ; 10
and l ; 11
ld d,a ; 12
; Calculate low half of product
ld h,high(sqrlo) ; 14
ld b,h ; 15
ld a,(bc) ; 17
add a,e ; 18
rl d ; 20, multiplies D by 2 and propagates carry
sub (hl) ; 22
ld e,a ; 23
; Calculate high half of product
inc h ; 24, loads high(sqrhi)
ld b,h ; 25
ld a,(bc) ; 27
sbc a,(hl) ; 29
add a,d ; 30
ld d,a ; 31
</pre>
== 16bit * 16bit Unsigned -> 32 bits result ==
Check user, administrator
1,531
edits