Changes

Programming:Integer Multiplication

6,379 bytes added, 6 March

/* Fast 8bit * 8bit Unsigned with only 512 bytes of tables */ added improved version by calc84maniac

</pre>

== ~~Faster~~Fastest, accurate 8bit * 8bit Unsigned with 16 KB tables == '''Input:''' L = Multiplier, C = Multiplicand '''Output:''' DE = Product '''CPC Cycles:''' 80 = 20 usec '''Size:''' 14 bytes of code + 16 KB tables

I'm currently working on a new routine, based on a routine by Kirk Meyer [http://www.ticalc.org/pub/86/asm/source/routines/vfmult.asm] which uses nibble multiplication tables.

ld d,a ; 20

</pre>

== Fast, accurate 8bit * 8bit Unsigned with 8 KB tables ==

'''Input:''' L = Multiplier, C = Multiplicand

'''Output:''' DE = Product

'''CPC Cycles:''' 100 = 25 usec

'''Size:''' 19 bytes of code + 8 KB tables

16K is a lot of memory to use for tables, but I'm working on a way to reduce this to 8K while maintaining similar performance (around 26μs). The idea is rather than using tables for the low and high nibbles, use tables for alternate bits. So there would be 16 tables for the values #00, #01, #04, #05, #10, #11, #14, #15, #40, #41, #44, #45, #50, #51, #54, #55. The values can be shifted by 1 (rather than 4) to give values for the alternate bits using a simple ADD HL,HL.

== Very fast 8bit * 8bit Unsigned with only 1K of tables ==

'''Input:'''~~: B~~ A = ''Multiplier'', C L = ~~Multiplicant~~''Multiplicand''

'''Output:''': DE = ''Product''

'''~~Clocks~~CPC Cycles''': 104-112 (108 on average)= 26-28 (27) usec

'''Size''': 24 bytes of code and 1024 bytes for the tables

Now for the actual multiply routine:

~~'''Input:''' A = ''Multiplier'', L = ''Multiplicand''~~

~~'''Output:''' DE = ''Product''~~

<pre>

ld h,umul_tab_lo / #100 ; 2

[[User:Executioner|Executioner]] 06:25, 4 April 2008 (CEST)

== Fast 8bit * 8bit Unsigned with only 512 bytes of tables == '''Input''': A = Multiplier, B = Multiplicant '''Output''': A:E = Product '''CPC Cycles''': 136-172 (154 on average) = 34-43 (38.5) usec '''Size''': 44 bytes of code and 512 bytes for the tables Fast 8 bit unsigned multiplication with 16 bit result. It uses formula x*y = ((x+y)/2)2 - ((x-y)/2)2, if x+y is even = ((x+y-1)/2)2 - ((x-y-1)/2)2 + y, if x+y is odd and x>=y <pre> cp b jr nc,l1 ld e,a ld a,b ld b,e l1 ld c,a sub b rra ld d,a ld a,c add a,b rra ld l,a ld h,high(sqrlo) ld a,(hl) ld e,l ld l,d jr nc,l2 sub (hl) ;odd ld l,e ld e,a inc h ;loads high(sqrhi) ld a,(hl) ld l,d sbc a,(hl) ld d,a ld a,e add a,b ld e,a ld a,d adc a,0 ret l2 sub (hl) ;even ld l,e ld e,a inc h ld a,(hl) ld l,d sbc a,(hl) ret sqrlo ;low(x*x) should be at the page border db 0,1,4,9,$10,$19,$24,$31,$40,$51,$64,$79,$90,$a9,$c4,$e1 db 0,$21,$44,$69,$90,$b9,$e4,$11,$40,$71,$a4,$d9,$10,$49,$84,$c1 db 0,$41,$84,$c9,$10,$59,$a4,$f1,$40,$91,$e4,$39,$90,$e9,$44,$a1 db 0,$61,$c4,$29,$90,$f9,$64,$d1,$40,$b1,$24,$99,$10,$89,4,$81 db 0,$81,4,$89,$10,$99,$24,$b1,$40,$d1,$64,$f9,$90,$29,$c4,$61 db 0,$a1,$44,$e9,$90,$39,$e4,$91,$40,$f1,$a4,$59,$10,$c9,$84,$41 db 0,$c1,$84,$49,$10,$d9,$a4,$71,$40,$11,$e4,$b9,$90,$69,$44,$21 db 0,$e1,$c4,$a9,$90,$79,$64,$51,$40,$31,$24,$19,$10,9,4,$1 db 0,1,4,9,$10,$19,$24,$31,$40,$51,$64,$79,$90,$a9,$c4,$e1 db 0,$21,$44,$69,$90,$b9,$e4,$11,$40,$71,$a4,$d9,$10,$49,$84,$c1 db 0,$41,$84,$c9,$10,$59,$a4,$f1,$40,$91,$e4,$39,$90,$e9,$44,$a1 db 0,$61,$c4,$29,$90,$f9,$64,$d1,$40,$b1,$24,$99,$10,$89,4,$81 db 0,$81,4,$89,$10,$99,$24,$b1,$40,$d1,$64,$f9,$90,$29,$c4,$61 db 0,$a1,$44,$e9,$90,$39,$e4,$91,$40,$f1,$a4,$59,$10,$c9,$84,$41 db 0,$c1,$84,$49,$10,$d9,$a4,$71,$40,$11,$e4,$b9,$90,$69,$44,$21 db 0,$e1,$c4,$a9,$90,$79,$64,$51,$40,$31,$24,$19,$10,9,4,$1sqrhi ;high(x*x) db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 db 1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3 db 4,4,4,4,5,5,5,5,6,6,6,7,7,7,8,8 db 9,9,9,$a,$a,$a,$b,$b,$c,$c,$d,$d,$e,$e,$f,$f db $10,$10,$11,$11,$12,$12,$13,$13,$14,$14,$15,$15,$16,$17,$17,$18 db $19,$19,$1a,$1a,$1b,$1c,$1c,$1d,$1e,$1e,$1f,$20,$21,$21,$22,$23 db $24,$24,$25,$26,$27,$27,$28,$29,$2a,$2b,$2b,$2c,$2d,$2e,$2f,$30 db $31,$31,$32,$33,$34,$35,$36,$37,$38,$39,$3a,$3b,$3c,$3d,$3e,$3f db $40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$4a,$4b,$4c,$4d,$4e,$4f db $51,$52,$53,$54,$55,$56,$57,$59,$5a,$5b,$5c,$5d,$5f,$60,$61,$62 db $64,$65,$66,$67,$69,$6a,$6b,$6c,$6e,$6f,$70,$72,$73,$74,$76,$77 db $79,$7a,$7b,$7d,$7e,$7f,$81,$82,$84,$85,$87,$88,$8a,$8b,$8d,$8e db $90,$91,$93,$94,$96,$97,$99,$9a,$9c,$9d,$9f,$a0,$a2,$a4,$a5,$a7 db $a9,$aa,$ac,$ad,$af,$b1,$b2,$b4,$b6,$b7,$b9,$bb,$bd,$be,$c0,$c2 db $c4,$c5,$c7,$c9,$cb,$cc,$ce,$d0,$d2,$d4,$d5,$d7,$d9,$db,$dd,$df db $e1,$e2,$e4,$e6,$e8,$ea,$ec,$ee,$f0,$f2,$f4,$f6,$f8,$fa,$fc,$fe</pre> [[User:Litwr|Litwr]] 10:25, 24 October 2015 (CEST) == Fast 8bit * 8bit Unsigned with only 512 bytes of tables (improved version) == '''Input''': A = Multiplier, L = Multiplicant '''Output:''' DE = ''Product'' '''CPC Cycles''': 124 = 31 usec '''Size''': 27 bytes of code and 512 bytes for the tables '''by calc84maniac''' Hey, so someone brought up the wiki's 512-byte table routine in the context of Game Boy development, and I thought I'd take a crack at seeing if the speed and size could be made closer to the 1KB table routine. I actually had a good bit of success, by making the following observations: * The x >= y constraint isn't necessary if negative values for floor((x - y) / 2) have their square calculated properly, so no need to swap inputs. * Going from floor((x + y) / 2) to floor((x - y) / 2) simply requires a subtraction of y, so no need to divide by 2 twice or save the input value of x. * To square a negative number z, it's sufficient to calculate z ^ 2 = (z + 256) ^ 2 - z * 512 - 65536, meaning a conditional offset of -(z * 512) in the subtracted square, or z * 512 in the final result. * When conditionally adding y to the result when x + y is odd, the carry can be propagated into the low bit of the 8-bit register holding the upper half of z * 512. * Both of these conditional adds can be turned into branchless adds of either the value or zero. This results in the following implementation (which I adjusted to take the same inputs and outputs as the 1KB routine for a fair comparison), which comes out to 27 bytes and a constant 31 usec: <pre>; Input: A = x, L = y; Output: DE = x * y ; C = floor((x + y) / 2) add a,l ; 1 rra ; 2 ld c,a ; 3 ; E = y if x + y is odd, else 0 sbc a,a ; 4 and l ; 5 ld e,a ; 6 ; L = floor((x - y) / 2) ld a,c ; 7 sub l ; 8 ld l,a ; 9 ; D = L if negative, else 0 sbc a,a ; 10 and l ; 11 ld d,a ; 12 ; Calculate low half of product ld h,high(sqrlo) ; 14 ld b,h ; 15 ld a,(bc) ; 17 add a,e ; 18 rl d ; 20, multiplies D by 2 and propagates carry sub (hl) ; 22 ld e,a ; 23 ; Calculate high half of product inc h ; 24, loads high(sqrhi) ld b,h ; 25 ld a,(bc) ; 27 sbc a,(hl) ; 29 add a,d ; 30 ld d,a ; 31</pre> == 16bit * 16bit Unsigned -> 32 bits result == '''Input:''' BC = ''Multiplier'', DE = ''Multiplicand'' '''Output:''' DE,HL = ''Product''<pre>ld hl,0ld a,16muluw add hl,hl rl e rl d jr nc,muluw_cont add hl,bc jr nc,muluw_cont inc demuluw_cont dec a jr nz,muluw</pre> == 16bit * 16bit Unsigned -> 24 bits result ==

'''Input:''' BC = ''Multiplier'', DE = ''Multiplicand''

← Older edit

Prodatron

Check user, administrator

1,531

edits

Changes

Programming:Integer Multiplication

CPCWiki - THE Amstrad CPC encyclopedia!