</pre>
== Faster, accurate 8bit * 8bit Unsigned ==
I'm currently working on a new routine based on one by Kirk Meyer [http://www.ticalc.org/pub/86/asm/source/routines/vfmult.asm], which uses nibble multiplication tables.
I have a working, tested version which uses 16K of tables and performs the multiplication in 20μs.
The code to build the tables is below. It only needs to be run once. The tables are page-aligned, and each product table is a page of low bytes followed by a page of high bytes:
<pre>
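.maketabs
; hltab1[n] = page of the table for the low nibble of n: restab + 2 * (n and #0f)
ld hl,hltab1
.makelp1
ld a,l
rla ; the carry rotated into bit 0 is masked off below
and #1e ; A = 2 * (L and #0f)
add restab / 256
ld (hl),a
inc l
jr nz,makelp1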
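; hltab2[n] = page of the table for the high nibble of n:
; restab (the all-zero k = 0 table) if it is 0, else restab2 + 2 * (nibble - 1)
inc h
.makelp2
ld a,l
rra:rra:rra ; bits 4-7 of L move to bits 1-4
and #1e ; A = 2 * (L >> 4), Z set if the high nibble is 0
jr z,usez
add restab2 - restab / 256 - 2 ; evaluated left to right: (restab2 - restab) / 256 - 2 = 30
.usez
add restab / 256
ld (hl),a
inc l
jr nz,makelp2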
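; restab: 16 tables of k * n (k = 0 to 15, n = 0 to 255), each a page of low
; bytes then a page of high bytes. Table k + 1 = table k + n, so table 0 must be zero.
inc h ; restab
.makelp3
ld a,(hl) ; low byte of k * n
inc h
ld d,(hl) ; high byte of k * n
inc h
add l ; + n
ld (hl),a ; low byte of (k + 1) * n
inc h
ld a,0 ; LD rather than XOR A, so the carry is kept
adc d
ld (hl),a ; high byte of (k + 1) * n
dec h:dec h:dec h
inc l
jr nz,makelp3
inc h
inc h
ld a,h
cp restab2 / 256 - 2 ; stop once table 15 has been written
jr nz,makelp3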
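; restab2: 15 tables of 16 * k * n (k = 1 to 15), made by shifting
; tables 1 to 15 of restab left by 4 bits
ld b,h
ld c,l
inc b
inc b ; BC = restab2 (C = 0)
ld h,restab / 256 + 2 ; HL = table 1 of restab (L is already 0)
.makelp4
ld e,(hl)
inc h
ld d,(hl)
dec h ; DE = k * n
ex de,hl
add hl,hl:add hl,hl:add hl,hl:add hl,hl ; * 16
ex de,hl
ld a,e
ld (bc),a ; low byte of 16 * k * n
inc b
ld a,d
ld (bc),a ; high byte of 16 * k * n
dec b
inc l
inc c
jr nz,makelp4
inc h
inc h
inc b
inc b
ld a,h
cp restab2 / 256
jr nz,makelp4
ret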
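ds -$ and #ff ; align the tables to a 256-byte page boundary
.hltab1
ds 256
.hltab2
ds 256
.restab
ds 512 * 16 ; k * n, k = 0 to 15 (the k = 0 table must be zero)
.restab2
ds 512 * 15 ; 16 * k * n, k = 1 to 15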
</pre>
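The Python sketch below models what .maketabs leaves in memory, so the table layout can be checked on a PC. It is not part of the routine: the base address #4000 and the names make_tables and table_entry are hypothetical. Because each table is a page of low bytes followed by a page of high bytes, a single INC of the page register moves from the low byte of an entry to its high byte.

```python
# Model of the tables built by .maketabs, on a flat 64K memory image.
# ASSUMPTION: the base address 0x4000 is hypothetical; any page-aligned
# address with 16K free behaves the same way.

HLTAB1 = 0x4000                  # hypothetical page-aligned base
HLTAB2 = HLTAB1 + 256
RESTAB = HLTAB2 + 256            # 16 tables of k * n, k = 0..15
RESTAB2 = RESTAB + 512 * 16      # 15 tables of 16 * k * n, k = 1..15
TABLES_END = RESTAB2 + 512 * 15


def make_tables():
    mem = bytearray(65536)       # zero-filled, so the k = 0 table is all zero
    for n in range(256):
        # makelp1: page of the table for the low nibble of n
        mem[HLTAB1 + n] = (RESTAB >> 8) + 2 * (n & 0x0F)
        # makelp2: page for the high nibble (restab's zero table if it is 0)
        hi = n >> 4
        if hi == 0:
            mem[HLTAB2 + n] = RESTAB >> 8
        else:
            mem[HLTAB2 + n] = (RESTAB2 >> 8) + 2 * (hi - 1)
    # makelp3: table k + 1 = table k + n (low-byte page, then high-byte page)
    for k in range(15):
        src = RESTAB + 512 * k
        dst = src + 512
        for n in range(256):
            value = mem[src + n] + 256 * mem[src + 256 + n] + n
            mem[dst + n] = value & 0xFF
            mem[dst + 256 + n] = value >> 8
    # makelp4: restab2 table k - 1 = restab table k shifted left by 4 bits
    for k in range(1, 16):
        src = RESTAB + 512 * k
        dst = RESTAB2 + 512 * (k - 1)
        for n in range(256):
            value = (mem[src + n] + 256 * mem[src + 256 + n]) << 4
            mem[dst + n] = value & 0xFF
            mem[dst + 256 + n] = (value >> 8) & 0xFF
    return mem


def table_entry(mem, page, n):
    """Read the 16-bit entry n of the table whose low-byte page is `page`."""
    return mem[(page << 8) + n] + 256 * mem[((page + 1) << 8) + n]


mem = make_tables()
print(TABLES_END - HLTAB1)                         # 16384
print(table_entry(mem, mem[HLTAB1 + 0x37], 200))   # 7 * 200 = 1400
print(table_entry(mem, mem[HLTAB2 + 0x37], 200))   # 16 * 3 * 200 = 9600
```

The two lookups for #37 * 200 add up to 11000, which is the full product.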
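The code to perform the multiplication (DE = L * C; A, B, H, L and the flags are corrupted, C is preserved). The number in each comment is the running total in μs: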
<pre>
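ld h,hltab1 / 256 ; 2  HL = hltab1 + L
ld b,(hl)         ; 4  B = page of the (L and #0f) * n table
inc h             ; 5  HL = hltab2 + L
ld h,(hl)         ; 7  H = page of the 16 * (L >> 4) * n table
ld l,c            ; 8  both tables are indexed by C
ld a,(bc)         ; 10 low byte of (L and #0f) * C
add (hl)          ; 12 + low byte of 16 * (L >> 4) * C
ld e,a            ; 13
inc b             ; 14 move both pointers to the high-byte pages
inc h             ; 15
ld a,(bc)         ; 17 high byte of (L and #0f) * C
adc (hl)          ; 19 + high byte of 16 * (L >> 4) * C + carry
ld d,a            ; 20 DE = L * C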
</pre>
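The 13 instructions rely on L * C = (L and #0F) * C + 16 * (L >> 4) * C: B selects the low-nibble table, H selects the high-nibble table, and C indexes both. The Python sketch below steps through the sequence one instruction at a time against a simulated memory image. The addresses and the names build_memory and mul8 are hypothetical, and the tables are filled directly with the values .maketabs produces rather than by running it.

```python
# Instruction-by-instruction model of the DE = L * C lookup.
# ASSUMPTION: hypothetical page-aligned table addresses; tables are filled
# directly with the values .maketabs would compute.

HLTAB1 = 0x4000
HLTAB2 = HLTAB1 + 256
RESTAB = HLTAB2 + 256
RESTAB2 = RESTAB + 512 * 16


def store16(mem, page, n, value):
    mem[(page << 8) + n] = value & 0xFF        # low-byte page
    mem[((page + 1) << 8) + n] = value >> 8    # high-byte page


def build_memory():
    mem = bytearray(65536)
    for n in range(256):
        mem[HLTAB1 + n] = (RESTAB >> 8) + 2 * (n & 0x0F)
        hi = n >> 4
        mem[HLTAB2 + n] = (RESTAB >> 8) if hi == 0 else (RESTAB2 >> 8) + 2 * (hi - 1)
        for k in range(16):
            store16(mem, (RESTAB >> 8) + 2 * k, n, k * n)
        for k in range(1, 16):
            store16(mem, (RESTAB2 >> 8) + 2 * (k - 1), n, 16 * k * n)
    return mem


def mul8(mem, l, c):
    """Mirror the Z80 sequence and return DE."""
    h = HLTAB1 >> 8                                      # ld h,hltab1 / 256
    b = mem[(h << 8) | l]                                # ld b,(hl)
    h += 1                                               # inc h
    h = mem[(h << 8) | l]                                # ld h,(hl)
    l = c                                                # ld l,c
    a = mem[(b << 8) | c] + mem[(h << 8) | l]            # ld a,(bc) / add (hl)
    e, carry = a & 0xFF, a >> 8                          # ld e,a (carry kept)
    b += 1                                               # inc b
    h += 1                                               # inc h
    a = mem[(b << 8) | c] + mem[(h << 8) | l] + carry    # ld a,(bc) / adc (hl)
    d = a & 0xFF                                         # ld d,a
    return (d << 8) | e


mem = build_memory()
print(mul8(mem, 0xFF, 0xFF))   # 65025
print(mul8(mem, 0x37, 200))    # 11000
```

Because 255 * 255 fits in 16 bits, the final ADC never overflows, so no carry out of D needs handling.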
16K is a lot of memory to use for tables, but I'm working on a way to reduce this to 8K while keeping similar performance (around 26μs). The idea is to use tables for alternate bits rather than for the low and high nibbles. There would be 16 tables, one for each value that has only the even bits (0, 2, 4 and 6) set: #00, #01, #04, #05, #10, #11, #14, #15, #40, #41, #44, #45, #50, #51, #54, #55. The odd bits of the multiplier, shifted down by 1, index the same tables. The result then only has to be shifted left by 1 (rather than 4), which takes a single ADD HL,HL.
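The arithmetic behind the 8K idea can be checked with a short Python sketch. Split L into its even bits (L and #55) and odd bits (L and #AA). Since L and #AA = 2 * ((L >> 1) and #55), it follows that L * C = T[L and #55][C] + 2 * T[(L >> 1) and #55][C]. Both lookups therefore use the same 16 tables, and the odd-bit result needs only one doubling. This models the idea only, not the planned Z80 routine or its timing. EVEN_VALUES, TABLES and mul8_alternate_bits are hypothetical names.

```python
# Arithmetic model of the proposed alternate-bit (8K) scheme.
# ASSUMPTION: illustrates the idea only; the Z80 routine is not written yet.

EVEN_MASK = 0x55
# the 16 multipliers with only bits 0, 2, 4 and 6 set
EVEN_VALUES = [v for v in range(256) if (v & EVEN_MASK) == v]

# one table of 16-bit products per even-bits value
# (16 tables * 256 entries * 2 bytes = 8K)
TABLES = {v: [v * c for c in range(256)] for v in EVEN_VALUES}


def mul8_alternate_bits(l, c):
    even = TABLES[l & EVEN_MASK][c]          # bits 0, 2, 4, 6 of L
    odd = TABLES[(l >> 1) & EVEN_MASK][c]    # bits 1, 3, 5, 7 moved down one place
    return even + 2 * odd                    # "2 * odd" is the single ADD HL,HL


print(len(EVEN_VALUES), len(EVEN_VALUES) * 256 * 2)   # 16 8192
print(" ".join(f"#{v:02X}" for v in EVEN_VALUES))
# #00 #01 #04 #05 #10 #11 #14 #15 #40 #41 #44 #45 #50 #51 #54 #55
```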
[[User:Executioner|Executioner]] 03:58, 8 May 2007 (CEST)
== 16bit * 16bit Unsigned ==