This article describes an algorithm that precalculates the squares from 0*0 to 255*255 into a table, and a routine that looks up the square of a value in that table.
Initialisation
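Precalculate the square table. No multiplication is needed: each square is the previous one plus the next odd number. Run this routine once before the first lookup.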
INITSQ  LD   DE, 1        ;1st odd number
        LD   HL, 0        ;HL = 1st square number
        LD   B, H         ;counter = 256 (B = 0 makes DJNZ loop 256 times)
        LD   IX, SQTAB    ;start address of the square table
SQLOOP  LD   (IX), L      ;low byte to table
        INC  IX
        LD   (IX), H      ;high byte to table
        INC  IX
        ADD  HL, DE       ;add odd number
        INC  DE           ;next odd number
        INC  DE
        DJNZ SQLOOP       ;256 times
        RET
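Why adding odd numbers yields squares can be seen from a one-line identity. Here n stands for the table index, a symbol introduced for illustration and not used in the original listing:

```latex
(n+1)^2 = n^2 + (2n+1)
\qquad\Longrightarrow\qquad
n^2 = \sum_{k=0}^{n-1} (2k+1)
```

In the loop, HL plays the role of n^2 and DE the role of 2n+1. Over the first four iterations HL holds 0, 1, 4, 9 while DE holds 1, 3, 5, 7. The last stored entry is 255*255 = 65025, which still fits in 16 bits.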
Get square from the table
Input: A = Factor
Output: DE = A*A
GETSQ   LD   L, A
        LD   H, 0         ;HL = factor
        ADD  HL, HL       ;* 2
        LD   DE, SQTAB    ;+ start address of the table
        ADD  HL, DE       ;= table address
        LD   E, (HL)      ;E = low byte of the result
        INC  HL
        LD   D, (HL)      ;D = high byte of the result
        RET
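A minimal usage sketch, assuming the table has not been filled yet and that the caller does not need HL preserved (GETSQ overwrites it). The value 12 is only an example:

```asm
        CALL INITSQ       ;fill SQTAB once, e.g. at program start
                          ;(INITSQ changes B, DE, HL and IX)

        LD   A, 12        ;factor
        CALL GETSQ        ;DE = 144 (0090h); A is unchanged, HL is overwritten
```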
Table definition
SQTAB DS 512 ;space for the table
Performance Improvements
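The lookup can be made faster by aligning the table on a 256-byte page boundary and storing the low bytes (LSB) of all results in the first page and the high bytes (MSB) in the second page. The initialisation routine then becomes the following; it uses the undocumented instructions on the high and low halves of IX, written here as HX and LX: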
INITSQ  LD   DE, 1        ;1st odd number
        LD   HL, 0        ;HL = 1st square number
        LD   B, H         ;counter = 256
        LD   IX, SQTAB    ;start address of the square table
SQLOOP  LD   (IX + 0), L  ;low byte to table
        INC  HX
        LD   (IX + 0), H  ;high byte to table
        DEC  HX
        INC  LX
        ADD  HL, DE       ;add odd number
        INC  DE           ;next odd number
        INC  DE
        DJNZ SQLOOP       ;256 times
        RET
The routine to get the square then becomes:
GETSQ   LD   L, A
        LD   H, SQTAB / 256  ;HL = pointer to LSB
        LD   E, (HL)         ;E = low byte of the result
        INC  H
        LD   D, (HL)         ;D = high byte of the result
        RET
The routine is now small enough to be defined as a macro (without the RET), which saves the CALL and RET overhead on every lookup.
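As a rough sketch, the page-aligned table and a macro form of the lookup could be written as below. Directive names differ between assemblers, so ALIGN and MACRO/ENDM are assumptions here, and GETSQM is a hypothetical name chosen for illustration. If the assembler has no ALIGN, placing the table with ORG at an address whose low byte is zero has the same effect.

```asm
; Sketch only: ALIGN and MACRO/ENDM are assumed directives,
; check the assembler's manual for the exact spelling.

        ALIGN 256               ;assumed: advance to the next 256-byte boundary
SQTAB   DS   512                ;page 0 = low bytes, page 1 = high bytes

        MACRO GETSQM            ;hypothetical name for the inlined GETSQ
        LD   L, A
        LD   H, SQTAB / 256     ;HL = pointer to LSB
        LD   E, (HL)            ;E = low byte of the result
        INC  H                  ;same offset in the MSB page
        LD   D, (HL)            ;D = high byte of the result
        ENDM

; ... elsewhere in the code, after INITSQ has run:
        LD   A, 200             ;factor
        GETSQM                  ;DE = 40000 (9C40h), no CALL/RET overhead
```

The alignment matters because the lookup only loads the page number into H: if SQTAB did not start on a page boundary, L = A would no longer index the table from its first entry.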