The Boston Diaries

Friday, March 01, 2024

The speed of Microsoft's BASIC floating point routines

I was curious about how fast Microsoft's BASIC floating point routines were. This is easy enough to test, now that I can time assembly code inside the assembler. The code calculates -(2π)³/3! three ways: with the Color BASIC routines, and with the MC6839 in IEEE-754 single precision and double precision.

First, Color BASIC:

	.tron	timing
ms_fp		ldx	#.tau
		jsr	CB.FP0fx	; FP0 = .tau
		ldx	#.tau
		jsr	CB.FMULx	; FP0 = FP0 * .tau
		ldx	#.tau
		jsr	CB.FMULx	; FP0 = FP0 * .tau
		jsr	CB.FP1f0	; FP1 = FP0
		ldx	#.fact3
		jsr	CB.FP0fx	; FP0 = 3!
		jsr	CB.FDIV		; FP0 = FP1 / FP0
		neg	CB.fp0sgn	; FP0 = -FP0
		ldx	#.answer
		jsr	CB.xfFP0	; .answer = FP0
	.troff
		rts

.tau		fcb	$83,$49,$0F,$DA,$A2 
.fact3		fcb	$83,$40,$00,$00,$00  
.answer		rmb	5
		fcb	$86,$A5,$5D,$E7,$30	; precalculated result

I can't use the .FLOAT directive here since that only supports either the Microsoft format or IEEE-754 but not both. So for this test, I have to define the individual bytes per float. The last line is what the result should be (by checking a memory dump of the VM after running). Also, .tau is 2π, just in case that wasn't clear. This ran in 8,742 cycles over 2,124 instructions, an average of 4.12 cycles per instruction (I modified the assembler to record this additional information).
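As a sanity check on the hand-assembled bytes above, here is a minimal Python sketch that decodes a packed 5-byte Microsoft float. It assumes the commonly documented layout: an excess-128 exponent byte, followed by a 32-bit big-endian mantissa whose top bit holds the sign in place of the implied leading 1. The name decode_msfp is made up for this illustration and is not part of Color BASIC or the assembler.

```python
import math

def decode_msfp(b):
    """Decode a packed 5-byte Microsoft BASIC float (assumed layout, see text)."""
    exp = b[0]
    mant = int.from_bytes(b[1:5], "big")
    if exp == 0:
        return 0.0                      # an exponent of zero means the value is zero
    sign = -1.0 if mant & 0x80000000 else 1.0
    mant |= 0x80000000                  # restore the implied leading 1 bit
    # The mantissa is a fraction in [0.5, 1), scaled by 2**(exp - 128).
    return sign * (mant / 2**32) * 2.0 ** (exp - 128)

tau    = decode_msfp(bytes([0x83, 0x49, 0x0F, 0xDA, 0xA2]))
fact3  = decode_msfp(bytes([0x83, 0x40, 0x00, 0x00, 0x00]))
answer = decode_msfp(bytes([0x86, 0xA5, 0x5D, 0xE7, 0x30]))

print(tau, fact3, answer)
print(-(math.tau ** 3) / math.factorial(3))   # reference value in double precision
```

Under that assumed layout, the three constants decode to 2π, 6 and roughly -41.3417, which agrees with -(2π)³/3! to about eight significant digits.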

Next up, IEEE-754 single precision:

	.tron	timing
ieee_single	ldu	#.tau
		ldy	#.tau
		ldx	#.answer
		ldd	#.fpcb
		jsr	REG
		fcb	FMUL	; .answer = .tau * .tau

		ldu	#.tau
		ldy	#.answer
		ldx	#.answer
		ldd	#.fpcb
		jsr	REG
		fcb	FMUL	; .answer = .answer * .tau

		ldu	#.answer
		ldy	#.fact3
		ldx	#.answer
		ldd	#.fpcb
		jsr	REG
		fcb	FDIV	; .answer = .answer / 3!

		ldy	#.answer
		ldx	#.answer
		ldd	#.fpcb
		jsr	REG
		fcb	FNEG	; .answer = -.answer
	.troff
		rts

.fpcb		fcb	FPCTL.single | FPCTL.rn | FPCTL.proj
		fcb	0
		fcb	0
		fcb	0
		fdb	0

.tau		.float	6.283185307
.fact3		.float	3!
.answer		.float	0
		.float	-(6.283185307 ** 3 / 3!)

The floating point control block (.fpcb) configures the MC6839 to use single precision, normal rounding and projective closure (not sure what that is, but it's the default value). And it does calculate the correct result. It's amazing that code written 42 years ago for an 8-bit CPU works flawlessly. What it isn't is fast. This code took 14,204 cycles over 2,932 instructions (average 4.84 cycles per instruction).
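For comparison, the same four steps can be mimicked on a modern machine. This is only a rough sketch: it assumes the .float directive emits big-endian IEEE-754 singles (the 6809 is a big-endian CPU), it rounds each intermediate result to single precision with round-to-nearest, and it says nothing about how the MC6839 works internally. The helper names f32 and f32_bytes are invented for this example.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def f32_bytes(x):
    """Big-endian IEEE-754 single encoding of x, as a hex string."""
    return struct.pack(">f", x).hex()

tau   = f32(6.283185307)
fact3 = f32(6.0)

# Mirror the four calls in the listing: FMUL, FMUL, FDIV, FNEG,
# rounding to single precision after every operation.
answer = f32(tau * tau)
answer = f32(answer * tau)
answer = f32(answer / fact3)
answer = -answer

print(f32_bytes(tau), f32_bytes(fact3), f32_bytes(answer))
```

The printed bytes can be compared against a memory dump of .tau, .fact3 and .answer after the test runs.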

The higher than average cycle count per instruction could be due to position independent addressing modes, but I'm not entirely sure what it's doing to take roughly 1.6 times as long as BASIC. The ROM does use the IEEE-754 extended format (10 bytes) internally, with more bit shifts to extract the exponent and mantissa, but that much more time?

Perhaps it's code to deal with ±∞ and NaNs.

The IEEE-754 double precision is the same, except for the floating point control block configuring double precision and the use of .FLOATD instead of .FLOAT; otherwise the code is identical. The timing, however, isn't. It took 31,613 cycles over 6,865 instructions (average 4.60 cycles per instruction). With the data being twice the size, it took a little over twice as long as single precision (about 2.2 times), which is expected.
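The cycle and instruction counts quoted in this post can be cross-checked with a few lines of Python. The numbers are copied from the text; the dictionary and the helper names cpi and ratio are just scaffolding for the arithmetic.

```python
# (cycles, instructions) for each timed routine, as reported in the post
runs = {
    "BASIC":       (8742, 2124),
    "IEEE-SINGLE": (14204, 2932),
    "IEEE-DOUBLE": (31613, 6865),
}

def cpi(name):
    """Average cycles per instruction for one run."""
    cycles, instructions = runs[name]
    return cycles / instructions

def ratio(a, b):
    """How many times longer run a took than run b, in cycles."""
    return runs[a][0] / runs[b][0]

for name in runs:
    print(f"{name:12} {cpi(name):.2f} cycles/instruction")

print(f"single vs BASIC:  {ratio('IEEE-SINGLE', 'BASIC'):.2f}x")
print(f"double vs single: {ratio('IEEE-DOUBLE', 'IEEE-SINGLE'):.2f}x")
```

This gives averages of 4.12, 4.84 and 4.60 cycles per instruction. Single precision takes about 1.62 times as long as BASIC, and double precision about 2.23 times as long as single.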

The final bit of code just loads the ROMs into memory, and calls each function to get the timing:

		org	$2000
		incbin	"mc6839.rom"
REG		equ	$203D	; register-based entry point

		org	$A000
		incbin	"bas12.rom"

	.opt	test	prot	rw,$00,$FF	; Direct Page for BASIC
	.opt	test	prot	rx,$2000,$2000+8192 ; MC6839 ROM
	.opt	test	prot	rx,$A000,$A000+8192 ; BASIC ROM

	.test	"BASIC"
		lbsr	ms_fp
		rts
	.endtst

	.test	"IEEE-SINGLE"
		lbsr	ieee_single
		rts
	.endtst

	.test	"IEEE-DOUBLE"
		lbsr	ieee_double
		rts
	.endtst

Really, the only surprising thing here was just how fast Microsoft BASIC was at floating point.
