Sunday, October 13, 2024
A benchmark of three different floating point packages for the 6809
I recently came across another floating point package for the 6809
(written by Lennart Benschop)
and I wanted to see how it stacked up against IEEE-754 and BASIC floating point math.
To do this,
I wanted to add support to my 6809 assembler,
but it required some work.
There was no support to switch floating point formats—if you picked the rsdos output format,
you got the Microsoft floating point,
and for the other output formats,
you got IEEE-754 support.
The other issue is that the format used by the new floating point package is ever-so-slightly different from the Microsoft format. The only difference is an offset of one in the exponent: Microsoft uses an exponent bias of 129, whereas this package uses a bias of 128 (and why do floating point packages use an exponent bias? I'm not entirely sure). But other than this one small difference, they are basically the same.
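To make the difference concrete, here is a minimal C sketch that packs a value into a 5-byte layout: one exponent byte followed by a 32-bit mantissa whose top bit is replaced by the sign. The name `pack_real40`, the truncation of the mantissa, the sign-bit placement for negative values, and the encoding of zero as five zero bytes are my assumptions for illustration; none of this code comes from either package.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Exponent biases as described in the text (mantissa read as 1.xxx). */
#define BIAS_MSFP 129
#define BIAS_LBFP 128

/* Pack a finite double into 5 bytes: exponent, then 4 mantissa bytes. */
void pack_real40(double value, int bias, uint8_t out[5])
{
    int      exp2;
    double   frac;
    uint32_t mant;

    assert(isfinite(value));

    if (value == 0.0) {                    /* assumption: zero is all zero bytes */
        memset(out, 0, 5);
        return;
    }

    frac = frexp(fabs(value), &exp2);      /* |value| = frac * 2^exp2, frac in [0.5,1) */
    mant = (uint32_t)ldexp(frac, 32);      /* top bit always set; low bits truncated   */

    out[0] = (uint8_t)(exp2 - 1 + bias);   /* no range checking in this sketch */
    out[1] = (uint8_t)((mant >> 24) & 0x7F);
    if (value < 0.0)
        out[1] |= 0x80;                    /* sign replaces the implied leading 1 */
    out[2] = (uint8_t)(mant >> 16);
    out[3] = (uint8_t)(mant >> 8);
    out[4] = (uint8_t)mant;
}
```

Packing π with a bias of 129 gives an exponent byte of $82, and with a bias of 128 it gives $81; the four mantissa bytes (49 0F DA A2) are identical in both cases.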
It turned out not to be that hard to support all three floating point formats.
The output formats still select a default format like before,
but now,
you can use the .OPT directive to select the floating point formats:
.opt * real ieee
.float 3.14159265358979323846
.opt * real msfp
.float 3.14159265358979323846
.opt * real lbfp
.float 3.14159265358979323846
And you get three different formats as output:
| FILE p.asm
1 | .opt * real ieee
0000: 40490FDB 2 | .float 3.14159265358979323846
3 | .opt * real msfp
0004: 82490FDAA2 4 | .float 3.14159265358979323846
5 | .opt * real lbfp
0009: 81490FDAA2 6 | .float 3.14159265358979323846
I added some code to my floating point benchmark program, which now uses all three formats to calculate -(2π)³/3! and times each one. The new code:
.opt * real lbfp
.tron timing
lb_fp ldu #lb_fp.fpstack
ldx #.tau
jsr fplod ; t0 = .tau
ldx #.tau
jsr fplod ; t1 = .tau
jsr fpmul ; t2 = .t0 * t1
ldx #.tau
jsr fplod ; t3 = .tau
jsr fpmul ; t4 = .t2 * t3
ldx #.fact3
jsr fplod
jsr fpdiv
jsr fpneg
ldx #.answer
jsr fpsto
.troff
rts
.tau .float 6.283185307
.fact3 .float 3!
.answer .float 0
.float -(6.283185307 ** 3 / 3!)
.fpstack rmb 4 * 10
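For reference, the order of operations in that routine can be mirrored in C with a tiny operand stack. This is only a sketch in double precision; the helper names (`fp_load`, `fp_mul`, and so on) are my stand-ins rather than the package's routines, and I am assuming `fpdiv` divides the second entry on the stack by the top one.

```c
#include <assert.h>

#define FP_STACK_MAX 10

static double fp_stack[FP_STACK_MAX];
static int    fp_top = 0;

/* Push a value (stand-in for fplod). */
static void fp_load(double v)
{
    assert(fp_top < FP_STACK_MAX);
    fp_stack[fp_top++] = v;
}

/* Pop a value (stand-in for fpsto). */
static double fp_store(void)
{
    assert(fp_top > 0);
    return fp_stack[--fp_top];
}

static void fp_mul(void) { double b = fp_store(); double a = fp_store(); fp_load(a * b); }
static void fp_div(void) { double b = fp_store(); double a = fp_store(); fp_load(a / b); }
static void fp_neg(void) { fp_load(-fp_store()); }

/* Same sequence as the benchmark: -(tau * tau * tau) / 3! */
double neg_tau_cubed_over_fact3(void)
{
    const double tau   = 6.283185307;
    const double fact3 = 6.0;            /* 3! */

    fp_load(tau);
    fp_load(tau);
    fp_mul();
    fp_load(tau);
    fp_mul();
    fp_load(fact3);
    fp_div();
    fp_neg();
    return fp_store();
}
```

Calling `neg_tau_cubed_over_fact3()` returns roughly -41.34, which is the value each of the three packages is being asked to compute.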
The results are interesting (the IEEE-754 results are from the same package, which supports both single and double formats):
| format | cycles | instructions |
|---|---|---|
| Microsoft | 8752 | 2124 |
| Lennart | 7465 | 1326 |
| IEEE-754 single | 14204 | 2932 |
| IEEE-754 double | 31613 | 6865 |
The new code is the fastest so far. I think the reason it's faster than Microsoft's is that Microsoft uses a single codebase for all their various BASIC interpreters, so it's not really “written in 6809 assembler” as much as it is “written in 8080 assembler and semi-automatically converted to 6809 assembly,” which would also explain why Microsoft BASIC was so ubiquitous on 80s machines.
It's also smaller than the IEEE-754 package: a bit over 2K vs. the 8K for the IEEE-754 package. It's hard to tell how it compares in size to Microsoft's, because Microsoft's is buried inside a BASIC interpreter, but it wouldn't surprise me if it's smaller, given the number of instructions executed.
Unit testing from inside an assembler, part IV
I'm not terribly happy with how running unit tests inside my assembler works. I mean, it works, as in, it tests the code and shows problems during the assembly phase, but I don't like how you write the tests in the first place. Here's one of the tests I added to my maze generation program (and the routine it tests):
getpixel bsr point_addr ; get video address
comb ; reverse mask (since we're reading
stb ,-s ; the screen, not writing it)
ldb ,x ; get video data
andb ,s+ ; mask off the pixel
tsta ; any shift?
beq .done
.rotate lsrb ; shift color bits
deca
bne .rotate
.done rts ; return color in B
.test
.opt test pokew ECB.beggrp , $0E00
.opt test poke $0E00 , %11_11_11_11
lda #0
ldb #0
bsr getpixel
.assert /d = 3
.assert /x = @@ECB.beggrp
lda #1
ldb #0
bsr getpixel
.assert /d = 3
.assert /x = @@ECB.beggrp
lda #2
ldb #0
bsr getpixel
.assert /d = 3
.assert /x = @@ECB.beggrp
lda #3
ldb #0
bsr getpixel
.assert /d = 3
.assert /x = @@ECB.beggrp
rts
.endtst
The problem is the machine code for the test is included in the final binary output,
which I don't want.
It means I can't just set an option to run the tests in addition to assembling the code into its final output
(and so when I use the test backend,
I tend to send the output to /dev/null).
I've also found that I prefer table-style tests to writing code
(for reasons way beyond the scope of this entry).
For example,
for a C function like this:
int max_monthday(int year,int month)
{
static int const days[] = { 31,0,31,30,31,30,31,31,30,31,30,31 } ;
assert(year > 1969);
assert(month > 0);
assert(month < 13);
if (month == 2)
{
/*----------------------------------------------------------------------
; in case you didn't know, leap years are those years that are divisible
; by 4, except if it's divisible by 100, then it's not, unless it's
; divisible by 400, then it is. 1800 and 1900 were NOT leap years, but
; 2000 is.
;----------------------------------------------------------------------*/
if ((year % 400) == 0) return 29;
if ((year % 100) == 0) return 28;
if ((year % 4) == 0) return 29;
return 28;
}
else
return days[month - 1];
}
I would prefer to write test code like:
| output | year | month |
|---|---|---|
| 28 | 1900 | 2 |
| 29 | 2000 | 2 |
| 28 | 2100 | 2 |
| 29 | 1904 | 2 |
| 29 | 2104 | 2 |
| 28 | 2001 | 2 |
Just specify the inputs and outputs for some corner cases, and let the computer do what is necessary to call the function in question.
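In C, a table like that maps almost directly onto an array of structs and a loop. A minimal sketch follows; the names `monthday_case` and `run_monthday_table` are mine. One thing to note: `max_monthday()` as written asserts `year > 1969`, so the 1900 and 1904 rows from the table would trip that assertion, and the sketch only uses the rows that satisfy the precondition.

```c
#include <assert.h>
#include <stddef.h>

int max_monthday(int year, int month)
{
    static int const days[] = { 31,0,31,30,31,30,31,31,30,31,30,31 };

    assert(year  > 1969);
    assert(month >    0);
    assert(month <   13);

    if (month == 2)
    {
        if ((year % 400) == 0) return 29;
        if ((year % 100) == 0) return 28;
        if ((year %   4) == 0) return 29;
        return 28;
    }
    else
        return days[month - 1];
}

/* One row of the table: expected output first, then the inputs. */
struct monthday_case
{
    int expected;
    int year;
    int month;
};

static const struct monthday_case monthday_cases[] =
{
    { 29 , 2000 , 2 },
    { 28 , 2100 , 2 },
    { 29 , 2104 , 2 },
    { 28 , 2001 , 2 },
};

/* Run every row and return the number of rows that failed. */
int run_monthday_table(void)
{
    int failures = 0;

    for (size_t i = 0; i < sizeof monthday_cases / sizeof monthday_cases[0]; i++)
    {
        const struct monthday_case *c = &monthday_cases[i];
        if (max_monthday(c->year, c->month) != c->expected)
            failures++;
    }
    return failures;
}
```

Adding a new corner case is then a one-line change to the table, with no new test code to write.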
But it's not so easy with assembly language,
given the large number of ways to pass data into a function,
and the number of output results one can have.
How would I specify that the inputs come in registers A and B,
and the outputs come in A, B and X?
The above could be done in a table format,
I guess.
It might not be pretty,
but it's doable.
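One way such a table might be shaped, sketched in C rather than as assembler syntax: each row carries the input registers, the expected output registers, and a mask saying which outputs to compare. Everything here is hypothetical, including the `regs` layout, the `REG_*` flags, and `swap_and_sum`, which is a made-up routine standing in for "call the emulated 6809 subroutine."

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The registers this sketch cares about. */
struct regs { uint8_t a; uint8_t b; uint16_t x; };

/* Which output registers a row wants compared. */
enum { REG_A = 1 << 0, REG_B = 1 << 1, REG_X = 1 << 2 };

struct reg_case
{
    struct regs in;      /* values loaded before the call     */
    struct regs out;     /* values expected after the call    */
    unsigned    check;   /* mask of REG_* flags to compare    */
};

typedef void (*routine_fn)(struct regs *);

/* Made-up routine under test: swaps A and B, returns A + B in X. */
void swap_and_sum(struct regs *r)
{
    uint8_t a = r->a;
    r->x = (uint16_t)(r->a + r->b);
    r->a = r->b;
    r->b = a;
}

const struct reg_case swap_cases[] =
{
    /*  in: A    B  X        out: A    B    X      compare               */
    { {   1 ,   2 , 0 } , {   2 ,   1 ,   3 } , REG_A | REG_B | REG_X },
    { { 255 ,   1 , 0 } , {   1 , 255 , 256 } , REG_A | REG_B | REG_X },
    { {   0 ,   0 , 0 } , {   0 ,   0 ,   0 } , REG_X                 },
};

/* Run every row against fn and return the number of failing rows. */
int run_reg_table(routine_fn fn, const struct reg_case *cases, size_t n)
{
    int failures = 0;

    assert(fn != NULL);
    for (size_t i = 0; i < n; i++)
    {
        struct regs r  = cases[i].in;
        int         ok = 1;

        fn(&r);
        if ((cases[i].check & REG_A) && r.a != cases[i].out.a) ok = 0;
        if ((cases[i].check & REG_B) && r.b != cases[i].out.b) ok = 0;
        if ((cases[i].check & REG_X) && r.x != cases[i].out.x) ok = 0;
        if (!ok)
            failures++;
    }
    return failures;
}
```

The mask is the part that answers the "which registers are outputs" question: a row only constrains the registers it names, and anything else the routine clobbers is ignored.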
Then there's these subroutines and their associated tests:
;***********************************************************************
; RND4 Generate a random number 0 .. 3
;Entry: none
;Exit: B - random number
;***********************************************************************
rnd4 dec rnd4.cnt ; any more cached random #s?
bpl .cached ; yes, get next cached number
ldb #3 ; else reset count
stb rnd4.cnt
bsr random ; get random number
stb rnd4.cache ; save in the cache
bra .ret ; and return the first number
.cached ldb rnd4.cache ; get cached value
lsrb ; get next 2-bit random number
lsrb
stb rnd4.cache ; save remaining bits
.ret andb #3 ; mask off our result
rts
;***********************************************************************
; RANDOM Generate a random number
;Entry: none
;Exit: B - random number (1 - 255)
;***********************************************************************
random ldb lfsr
andb #1
negb
andb #$B4
stb ,-s ; lsb = -(lfsr & 1) & taps
ldb lfsr
lsrb ; lfsr >>= 1
eorb ,s+ ; lfsr ^= lsb
stb lfsr
rts
.test
ldx #.result_array
clra
clrb
.setmem sta ,x+
decb
bne .setmem
ldx #.result_array + 128
lda #1
sta lfsr
lda #255
.loop bsr random
.assert /b <> 0 , "degenerate LFSR"
.assert @/b,x = 0 , "non-repeating LFSR"
inc b,x
deca
bne .loop
clr ,x
clr 1,x
clr 2,x
clr 3,x
lda #255
.chk4 bsr rnd4
.assert /b >= 0
.assert /b <= 3
inc b,x
deca
bne .chk4
.tron
ldb ,x ; to check the spread
ldb 1,x ; of results, basically
ldb 2,x ; these should be roughly
ldb 3,x ; 1/4 of 256
.troff
.assert @/,x + @/1,x + @/2,x + @/3,x = 255
rts
.result_array rmb 256
.endtst
.test "whole program"
.opt test pokew $A000 , KEYIN
.opt test pokew $FFFE , END
.opt test prot r,$A000,$A001
lbsr start
KEYIN lda #'Q'
END rts
.endtst
And … just uhg.
I mean,
this checks that the 8-bit LFSR I'm using to generate random numbers actually doesn't repeat within its 255-period cycle,
and that the number of 2-bit random numbers I generate from RND4 is more or less evenly spread,
and for both of those,
I use an array to store the intermediate results.
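For comparison, here is a rough C model of the two routines and the same two checks. It follows the assembly above step by step, but the names are mine (`random8` rather than `random`, to avoid clashing with the POSIX function), and the initial value of the `rnd4` counter is an assumption, since the assembly shown doesn't include its initialization.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t lfsr       = 1;
static int8_t  rnd4_cnt   = 0;   /* assumption: cache starts out empty */
static uint8_t rnd4_cache = 0;

/* RANDOM: one step of the 8-bit LFSR with taps $B4. */
uint8_t random8(void)
{
    uint8_t lsb = (uint8_t)(-(lfsr & 1) & 0xB4);   /* lsb = -(lfsr & 1) & taps */
    lfsr = (uint8_t)((lfsr >> 1) ^ lsb);           /* lfsr >>= 1 ; lfsr ^= lsb */
    return lfsr;
}

/* RND4: hand out a random byte two bits at a time. */
uint8_t rnd4(void)
{
    uint8_t b;

    rnd4_cnt--;
    if (rnd4_cnt >= 0)             /* cached bits left? */
        b = rnd4_cache >> 2;       /* next 2-bit number */
    else
    {
        rnd4_cnt = 3;              /* reset count       */
        b = random8();             /* refill the cache  */
    }
    rnd4_cache = b;                /* save remaining bits */
    return b & 3;
}

/* Returns 1 if 255 calls yield 255 distinct nonzero values, else 0. */
int lfsr_is_full_period(void)
{
    uint8_t seen[256] = { 0 };

    lfsr = 1;
    for (int i = 0; i < 255; i++)
    {
        uint8_t v = random8();
        if (v == 0 || seen[v])
            return 0;
        seen[v] = 1;
    }
    return 1;
}

/* Tally 255 calls to rnd4() into counts[0..3]. */
void rnd4_tally(unsigned counts[4])
{
    counts[0] = counts[1] = counts[2] = counts[3] = 0;
    for (int i = 0; i < 255; i++)
    {
        uint8_t v = rnd4();
        assert(v <= 3);
        counts[v]++;
    }
}
```

The `seen` and `counts` arrays play the same role as `.result_array` in the assembly test.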
I'm leery about including an interpreter just for the tests,
because I don't think it would be any better.
At least the test code is largely written in the target language of 6809 assembly.
Then again, I could embed Lua, and write the tests like:
.test
local array = {}
for i = 0 , 255 do array[i] = 0 end
mem['lfsr'] = 1
for i = 1 , 255 do
call 'random'
assert(cpu.B ~= 0)
assert(array[cpu.B] == 0)
array[cpu.B] = 1
end
array[0] = 0
array[1] = 0
array[2] = 0
array[3] = 0
for i = 1 , 255 do
call 'rnd4'
assert(cpu.B >= 0)
assert(cpu.B <= 3)
array[cpu.B] = array[cpu.B] + 1
end
assert(array[0] + array[1] + array[2] + array[3] == 255)
.endtst
I suppose?
I would still need to somehow code the fake KEYIN and END routines required for the test.
And the first test at the start of this post would then look like:
.test
memw['ECB.beggrp'] = 0x0E00
mem[0x0E00] = '%11_11_11_11'
cpu.A = 0
cpu.B = 0
call 'getpixel'
assert(cpu.D == 3)
assert(cpu.X == memw['ECB.beggrp'])
cpu.A = 1
cpu.B = 0
call 'getpixel'
assert(cpu.D == 3)
assert(cpu.X == memw['ECB.beggrp'])
cpu.A = 2
cpu.B = 0
call 'getpixel'
assert(cpu.D == 3)
assert(cpu.X == memw['ECB.beggrp'])
cpu.A = 3
cpu.B = 0
call 'getpixel'
assert(cpu.D == 3)
assert(cpu.X == memw['ECB.beggrp'])
.endtst
which isn't any longer than the original test, but still … uhg. But doing this means I won't have 6809 code for testing in the final output, which means I could run tests with any backend.
I'll have to think on this.