The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Friday, March 01, 2024

The speed of Microsoft's BASIC floating point routines

I was curious about how fast Microsoft's BASIC floating point routines were. This is easy enough to test, now that I can time assembly code inside the assembler. The code calculates -(2π)³/3! using Color BASIC routines, IEEE-754 single precision and double precision.

First, Color BASIC:

	.tron	timing
ms_fp		ldx	#.tau
		jsr	CB.FP0fx	; FP0 = .tau
		ldx	#.tau
		jsr	CB.FMULx	; FP0 = FP0 * .tau
		ldx	#.tau
		jsr	CB.FMULx	; FP0 = FP0 * .tau
		jsr	CB.FP1f0	; FP1 = FP0
		ldx	#.fact3
		jsr	CB.FP0fx	; FP0 = 3!
		jsr	CB.FDIV		; FP0 = FP1 / FP0
		neg	CB.fp0sgn	; FP0 = -FP0
		ldx	#.answer
		jsr	CB.xfFP0	; .answer = FP0
	.troff
		rts

.tau		fcb	$83,$49,$0F,$DA,$A2 
.fact3		fcb	$83,$40,$00,$00,$00  
.answer		rmb	5
		fcb	$86,$A5,$5D,$E7,$30	; precalculated result

I can't use the .FLOAT directive here since that only supports either the Microsoft format or IEEE-754 but not both. So for this test, I have to define the individual bytes per float. The last line is what the result should be (by checking a memory dump of the VM after running). Also, .tau is 2π, just in case that wasn't clear. This ran in 8,742 cycles, taking 2,124 instructions and 4.12 cycles per instruction (I modified the assembler to record this additional information).

Next up, IEEE-754 single precision:

	.tron	timing
ieee_single	ldu	#.tau
		ldy	#.tau
		ldx	#.answer
		ldd	#.fpcb
		jsr	REG
		fcb	FMUL	; .answer = .tau * .tau

		ldu	#.tau
		ldy	#.answer
		ldx	#.answer
		ldd	#.fpcb
		jsr	REG
		fcb	FMUL	; .answer = .answer * .tau

		ldu	#.answer
		ldy	#.fact3
		ldx	#.answer
		ldd	#.fpcb
		jsr	REG
		fcb	FDIV	; .answer = .answer / 3!

		ldy	#.answer
		ldx	#.answer
		ldd	#.fpcb
		jsr	REG
		fcb	FNEG	; .answer = -.answer
	.troff
		rts

.fpcb		fcb	FPCTL.single | FPCTL.rn | FPCTL.proj
		fcb	0
		fcb	0
		fcb	0
		fdb	0

.tau		.float	6.283185307
.fact3		.float	3!
.answer		.float	0
		.float	-(6.283185307 ** 3 / 3!)

The floating point control block (.fpcb) configures the MC6839 to use single precision, normal rounding and projective closure (not sure what that is, but it's the default value). And it does calculate the correct result. It's amazing that code written 42 years ago for an 8-bit CPU works flawlessly. What it isn't is fast. This code took 14,204 cycles over 2,932 instructions (average 4.84 cycles per instruction).

The higher than average cycle count could be due to position independent addressing modes, but I'm not entirely sure what it's doing to take nearly twice the time. The ROM does use the IEEE-754 extended format (10 bytes) internally, with more bit shifts to extract the exponent and mantissa, but twice the time?

Perhaps it's code to deal with ±∞ and NaNs.

The IEEE-754 double precision is the same, except for the floating point control block configuring double precision and the use of .FLOATD instead of .FLOAT; otherwise the code is identical. The result, however, isn't. It took 31,613 cycles over 6,865 instructions (average 4.60 cycles per instruction). And being twice the size, it took a little over twice the time of single precision, which is expected.
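To put the three runs side by side, here is a small sketch that redoes the arithmetic from the cycle and instruction counts quoted above. The dictionary and function names are made up for this illustration; only the numbers come from the timing runs.

```python
# Cycle and instruction counts as quoted in the text above.
RUNS = {
    "Color BASIC": (8742, 2124),
    "IEEE-754 single": (14204, 2932),
    "IEEE-754 double": (31613, 6865),
}

def cycles_per_instruction(name):
    """Average cycles per instruction for one timing run."""
    cycles, instructions = RUNS[name]
    return cycles / instructions

def slowdown(name, baseline):
    """How many times more cycles `name` took than `baseline`."""
    return RUNS[name][0] / RUNS[baseline][0]

if __name__ == "__main__":
    for name in RUNS:
        print(f"{name:16} {cycles_per_instruction(name):.2f} cycles/instruction")
    print(f"single vs BASIC:  {slowdown('IEEE-754 single', 'Color BASIC'):.2f}x")
    print(f"double vs single: {slowdown('IEEE-754 double', 'IEEE-754 single'):.2f}x")
```

By these counts the MC6839 single precision run takes roughly 1.6 times the cycles of the Color BASIC run, and double precision a bit over twice the cycles of single precision.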

The final bit of code just loads the ROMs into memory, and calls each function to get the timing:

		org	$2000
		incbin	"mc6839.rom"
REG		equ	$203D	; register-based entry point

		org	$A000
		incbin	"bas12.rom"

	.opt	test	prot	rw,$00,$FF	; Direct Page for BASIC
	.opt	test	prot	rx,$2000,$2000+8192 ; MC6839 ROM
	.opt	test	prot	rx,$A000,$A000+8192 ; BASIC ROM

	.test	"BASIC"
		lbsr	ms_fp
		rts
	.endtst

	.test	"IEEE-SINGLE"
		lbsr	ieee_single
		rts
	.endtst

	.test	"IEEE-DOUBLE"
		lbsr	ieee_double
		rts
	.endtst

Really, the only surprising thing here was just how fast Microsoft BASIC was at floating point.

Wednesday, February 28, 2024

Converting IEEE-754 floating point to Color BASIC floating point

I'm still playing around with floating point on the 6809, specifically support for floating point for the Color Computer. The format for floating point for Color BASIC (written by Microsoft) predates the IEEE-754 Floating Point Standard by a few years and thus, isn't quite compatible. It's close, though. It's defined as an 8-bit exponent, biased by 129, a single sign bit (after the exponent) and 31 bits for the mantissa (the leading one assumed). It also does not support ±∞ nor NaN. This differs from the IEEE-754 single precision that uses a single sign bit, an 8-bit exponent biased by 127 and 23 bits for the mantissa (which also assumes a leading one) and support for infinities and NaN. The IEEE-754 double precision uses a single sign bit, an 11-bit exponent biased by 1023 and 52 bits for the mantissa (leading one assumed) plus support for infinities and NaN.
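The layout is easier to see when decoded by hand. This is a minimal sketch in Python, not anything taken from the ROM or the assembler: the function name is hypothetical, and treating an exponent byte of zero as the value 0.0 is an assumption the text above doesn't state.

```python
def from_color_basic(raw: bytes) -> float:
    """Decode a 5-byte Color BASIC float using the layout described above."""
    if len(raw) != 5:
        raise ValueError("a Color BASIC float is exactly 5 bytes")
    exponent = raw[0]
    if exponent == 0:                     # assumption: zero exponent means 0.0
        return 0.0
    sign = -1.0 if raw[1] & 0x80 else 1.0            # sign bit follows the exponent
    fraction = int.from_bytes(raw[1:5], "big") & 0x7FFFFFFF   # the 31 stored bits
    mantissa = 1.0 + fraction / 2**31                # put the assumed leading one back
    return sign * mantissa * 2.0 ** (exponent - 129)

if __name__ == "__main__":
    print(from_color_basic(bytes([0x83, 0x49, 0x0F, 0xDA, 0xA2])))   # close to 2*pi
    print(from_color_basic(bytes([0x83, 0x40, 0x00, 0x00, 0x00])))   # 3! = 6
```

A 32-bit mantissa fits comfortably inside the 53 bits of an IEEE-754 double, so decoding this way loses nothing.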

So the Color BASIC format is about halfway between single precision and double precision. This led me to use IEEE-754 double precision for the Color Computer backend (generating an error for infinities and NaN) then massaging the resulting double into the proper format. I double checked this by finding some floating point constants in the Color BASIC ROM as shown in the book Color BASIC Unravelled II (available on the Color Computer Archives), like this table:

4634				* MODIFIED TAYLOR SERIES SIN COEFFICIENTS
4635	BFC7 05			LBFC7	FCB	6-1			SIX COEFFICIENTS
4636	BFC8 84 E6 1A 2D 1B	LBFC8	FCB	$84,$E6,$1A,$2D,$1B	* -((2*PI)**11)/11!
4637	BFCD 86 28 07 FB F8	LBFCD	FCB	$86,$28,$07,$FB,$F8	*  ((2*PI)**9)/9!
4638	BFD2 87 99 68 89 01	LBFD2	FCB	$87,$99,$68,$89,$01	* -((2*PI)**7)/7!
4639	BFD7 87 23 35 DF E1	LBFD7	FCB	$87,$23,$35,$DF,$E1	*  ((2*PI)**5)/5!
4640	BFDC 86 A5 5D E7 28	LBFDC	FCB	$86,$A5,$5D,$E7,$28	* -((2*PI)**3)/3!
4641	BFE1 83 49 0F DA A2	LBFE1	FCB	$83,$49,$0F,$DA,$A2	*    2*PI

Then using the byte values to populate a variable and printing it inside BASIC (this is the expression -(2π)³/3!):

X=0         ' CREATE A VARIABLE
Y=VARPTR(X) ' GET ITS ADDRESS
POKE Y,&H86 ' AND SET ITS VALUE
POKE Y+1,&HA5 ' THE HARD WAY
POKE Y+2,&H5D
POKE Y+3,&HE7
POKE Y+4,&H28
PRINT X ' LET'S SEE WHAT IT IS
-41.3417023

Then using that to create a floating point value:

	org	$1000
	.float	-41.3417023
	end

Checking the resulting bytes that were generated:

                         | FILE ff.a
                       1 |         org     $1000
1000: 86A55DE735       2 |         .float  -41.3417023
                       3 |         end

And adjusting the floating point constant until I got bytes that matched:

                         | FILE ff.a
                       1 |         org     $1000
1000: 86A55DE728       2 |         .float  -41.341702110
                       3 |         end

I figure it's “close enough.” The parsing code in the Color BASIC ROM is old and predates the IEEE-754 floating point standard, so a few different digits at the end I think is okay.
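For reference, the "massage a double into the Color BASIC format" step can be sketched in a few lines of Python. This is not the assembler's actual code: the function name is hypothetical, rounding to nearest is a choice made for this sketch (the assembler may round or truncate differently), and encoding 0.0 as five zero bytes is an assumption.

```python
import math

def to_color_basic(value: float) -> bytes:
    """Pack a Python float (an IEEE-754 double) into 5 Color BASIC bytes."""
    if math.isnan(value) or math.isinf(value):
        raise ValueError("Color BASIC floats have no infinities or NaN")
    if value == 0.0:
        return bytes(5)                   # assumption: all-zero bytes mean 0.0
    sign = 0x80 if value < 0 else 0x00
    # frexp gives abs(value) == m * 2**e with 0.5 <= m < 1, so a bias of 128
    # here is the same thing as a bias of 129 on a 1.xxx style mantissa.
    m, e = math.frexp(abs(value))
    mantissa = round(m * 2**32)           # 32 bits, leading one still explicit
    if mantissa == 2**32:                 # rounding carried out of the top bit
        mantissa >>= 1
        e += 1
    exponent = e + 128
    if not 1 <= exponent <= 255:
        raise OverflowError("value does not fit an 8-bit exponent")
    body = (mantissa & 0x7FFFFFFF).to_bytes(4, "big")   # drop the leading one
    return bytes([exponent, body[0] | sign, body[1], body[2], body[3]])

if __name__ == "__main__":
    print(to_color_basic(6.0).hex())           # bytes for 3!
    print(to_color_basic(2 * math.pi).hex())   # bytes for 2*pi
```

Differences in the final byte, like the $35 versus $28 seen above, come down to which decimal digits go in and how the last mantissa bits get rounded, not to the bit shuffling itself.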

As a final check, I wrote the following bit of code to calculate and display -(2π)³/3!, display the pre-calculated result, as well as display the pre-calculated value of 2π:

		include	"Coco/basic.i"
		include	"Coco/dp.i"

CB.FSUBx	equ	$B9B9	; FP0 = X   - FP0	; addresses for
CB.FSUB		equ	$B9BC	; FP0 = FP1 - FP0	; these routines 
CB.FADDx	equ	$B9C2	; FP0 = X   + FP0	; from
CB.FADD		equ	$B9C5	; FP0 = FP1 + FP0	; Color BASIC Unravelled II
CB.FMULx	equ	$BACA	; FP0 = X   * FP0
CB.FMUL		equ	$BAD0	; FP0 = FP0 * FP1
CB.FDIVx	equ	$BB8F	; FP0 = X   / FP0
CB.FDIV		equ	$BB91	; FP0 = FP1 / FP0

CB.FP0fx	equ	$BC14	; FP0 = X
CB.xfFP0	equ	$BC35	; X   = FP0
CB.FP1f0	equ	$BC5F	; FP1 = FP0
CB.FP0txt	equ	$BDD9	; result in X, NUL terminated

		org	$4000
start		ldx	#tau		; point to 2*pi
		jsr	CB.FP0fx	; copy to FP0
		ldx	#tau		; 2PI * 2PI
		jsr	CB.FMULx
		ldx	#tau		; 2PI * 2PI * 2PI
		jsr	CB.FMULx
		jsr	CB.FP1f0	; copy fp acc to FP1
		ldx	#fact3		; point to 3!
		jsr	CB.FP0fx	; copy to FP0
		jsr	CB.FDIV		; FP0 = FP1 / FP0
		neg	CB.fp0sgn	; negate result by flipping FP0 sign
		jsr	CB.FP0txt	; generate string
		bsr	display		; display on screen

		ldx	#answer		; point to precalculated result
		jsr	CB.FP0fx	; copy to FP0
		jsr	CB.FP0txt	; generate string
		bsr	display		; display

		ldx	#tau		; now display 2*pi
		jsr	CB.FP0fx	; just to see how close
		jsr	CB.FP0txt	; it is.
		bsr	display
		rts

display.char	jsr	[CHROUT]	; display character
display		lda	,x+		; get character
		bne	.char		; if not NUL byte, display
		lda	#13		; go to next line
		jsr	[CHROUT]
		rts

tau		.float	6.283185307
fact3		.float	3!
answer		.float	-(6.283185307 ** 3 / 3!)

		end	start

The results were:

-41.3417023
-41.3417023
 6.28318531

The calculation results in -41.3417023 and the direct result stored in answer also prints out -41.3417023, so that matches and it reinforces that my approach to this is nominally right.

But I think Microsoft had issues with either generating some of the floating point constants for the larger terms, or transcribing the byte values of the larger terms. Take for instance -(2π)¹¹/11!. The correct answer is -15.0946426, but the bytes in the ROM define the constant -14.3813907, a difference of about 0.7. And it's not like Color BASIC can't calculate that correctly: when I typed in the expression by hand, it was able to come up with -15.0946426.
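The gap can be checked for every entry in the table, not just the first. The sketch below decodes the ROM bytes listed earlier and compares each against the expression in its listing comment. The names are made up for this example, and the decoder is restated here only so the snippet runs on its own.

```python
import math

# (power, ROM bytes) taken from the FCB column of the listing above.
ROM_SIN_TABLE = [
    (11, bytes([0x84, 0xE6, 0x1A, 0x2D, 0x1B])),
    (9,  bytes([0x86, 0x28, 0x07, 0xFB, 0xF8])),
    (7,  bytes([0x87, 0x99, 0x68, 0x89, 0x01])),
    (5,  bytes([0x87, 0x23, 0x35, 0xDF, 0xE1])),
    (3,  bytes([0x86, 0xA5, 0x5D, 0xE7, 0x28])),
    (1,  bytes([0x83, 0x49, 0x0F, 0xDA, 0xA2])),
]

def decode(raw):
    """Color BASIC float: exponent biased by 129, sign bit, 31 mantissa bits."""
    sign = -1.0 if raw[1] & 0x80 else 1.0
    fraction = int.from_bytes(raw[1:5], "big") & 0x7FFFFFFF
    return sign * (1.0 + fraction / 2**31) * 2.0 ** (raw[0] - 129)

def taylor_term(power):
    """The expression from the listing comments, with alternating sign."""
    sign = -1.0 if power % 4 == 3 else 1.0
    return sign * (2 * math.pi) ** power / math.factorial(power)

def compare():
    """Return (power, value stored in ROM, value from the expression)."""
    return [(p, decode(raw), taylor_term(p)) for p, raw in ROM_SIN_TABLE]

if __name__ == "__main__":
    for power, rom, formula in compare():
        print(f"{power:2}  ROM {rom:15.9f}  formula {formula:15.9f}  diff {rom - formula:12.9f}")
```

The low-order terms agree to within the format's precision while the gap grows with the power. That doesn't settle who is at fault, only that the mismatch isn't random.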

Or it could be that Walter K. Zydhek, the author of Color BASIC Unravelled II, is wrong in his interpretation of the expressions used to generate the values, or his interpretation of what the values are used for. I'm not sure who is at fault here.

Update on Friday, March 1st, 2024

I was wrong about the authorship of Color BASIC Unravelled II. It was not Walter K. Zydhek, but some unknown author of Spectral Associates, a company that is no longer in business. All Zydhek did was to transcribe a physical copy of the book (which is no longer available for purchase anywhere) into a PDF and make it available.



Wednesday, February 14, 2024

Notes from an overheard conversation from a car attempting a right turn

“Oh! Now what?”

“Sir, see the lit sign up there? You cannot turn right.”

“But did you not see the car right in front of me turning right?”

“Sir, if a person jumped off a bridge, would you follow?”

“Yes.”

“You must be very smart then.”

“And selective enforcement of the laws leads to distrust of the police.”

“Sir—”

“Oh look! The ‘No Right Turn’ sign is off now! Gotta go! Bye!”

“I don't think it's wise to taunt the police like that.”

“Down with the Man! Power to the people! Yo!”

Sunday, February 11, 2024

An extensible programming language

A few days ago I wrote about adding a factorial operator to my assembler, and I noted that I knew not of any other languages that had such a feature. So imagine my surprise as I'm reading about XL (via Lobsters) and the second example is factorial! Not only that, but that was an example of extending the language itself! The last time I was this excited about software was reading about Synthesis OS, a JIT-based operating system where you could create your own system calls.

How it handles precedence is interesting. In my assembler, I have left or right associativity as an explicit field, whereas in XL, it's encoded in the precedence level itself: even if it's left, odd if it's right. I'm not sure how I feel about that. On the one hand it feels nice and it's one less field to carry around; on the other, being explicit as I did makes it clear if something is left or right. But on the gripping hand, it sounds like matching precedence on a left and right operator could lead to problems, so I still may have an explicitness problem.
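To make the even/odd idea concrete, here is a small precedence-climbing sketch where associativity comes from the parity of the precedence level. It's an illustration of the idea only: the precedence values and function names are invented, it handles just integers, binary operators and parentheses, it does no error checking, and XL's real parser may work quite differently.

```python
import re

# Hypothetical precedence levels: even means left associative, odd means right.
PRECEDENCE = {"+": 10, "-": 10, "*": 20, "/": 20, "**": 31}

def tokenize(text):
    """Split the input into integers, operators and parentheses."""
    return re.findall(r"\d+|\*\*|[-+*/()]", text)

def parse(tokens, min_prec=0):
    """Precedence climbing; returns nested (op, left, right) tuples."""
    token = tokens.pop(0)
    if token == "(":
        left = parse(tokens, 0)
        tokens.pop(0)                     # discard the closing ")"
    else:
        left = int(token)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        prec = PRECEDENCE[op]
        # Odd (right associative): the right side may hold the same level.
        # Even (left associative): the right side must bind strictly tighter.
        right = parse(tokens, prec if prec % 2 else prec + 1)
        left = (op, left, right)
    return left

if __name__ == "__main__":
    print(parse(tokenize("8 - 3 - 2")))     # groups to the left
    print(parse(tokenize("2 ** 3 ** 2")))   # groups to the right
```

With this encoding a left and a right associative operator can never share a level, since the parity of the level is the associativity.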

But I digress.

It's a very simple language with only one keyword “is” and a user-definable precedence table. The parser generates a parse tree of only eight types, four leaf nodes (integer, real, text, name (or symbol)) and four non-leaf nodes (prefix, infix, postfix and block). And from there, you get XL.

This is something I definitely want to look into.

Wednesday, February 07, 2024

Instead of “write-only memory” assembly support, how about floating point support?

You might think it odd to add support for floating point constants for an 8-bit CPU, but Motorola did development on the MC6839 floating point firmware for the MC6809, an 8K ROM of thread-safe, position-independent 6809 code that implements the IEEE Standard for Floating-Point Arithmetic. It was never formally released by Motorola as a product, but from what I understand, it was released later under a public domain license. At the very least, it's quite easy to find both the MC6839 ROM image and the source code on the Intarwebs. So that's one reason.

Another reason is that the Color Computer BASIC supports floating point operations, and while not IEEE-754, as it was written before the IEEE-754 standard became a standard, it's still floating point, and there are only minor differences between it and the current standard, namely the exponent bias, number of fractional bits supported, and where the sign bit is stored. It really comes down to some bit manipulations to massage a standard float into the Color Computer BASIC float format. There are some differences, but the differences are small (literally, on the scale of 0.0000003) probably due to parsing differences, and small enough that it should be “good enough.” Especially since the Color Computer BASIC float format doesn't support infinity or NaN.

So if you specify a backend other than the rsdos backend, you get IEEE-754, and if you do specify rsdos as a backend, you get the Color Computer BASIC float format.

And yes, I added support for floating point expressions (but not for the test backend; I'm still thinking on how to support it), and one interesting feature I added is the factorial operator “!”. Factorials are used in Taylor series, which the Color Computer BASIC uses for the sin() function, so I can literally write:

	; Oh!  '**' is exponentiation by the way!
taylor_series	.float	-((2 * 3.14159265358979323846) ** 11) / 11!
		.float	 ((2 * 3.14159265358979323846) **  9) /  9!
		.float	-((2 * 3.14159265358979323846) **  7) /  7!
		.float	 ((2 * 3.14159265358979323846) **  5) /  5!
		.float	-((2 * 3.14159265358979323846) **  3) /  3!
		.float	   2 * 3.14159265358979323846

and have it generate the correct values. I personally don't know of any language that has a factorial operator (maybe APL? I don't know).

I think I'm having more fun writing the assembler than I am writing assembly code.

Tuesday, February 06, 2024

Okay! I'll answer your question, LinkedIn. Also, Orange Site! Orange Site! Orange Site!

Today's “you're one of the few experts invited to add this collaborative article” from LinkedIn is “How do you become a senior application developer?” My answer? Stay in a programming job for three years. Boom! You're a senior application developer. I know, I know, that's a big ask these days when everybody is jumping ship every two years. But hey, if you want to be a “senior application developer,” you got to make sacrifices.

Oh, and the title to today's post? I found out that LinkedIn really liked when I mentioned the Orange Site in the title of my post. Almost two orders of magnitude more. So I'm doing a test to see if I can game the system there.


So you want to amplify my SEO

From: Krystal XXXXXXX <XXXXXXXXXXXXXXXXXXXX@gmail.com>
To: sean@conman.org
Subject: Amplify Your SEO with Strategic Link Inserts
Date: Wed, 7 Feb 2024 01:16:35 +0300

Hi Content Team,

It’s Krystal XXXXX­XX here from Next Publisher, your next potential partner in digital storytelling. We're thrilled about the idea of featuring both guest posts and link insertions on your dynamic website.

We would like to know your fee structure for hosting guest posts and link insertions. Our team aims to create compelling content that is tailored to your site’s audience and enhances your overall content strategy.

A quick note: this initial email is only for starting our dialogue. All more detailed communications, including agreements and transactions, will be carried out through our official Next Publisher email.

We admire the quality of your platform and are excited to explore how we can work together for mutual benefit.

Looking forward to your prompt reply.

Warm wishes,

Krystal XXXXX­XX

Hello Krystal.

Since you neglected to include a link (in an email sent as HTML no less!) it's hard for me to judge the value Next Publisher will provide for my site, so I'm going to have to adjust my prices accordingly. My fee for both guest posts and link insertions is $10,000 (US) per. So if a guest post also includes a link insertion, that would be a total of $20,000 (US). If I'm going to be whoring selling out what Google Page Rank I have, it's going to cost.

I look forward to hearing back from you.

Sean.

Monday, February 05, 2024

The difficulties in supporting “write-only memory” in assembly

When I last wrote about this, I had one outstanding problem with static analysis of read-only/write-only memory, and that was with hardware that could be input or output only. It was only after I wrote that that I realized the solution—it's the same as a hardware register having different semantics on read vs. write—just define two labels with the semantics I want. So for the MC6821, I could have:

		org	$FF00
PIA0.A		rmb/r	1	; read only
		org	$FF00
PIA0.Adir	rmb/w	1	; write only, to set the direction of each IO pin
PIA0.Acontrol	rmb	1	; control for port A

So that was a non-issue. It was then I started looking over some existing code I had to see how it might look. I didn't want to just jump into an implementation without some forethought, and I quickly found some issues with the idea by looking at my maze generation program. The code in question initializes the required video mode (in this case 64×64 with four colors). Step one involves writing a particular value to the MC6821:

		lda	#G1C.PIA ; 64x64x4
		sta	PIA1.B

So far so good. I can mark PIA1.B as write-only (technically, it also has some input pins so I really can't, but in theory I could).

Now, the next bit requires some explaining. There's another 3-bit value that needs to be configured on the MC6883, but it's not as simple as writing the 3-bit value to a hardware register—each bit requires writing to a different address, and worse—it's a different address if the bit is 0 or 1. So that's six different addresses required. It's not horrible though—the addresses are sequential:

6883 VDG Addressing Mode

	bit	0/1	address
	V0	0	$FFC0
	V0	1	$FFC1
	V1	0	$FFC2
	V1	1	$FFC3
	V2	0	$FFC4
	V2	1	$FFC5

Yeah, to a software programmer, hardware can be weird. To set bit 0 to 0, you do a write (and it does not matter what the value is) to address $FFC0. If bit 0 is 1, then it's a write to $FFC1. So with that in mind, I have:

		sta	SAM.V0 + (G1C.V & 1<<0 <> 0)
		sta	SAM.V1 + (G1C.V & 1<<1 <> 0)
		sta	SAM.V2 + (G1C.V & 1<<2 <> 0)

Ooh. Yeah.

I wrote it this way so I wouldn't have to look up the appropriate value and write the more opaque (to me):

		sta	$FFC1
		sta	$FFC2
		sta	$FFC4

The expression (G1C.V & 1<<n <> 0) checks bit n to see if it's set or not, and returns 0 (for not set) or 1 (for set). This is then added to the base address for bit n, and it all works out fine. I can change the code for, say, the 128×192 four color mode by using a different constant:

		lda	#G6C.PIA
		sta	PIA1.B
		sta	SAM.V0 + (G6C.V & 1<<0 <> 0)
		sta	SAM.V1 + (G6C.V & 1<<1 <> 0)
		sta	SAM.V2 + (G6C.V & 1<<2 <> 0)
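The address arithmetic behind those three stores can be modelled outside the assembler. A minimal sketch, assuming SAM.V0 is $FFC0 as in the table above; the function name is hypothetical, and the mode value 0b001 for 64×64 four color is inferred from the three literal addresses shown earlier.

```python
SAM_V0 = 0xFFC0   # base of the VDG addressing mode bits, per the table above

def sam_mode_writes(mode):
    """Addresses to write (with any value) to set the 3-bit mode V2..V0."""
    # Bit n lives at SAM_V0 + 2*n: the even address clears it, the odd sets it.
    return [SAM_V0 + 2 * n + ((mode >> n) & 1) for n in range(3)]

if __name__ == "__main__":
    print([f"${address:04X}" for address in sam_mode_writes(0b001)])
```

This is the same calculation as `SAM.Vn + (mode & 1<<n <> 0)`, just spelled out with explicit shifts.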

But I digress.

This is a bit harder to support. The address being written is part of an expression, and only the label (defining the address) would have the read/write attribute associated with it. At least, that was my intent. I suppose I could track the read/write attribute by address, which would solve this particular segment of code.
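Tracking the attribute by address rather than by label might look something like the following. This is only a sketch of the idea, not code from the assembler: the class, its methods and the "r"/"w" strings are all invented for illustration, and it only covers accesses whose address is known at assembly time.

```python
class AccessMap:
    """Hypothetical per-address record of which accesses are allowed."""

    def __init__(self):
        self._allowed = {}                # address -> "r", "w" or "rw"

    def declare(self, address, length, allowed):
        """Mark `length` bytes starting at `address` with an access attribute."""
        for offset in range(length):
            self._allowed[address + offset] = allowed

    def check(self, address, access):
        """True if `access` ("r" or "w") is permitted at `address`."""
        # Addresses never declared are treated as read-write.
        return access in self._allowed.get(address, "rw")

if __name__ == "__main__":
    memory = AccessMap()
    memory.declare(0xFFC0, 6, "w")        # the six SAM mode addresses, write only
    print(memory.check(0xFFC0 + 1, "w"))  # a store to SAM.V0 + 1
    print(memory.check(0xFFC0 + 1, "r"))  # a load from the same address
```

Since the check happens after the expression is evaluated, `SAM.V0 + 1` is covered even though no label names that exact address.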

And the final bit of code to set the address of the video screen (or frame buffer):

		ldx	#SAM.F6		; point to frame buffer address bits
		lda	ECB.grpram	; get MSB of frame buffer
mapframebuf	clrb
		lsla
		rolb
		sta	b,x		; next bit of address
		leax	-2,x
		cmpx	#SAM.F0
		bhs	mapframebuf

Like the VDG Address Mode bits, the bits for the VDG Address Offset have unique addresses, and because the VDG Address Offset has seven bits, the address is aligned to a 512 byte boundary. Here, the code loads the X register with the address of the upper end of the VDG Address Offset, and the seven top most bits of the video address are sent, one at a time, to the B register, which is used as an offset to the X register to set the appropriate address for the appropriate bit. So now I would have to track the read/write attributes via the index registers as well.
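Here is the same loop modelled in Python, to show which addresses actually get written. It's a sketch under assumptions: the text doesn't give the value of SAM.F0, so $FFC6 is assumed here (the address right after the mode bits), and the function name is made up.

```python
SAM_F0 = 0xFFC6            # assumption: the offset bits follow the mode bits
SAM_F6 = SAM_F0 + 2 * 6    # two addresses (clear/set) per bit

def frame_buffer_writes(msb):
    """Addresses written by the mapframebuf loop for a frame buffer MSB."""
    writes = []
    a = msb & 0xFF
    x = SAM_F6
    while x >= SAM_F0:                 # cmpx #SAM.F0 / bhs mapframebuf
        b = (a >> 7) & 1               # clrb / lsla / rolb: top bit of A into B
        a = (a << 1) & 0xFF
        writes.append(x + b)           # sta b,x
        x -= 2                         # leax -2,x
    return writes

if __name__ == "__main__":
    print([f"${address:04X}" for address in frame_buffer_writes(0x0E)])
```

Only the top seven bits of the MSB are consumed, which is where the 512 byte alignment comes from. Every store goes through X, so nothing in the instruction itself names the address being written.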

That is not so easy.

I mean, here, it could work, as the code is all in one place, but what if instead it was:

		ldx	#SAM.F6
		lda	ECB.grpram
		jsr	mapframebuf

Or an even worse example:

constmessage	fcc/r	"A constant message" ; read only text
buffer		rmb	18

		ldx	#constmessage
		ldy	#buffer
		lda	#18
		jsr	memcpy

The subroutine memcpy might not even be in the same source unit, so how would the read/write attribute even be checked? This is for static analysis, not runtime.

I have one variation on the maze generation program that generates multiple mazes at the same time, on the same screen (it's fun to watch) and as such, I have the data required for each “maze generator” stored in a structure:

explorec	equ	0	; read-only
backtrackc	equ	1	; read-only
xmin		equ	2	; read-only
ymin		equ	3	; read-only
xstart		equ	4	; read-only
ystart		equ	5	; read-only
xmax		equ	6	; read-only
ymax		equ	7	; read-only
xpos		equ	8	; read-write
ypos		equ	9	; read-write
color		equ	10	; read-write
func		equ	11	; read-write

This is from the source code, but I've commented each “field” as being “read-only” or “read-write.” That's another aspect of this that I didn't consider:

		lda	explorec,x	; this is okay
		sta	explorec,x	; this is NOT okay

Not only would I have to track read/write attributes for addresses, but for field accesses to a structure as well. I'm not saying this is impossible, it's just going to take way more thought than I thought. I don't think I'll have this feature done any time soon …

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.