The Boston Diaries

Wednesday, Debtember 06, 2023

Unit testing from inside an assembler, part III

I'm done with the “unit testing” backend for my 6809 assembler. The mini-Forth engine is working out fine, although the number of words increased from 41 to 47 to support some conveniences (like indexing and string comparison). It took some work to support, but the number of assertions one can make in the code is extensive. For example, a test case for this bit of code (which I do need to discuss, but that's a post for another time) looks like this:

test		sts	[$3333,x]
.next		pshs	pc,u,y,x,dp,b,a,cc

	.test	"STS"

		ldx	#.results
		ldy	#test
		jsr	init

	.assert	/x        = .results , "X=results"
	.assert	/y        = .next    , "Y=next"
	.assert	@@/0,x    = .address
	.assert	@@/2,x    = .opcode
	.assert	@@/4,x    = .operand
	.assert	@@/6,x    = .topcode
	.assert	@@/8,x    = .toperand
	.assert @.nowrite = $12         , "overwrite"
	.assert	@/-47,s   = $01		, "stack mod?"
	.assert	.address  = "0800"z     , "hex address"
	.assert	.opcode   = "10EF"z     , "hex opcode"
	.assert	.operand  = "993333"z   , "hex operand"
	.assert .topcode  = "STS"z      , "decoded opcode"
	.assert	.toperand = "[3333,X]"z , "decoded operand"

		rts

.results	fdb	.address
		fdb	.opcode
		fdb	.operand
		fdb	.topcode
		fdb	.toperand
.address	rmb	5
.opcode		rmb	5
.operand	rmb	7
.topcode	rmb	9
.toperand	rmb	19
.nowrite	nop

	.endtst

The code being tested is a 6809 disassembler written in 6809 assembly code (I wrote that a few years back—any testing now is academic at this point). The .TEST directive takes an optional string as the name of the test. If one isn't given, it will use the last non-local label seen in the source code as the name of the test. The first two lines:

	.assert	/x        = .results , "X=results"
	.assert	/y        = .next    , "Y=next"

assert that the X register points to .results and the Y register points to .next. I use the leading slash to denote a register instead of a label. One can use register names for labels and it's mostly unambiguous as the register is typically part of the mnemonic itself. The only exception is for the A, B and D registers, and then, only in the index addressing mode, as you can use the A, B or D register for an offset. But in the context of the .ASSERT directive it makes it easier to parse the intent if I use '/' to designate a register. Each register, and each bit in the condition code register (like /cc.z for the zero-flag) can be used. The bit after the comma, “X=results”, will be printed if the check fails:

test-disasm.asm:7: warning: W0015: STS:13 X=results: test failed:

(there can be text after the “test failed” bit, thus the colon).

The next few lines:

	.assert	@@/0,x    = .address
	.assert	@@/2,x    = .opcode
	.assert	@@/4,x    = .operand
	.assert	@@/6,x    = .topcode
	.assert	@@/8,x    = .toperand

assert the contents of memory pointed to by X. The double “@” fetches 16 bits from the address following, and in the first line, this is the address in the X register. The second line retrieves the 16 bits from the address two bytes past where the X register points to. You could write these lines as:

	.assert	@@(/x + 2) = .opcode

but a little syntactic sugar never hurts, and it mimics the native method of using the index registers. This was possibly the hardest bit of code to write, as the index addressing mode of the 6809, while great from an assembly programmer's perspective, is a nightmare from an assembler-implementer's perspective. Even here, where it's simplified, was a pain to get right, but I think it was worth it.

The next two lines:

	.assert @.nowrite = $12         , "overwrite"
	.assert	@/-47,s   = $01		, "stack mod?"

check that the given addresses, nowrite and a byte down in the system stack, contain certain 8-bit values. Each byte of the memory in the virtual 6809 system is filled with the value 1 (it can be changed on the command line), so here, each untouched byte will contain a 1. I picked that value since it's an illegal opcode, which the emulator will trap.

The final few lines:

	.assert	.address  = "0800"z     , "hex address"
	.assert	.opcode   = "10EF"z     , "hex opcode"
	.assert	.operand  = "993333"z   , "hex operand"
	.assert .topcode  = "STS"z      , "decoded opcode"
	.assert	.toperand = "[3333,X]"z , "decoded operand"

does indeed, do a string compare. And therein lies a tale. Again, this is a form of syntactic sugar:

	.assert	@.address=$30 && @(.address+1)=$38 && @(.address+2)=$30 && @(.address+3)=$30 && @(.address+4)=0

This was the second hardest bit to to support, is a bit fragile, and, if I'm honest, a hack. The string literal has to be on the right hand side of the conditional, and worse, there's no easy way to enforce this in the assembler (so I currently don't). Third, the second string has to be a literal string—you can't compare two different memory regions from the 6809 VM. There's also a limit of only one string literal per .ASSERT directive, again, because supporting more than one would vastly complicate the already somewhat complicated code (this “unit test“ backend is already 30% of the entire assembler).

To keep from having to add a ton of code for the conditional checks to support two different primitive types, or to keep from having to create a duplicate set of string conditionals, I cheated (or came up with a brilliant hack—take your pick). The code generated is:

VM_LIT .address
VM_SCMP
VM_EQ
VM_EXIT

That VM_SCMP is hiding things—it knows which string literal to use (as it's part of the VM program and there's only space for one string literal per .ASSERT directive) but it also leaves two values on the stack: -1,0 if the result is less than, 1,0 if the result is greater than, and 0,0 if the result is equal. This way, the conditional operators can work as is.

Oh, those “z”s on the end of each string literal? Well, the assembler supports several methods of storing string data in memory. There's the standard C NUL terminated strings; the OS-9 method of setting bit 7 of the last character of the string, and the sometimes used method where the first character of the string is actually the length. I originally had separate non-standard directives to support these methods, so when I wanted to support string-comparisons, I needed a way to support these methods. Then it hit me—the use of a suffix on the string—“Z” for the NUL terminated one (“Z” stands for “zero”), “H” for the bit 7 set (“H” for “high-bit”) and “C” for counted strings. And if I'm using the suffixes for the “unit test” backend, why not in general? So I replaced the .ASCIIZ and .ASCIIH directives (I was contemplating adding counted strings but I never got around to adding .ASCIIC) with just .ASCII and the use of a suffix (no suffix, string is left as-is).

So, back on track. The expressions can get quite involved. Some examples:

 .assert /b             = -(@lfsr & 1) & $B4
 .assert @tvalue        = $10*3+(1<<3)+2*2+(7-5)+1
 .assert @@(tvalue + 1) = $10+3+1<<3+2*2+7-5+1

You are also not limited to using the .ASSERT, .TRON and .TROFF directives inside a .TEST directive. You can put them anywhere in the codebase, and if that code is executed as part of a “unit test”, they'll trigger (and if you aren't using the “unit test” backend, they're ignored outright).

There are other changes too—each backend will parse its own command line options, I added some new warnings (such as a waring for self-modifying code), and the memory of the virtual 6809 can have various protections (read-only, write-only, execute-only, trace) set from the command line for further testing.

Now I just need to update the README.txt file and release the code.

Wednesday, Debtember 06, 2023

Unit testing from inside an assembler, part III

Obligatory Picture

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

Obligatory AI Disclaimer