Wednesday, Debtember 06, 2023
Unit testing from inside an assembler, part III
I'm done with the “unit testing” backend for my 6809 assembler. The mini-Forth engine is working out fine, although the number of words increased from 41 to 47 to support some conveniences (like indexing and string comparison). It took some work, but the range of assertions one can make about the code is extensive. For example, a test case for this bit of code (which I do need to discuss, but that's a post for another time) looks like this:
test            sts     [$3333,x]
.next           pshs    pc,u,y,x,dp,b,a,cc
                .test   "STS"
                ldx     #.results
                ldy     #test
                jsr     init
                .assert /x = .results , "X=results"
                .assert /y = .next , "Y=next"
                .assert @@/0,x = .address
                .assert @@/2,x = .opcode
                .assert @@/4,x = .operand
                .assert @@/6,x = .topcode
                .assert @@/8,x = .toperand
                .assert @.nowrite = $12 , "overwrite"
                .assert @/-47,s = $01 , "stack mod?"
                .assert .address = "0800"z , "hex address"
                .assert .opcode = "10EF"z , "hex opcode"
                .assert .operand = "993333"z , "hex operand"
                .assert .topcode = "STS"z , "decoded opcode"
                .assert .toperand = "[3333,X]"z , "decoded operand"
                rts

.results        fdb     .address
                fdb     .opcode
                fdb     .operand
                fdb     .topcode
                fdb     .toperand
.address        rmb     5
.opcode         rmb     5
.operand        rmb     7
.topcode        rmb     9
.toperand       rmb     19
.nowrite        nop
                .endtst
The code being tested is a 6809 disassembler written in 6809 assembly code (I wrote it a few years back, so any testing now is academic).
The .TEST directive takes an optional string as the name of the test. If one isn't given, it will use the last non-local label seen in the source code as the name of the test.
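As a rough illustration of that fallback, here is a minimal C sketch. It assumes, as the example above suggests, that local labels are the ones starting with a '.', and the names note_label() and test_name() are hypothetical, not the assembler's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Most recent non-local label (hypothetical global state). */
static char last_label[64] = "";

/* Call for every label defined in the source.  Assumption: local
   labels start with '.', so they never become the fallback name. */
void note_label(const char *label)
{
  if (label[0] != '.')
    snprintf(last_label, sizeof last_label, "%s", label);
}

/* The explicit .TEST string wins; otherwise use the last non-local label. */
const char *test_name(const char *given)
{
  return given != NULL ? given : last_label;
}
```

With the example above, seeing test and then .next leaves test_name(NULL) returning "test", while .test "STS" overrides it.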
The first two lines:
                .assert /x = .results , "X=results"
                .assert /y = .next , "Y=next"
assert that the X register points to .results and the Y register points to .next.
I use a leading slash to denote a register instead of a label. Register names can be used as labels, and that's mostly unambiguous since the register is typically part of the mnemonic itself. The only exceptions are the A, B and D registers, and then only in the indexed addressing mode, where A, B or D can be used as an offset. But in the context of the .ASSERT directive, using '/' to designate a register makes the intent easier to parse.
Each register, and each bit in the condition code register (like /cc.z for the zero flag), can be used.
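Here is a sketch, in C, of how '/'-prefixed operands might be resolved. The struct cpu layout and reg_value() are hypothetical, and handling only lowercase names is an assumption for brevity; the condition code bit order (E F H I N Z V C, from bit 7 down to bit 0) is the 6809's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register file for the virtual 6809. */
struct cpu
{
  uint8_t  a, b, dp, cc;
  uint16_t x, y, u, s, pc;
};

/* 6809 condition code bits, from bit 7 down to bit 0. */
static const char cc_bits[] = "efhinzvc";

/* Resolve "/x", "/d", "/cc.z" and so on to a value.
   Returns -1 if the operand isn't a known register or flag. */
long reg_value(const struct cpu *cpu, const char *name)
{
  if (name[0] != '/')
    return -1;             /* no slash: it's a label, not a register */
  name++;

  /* "/cc.<flag>": a single condition code bit, 0 or 1 */
  if (strncmp(name, "cc.", 3) == 0 && name[3] != '\0' && name[4] == '\0')
  {
    const char *p = strchr(cc_bits, name[3]);
    if (p == NULL)
      return -1;
    return (cpu->cc >> (7 - (p - cc_bits))) & 1;
  }

  if (strcmp(name, "a")  == 0) return cpu->a;
  if (strcmp(name, "b")  == 0) return cpu->b;
  if (strcmp(name, "d")  == 0) return ((long)cpu->a << 8) | cpu->b; /* D = A:B */
  if (strcmp(name, "dp") == 0) return cpu->dp;
  if (strcmp(name, "cc") == 0) return cpu->cc;
  if (strcmp(name, "x")  == 0) return cpu->x;
  if (strcmp(name, "y")  == 0) return cpu->y;
  if (strcmp(name, "u")  == 0) return cpu->u;
  if (strcmp(name, "s")  == 0) return cpu->s;
  if (strcmp(name, "pc") == 0) return cpu->pc;
  return -1;
}
```

Requiring the slash up front means a bare x falls through as a label, which is the ambiguity the slash is there to avoid.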
The bit after the comma, “X=results”, will be printed if the check fails:
                test-disasm.asm:7: warning: W0015: STS:13 X=results: test failed:
(there can be text after the “test failed” bit, hence the trailing colon).
The next few lines:
                .assert @@/0,x = .address
                .assert @@/2,x = .opcode
                .assert @@/4,x = .operand
                .assert @@/6,x = .topcode
                .assert @@/8,x = .toperand
assert the contents of memory pointed to by X. The double “@” fetches 16 bits from the address that follows; in the first line, that's the address in the X register. The second line retrieves the 16 bits from the address two bytes past where the X register points. You could write the second line as:
                .assert @@(/x + 2) = .opcode
but a little syntactic sugar never hurts, and it mimics the native syntax for the index registers. This was possibly the hardest bit of code to write, as the indexed addressing mode of the 6809, while great from an assembly programmer's perspective, is a nightmare from an assembler implementer's perspective. Even this simplified version was a pain to get right, but I think it was worth it.
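To make the equivalence concrete, here is a minimal C sketch of the two fetch operators against a 64K memory image. The names mem, fetch8(), fetch16() and fetch16_indexed() are hypothetical. The one fixed fact is that the 6809 is big-endian, so the byte at the lower address is the high byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64K memory image of the virtual 6809. */
static uint8_t mem[65536];

/* "@addr": fetch one byte. */
uint8_t fetch8(uint16_t addr)
{
  return mem[addr];
}

/* "@@addr": fetch 16 bits, big-endian (high byte first).
   The second address wraps around at 64K. */
uint16_t fetch16(uint16_t addr)
{
  return (uint16_t)((mem[addr] << 8) | mem[(uint16_t)(addr + 1)]);
}

/* "@@/n,x" is sugar for "@@(/x + n)": add a signed offset to the
   index register value, with 16-bit wraparound, then fetch. */
uint16_t fetch16_indexed(uint16_t reg, int offset)
{
  return fetch16((uint16_t)(reg + offset));
}
```

A negative offset, as in @/-47,s, works the same way through the 16-bit wraparound.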
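The next two lines:
                .assert @.nowrite = $12 , "overwrite"
                .assert @/-47,s = $01 , "stack mod?"
check that the given addresses, .nowrite and a byte down in the system stack, contain specific 8-bit values. $12 is the opcode for the nop assembled at .nowrite, so the first check catches the disassembler writing past the end of .toperand.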
Each byte of memory in the virtual 6809 system is filled with the value 1 (it can be changed on the command line), so here, each untouched byte will contain a 1. I picked that value since it's an illegal opcode, which the emulator will trap.
The final few lines:
                .assert .address = "0800"z , "hex address"
                .assert .opcode = "10EF"z , "hex opcode"
                .assert .operand = "993333"z , "hex operand"
                .assert .topcode = "STS"z , "decoded opcode"
                .assert .toperand = "[3333,X]"z , "decoded operand"
do indeed do a string compare. And therein lies a tale. Again, this is a form of syntactic sugar for:
                .assert @.address=$30 && @(.address+1)=$38 && @(.address+2)=$30 && @(.address+3)=$30 && @(.address+4)=0
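Since that expansion is mechanical, it can be generated straight from the literal. This C helper (expand_strcmp() is a hypothetical name) produces exactly the expression above for a NUL-terminated literal. It only illustrates the equivalence; the code the assembler actually generates is different.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write into out the byte-by-byte expression that
   '.assert label = "text"z' stands for.  Truncates (but stays
   NUL-terminated) if out is too small. */
void expand_strcmp(char *out, size_t outsize, const char *label, const char *text)
{
  size_t len  = strlen(text);
  size_t used = 0;

  assert(outsize > 0);
  out[0] = '\0';

  for (size_t i = 0; i <= len; i++)  /* i == len covers the NUL terminator */
  {
    unsigned char c = (unsigned char)text[i];
    char value[8];
    int n;

    if (c == 0)
      snprintf(value, sizeof value, "0");
    else
      snprintf(value, sizeof value, "$%02X", (unsigned)c);

    if (i == 0)
      n = snprintf(out + used, outsize - used, "@%s=%s", label, value);
    else
      n = snprintf(out + used, outsize - used, " && @(%s+%zu)=%s", label, i, value);

    if (n < 0 || (size_t)n >= outsize - used)
      return;  /* ran out of room */
    used += (size_t)n;
  }
}
```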
This was the second hardest bit to support; it's a bit fragile and, if I'm honest, a hack.
First, the string literal has to be on the right-hand side of the conditional. Second, and worse, there's no easy way to enforce this in the assembler (so I currently don't). Third, the right-hand side has to be a literal string; you can't compare two different memory regions in the 6809 VM.
There's also a limit of one string literal per .ASSERT directive, again because supporting more than one would vastly complicate the already somewhat complicated code (this “unit test” backend is already 30% of the entire assembler).
To avoid adding a ton of code so the conditional checks could support two different primitive types, or creating a duplicate set of string conditionals, I cheated (or came up with a brilliant hack; take your pick). The code generated is:
                VM_LIT .address
                VM_SCMP
                VM_EQ
                VM_EXIT
That VM_SCMP is hiding things. It knows which string literal to use (as it's part of the VM program, and there's only space for one string literal per .ASSERT directive), but it also leaves two values on the stack: -1,0 if the result is less than, 1,0 if the result is greater than, and 0,0 if the result is equal. This way, the conditional operators can work as is, since VM_EQ (or any other comparison) just ends up comparing the string-compare result against 0.
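Here's a rough C model of why that works: VM_SCMP turns the string comparison into an integer and pushes a 0 beside it, so the existing comparison opcodes need no knowledge of strings at all. The opcode values, the extra VM_LT and VM_GT opcodes, the long-based program array, the C truth values (1/0 rather than, say, Forth's -1) and vm_run() itself are all assumptions made for the sketch; only the VM_LIT, VM_SCMP, VM_EQ, VM_EXIT sequence comes from the generated code above. There's no stack-depth checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode numbering for the sketch. */
enum { VM_LIT, VM_SCMP, VM_EQ, VM_LT, VM_GT, VM_EXIT };

static uint8_t mem[65536];  /* virtual 6809 memory */

/* Run an assertion program; 'literal' is the one string literal that
   belongs to this .ASSERT.  Returns the top of stack at VM_EXIT. */
long vm_run(const long *prog, const char *literal)
{
  long stack[16];
  int  sp = 0;

  for (;;)
  {
    switch (*prog++)
    {
      case VM_LIT:
        stack[sp++] = *prog++;
        break;

      case VM_SCMP:
      {
        /* strcmp()-style compare of the NUL-terminated string at the
           address on the stack against the literal */
        uint16_t addr   = (uint16_t)stack[--sp];
        long     result = 0;

        for (size_t i = 0; ; i++)
        {
          unsigned char m = mem[(uint16_t)(addr + i)];
          unsigned char l = (unsigned char)literal[i];

          if (m != l)
          {
            result = m < l ? -1 : 1;
            break;
          }
          if (m == 0)
            break;
        }

        /* the trick: push the result and a 0, so a plain numeric
           comparison of the top two cells tests the string result */
        stack[sp++] = result;
        stack[sp++] = 0;
        break;
      }

      /* ordinary comparisons: second-from-top OP top */
      case VM_EQ: sp--; stack[sp - 1] = stack[sp - 1] == stack[sp]; break;
      case VM_LT: sp--; stack[sp - 1] = stack[sp - 1] <  stack[sp]; break;
      case VM_GT: sp--; stack[sp - 1] = stack[sp - 1] >  stack[sp]; break;

      case VM_EXIT:
        return stack[sp - 1];

      default:
        return -1;  /* unknown opcode */
    }
  }
}
```

The same VM_SCMP works unchanged in front of VM_LT or VM_GT: a memory string that sorts before the literal leaves -1,0, and -1 < 0 is true.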
Oh, those “z”s on the end of each string literal? Well, the assembler supports several methods of storing string data in memory: the standard C NUL-terminated string, the OS-9 method of setting bit 7 of the last character of the string, and the sometimes-used method where the first byte of the string is actually the length. I originally had separate non-standard directives to support these methods, so when I wanted to support string comparisons, I needed a way to indicate which method a string literal uses.
Then it hit me: use a suffix on the string, “Z” for the NUL-terminated one (“Z” stands for “zero”), “H” for the bit 7 set (“H” for “high bit”) and “C” for counted strings. And if I'm using the suffixes for the “unit test” backend, why not in general? So I replaced the .ASCIIZ and .ASCIIH directives (I was contemplating adding counted strings, but I never got around to adding .ASCIIC) with just .ASCII and a suffix (with no suffix, the string is left as-is).
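A minimal C sketch of what the three suffixes (and no suffix) mean for the bytes laid down in memory. encode_string() is a hypothetical name, and accepting both cases of each suffix is an assumption; the layouts themselves follow the descriptions above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode text into out according to the suffix ('\0' for none).
   out must hold at least strlen(text) + 1 bytes.  Returns the number
   of bytes written, or 0 for an unknown suffix or an unencodable
   string (an empty string with no suffix also, correctly, gives 0). */
size_t encode_string(unsigned char *out, const char *text, char suffix)
{
  size_t len = strlen(text);

  switch (suffix)
  {
    case '\0':                  /* no suffix: bytes as-is */
      memcpy(out, text, len);
      return len;

    case 'z': case 'Z':         /* NUL terminated */
      memcpy(out, text, len);
      out[len] = 0;
      return len + 1;

    case 'h': case 'H':         /* OS-9 style: bit 7 set on the last char */
      if (len == 0)
        return 0;
      memcpy(out, text, len);
      out[len - 1] |= 0x80;
      return len;

    case 'c': case 'C':         /* counted: length byte first */
      if (len > 255)
        return 0;
      out[0] = (unsigned char)len;
      memcpy(out + 1, text, len);
      return len + 1;

    default:
      return 0;
  }
}
```

For a string comparison, the same suffix tells the comparison where the string in memory ends: at the NUL, at the byte with bit 7 set, or after the count.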
So, back on track. The expressions can get quite involved. Some examples:
                .assert /b = -(@lfsr & 1) & $B4
                .assert @tvalue = $10*3+(1<<3)+2*2+(7-5)+1
                .assert @@(tvalue + 1) = $10+3+1<<3+2*2+7-5+1
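The first example uses a common branchless trick: negating (@lfsr & 1) yields either 0 or all one-bits, so ANDing with $B4 gives $B4 when the low bit is set and 0 otherwise. That's the usual mask step of a right-shifting Galois LFSR; the C below assumes that's what the code under test computes (lfsr_mask() and lfsr_step() are hypothetical names). For reference, the fully parenthesized second example evaluates to $3F (63).

```c
#include <assert.h>
#include <stdint.h>

/* Branchless equivalent of "-(@lfsr & 1) & $B4": negating 0 or 1
   gives 0 or -1 (all bits set), so the AND yields 0 or $B4. */
uint8_t lfsr_mask(uint8_t lfsr)
{
  return (uint8_t)(-(lfsr & 1) & 0xB4);
}

/* One step of a right-shifting Galois LFSR using that mask (an
   assumption about what the code under test does with it). */
uint8_t lfsr_step(uint8_t lfsr)
{
  return (uint8_t)((lfsr >> 1) ^ lfsr_mask(lfsr));
}
```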
You are also not limited to using the .ASSERT, .TRON and .TROFF directives inside a .TEST directive. You can put them anywhere in the codebase, and if that code is executed as part of a “unit test”, they'll trigger (and if you aren't using the “unit test” backend, they're ignored outright).
There are other changes too: each backend parses its own command line options, I added some new warnings (such as a warning for self-modifying code), and the memory of the virtual 6809 can have various protections (read-only, write-only, execute-only, trace) set from the command line for further testing.
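One way per-byte protections and the self-modifying-code check could fit together, sketched in C. Everything here is hypothetical: the flag names, the return codes, and the idea of marking each executed byte so that a later write to it can be reported. It is not necessarily how the assembler does it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-byte protection bits. */
enum
{
  MEM_READ  = 1 << 0,
  MEM_WRITE = 1 << 1,
  MEM_EXEC  = 1 << 2,
  MEM_TRACE = 1 << 3,
  MEM_RAN   = 1 << 4   /* set once a byte has been fetched as code */
};

/* Hypothetical access results. */
enum { ACC_FAULT = -1, ACC_OK = 0, ACC_SMC = 1 };

static uint8_t       mem[65536];
static uint8_t       prot[65536];
static unsigned long trace_hits;  /* accesses to MEM_TRACE bytes */

/* Set the protection of every byte in lo..hi (inclusive). */
void mem_protect(uint16_t lo, uint16_t hi, uint8_t flags)
{
  for (uint32_t a = lo; a <= hi; a++)  /* uint32_t so hi == $FFFF ends */
    prot[a] = flags;
}

static void note_trace(uint16_t addr)
{
  if (prot[addr] & MEM_TRACE)
    trace_hits++;
}

int mem_read(uint16_t addr, uint8_t *value)
{
  note_trace(addr);
  if (!(prot[addr] & MEM_READ))
    return ACC_FAULT;
  *value = mem[addr];
  return ACC_OK;
}

/* Instruction fetch: needs MEM_EXEC, and remembers that the byte ran. */
int mem_exec(uint16_t addr, uint8_t *value)
{
  note_trace(addr);
  if (!(prot[addr] & MEM_EXEC))
    return ACC_FAULT;
  prot[addr] |= MEM_RAN;
  *value = mem[addr];
  return ACC_OK;
}

/* A write to a byte that has already been executed is reported as
   self-modifying code (the write still happens). */
int mem_write(uint16_t addr, uint8_t value)
{
  note_trace(addr);
  if (!(prot[addr] & MEM_WRITE))
    return ACC_FAULT;
  mem[addr] = value;
  return (prot[addr] & MEM_RAN) ? ACC_SMC : ACC_OK;
}
```

Read-only, write-only and execute-only then fall out as a single flag per byte, and tracing is just one more bit checked on every access.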
Now I just need to update the README.txt file and release the code.