The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, Debtember 03, 2025

♫I can't dance, I can't relate, only thing about me is the way I confabulate!♫

Mark and I were talking about Roko's dancing basilisk and he suggested I feed it my 6809 ANS Forth implementation, on the assumption that no one has fed an assembly-based project through it. Before feeding it that one though, I decided to try a simpler program, my 6809 disassembler written in 6809 assembly code and … well … I'm not sure if anything is terribly wrong with it because it's just the source code in prose, and repetitive prose at that.

But it's a single 1,200 line file—way smaller than mod_blog and a09. Makes sense that it's probably okay, if boringly repetitious.

I then ran it over my ANS Forth implementation and … wow … it can't count to save its life. None of the line counts are accurate in that table, and the line count for the source file is way out of line. No, the source file isn't 30,000+ lines of code! Yes, it's sizable, at 12,000 lines, but it's not 30,000+. But it does lead me to believe that any project that is close to, or over, 10,000 lines of code will have errors in the generated “documentation.” The other sizable error I found before falling asleep is the main runtime routine isn't DOCOL (which is a name used by a lot of 8-bit Forth implementations) but forth_core_colon.runtime. If there are other errors, I don't know. I couldn't make myself go through more of its “documentation.”

Ugh.


A decision on semantic versioning

I finally decided on version 8.0.8 for CGILib. The main rationale—I think I'm the only one using this, and to me, this is a bug fix, and it doesn't change the intended API at all. So version 8.0.8 it is!

Tuesday, Debtember 02, 2025

Roko's dancing basilisk

I came across a reference to DeepWiki, a site that will generate “documentation” for any Github repository. I can't say I've been impressed with LLMs generating code, but what about documentation? I haven't tried that yet. Let's see how well Roko's basilisk dances!

Initially, I started with mod_blog. I've been working with the codebase now for 26 years so it should be easy for me to spot inaccuracies in the “documentation.” Even better—there's no interaction with a sycophantic chat bot; just plop in the URL for the repo, supply an email for notification when it's done and as the Brits say, “Bob's your uncle!”

Anyway, email came. I checked, and I was quickly amazed! Nearly 30 pages of documentation, and the overview was impressive. It picked up on tumblers, the storage layout, the typical flows in adding a new entry. It even got the fact that cmd_cgi_get_today() returns all the entries for a given day of the month throughout the years. But there was one bit that was just a tad off. It stated “[t]he system consists of three primary layers” but the following diagram showed five layers, with no indication of which three were the “primary layers.” I didn't have a problem with the layers it did identify:

Just that it seems to have a problem counting to three.

Before I get into a review of the rest of the contents, I'll mention briefly my opinions on the web site as interface: it's meh. The menu on the left is longer than it appears, given that scroll bars seem oh so last century (really! I would love to force “web designers” to use old-fashioned three-button mice and a monitor calibrated to simulate color-blindness, just to see them struggle with their own designs; not everyone has a mouse with a scroll-wheel, nor an Apple Trackpad). Also, the diagrams are very inconsistent, and often times, way too small to view properly, even when selected. Then you'll get the occasionally gigantic diagram. The layouts seem arbitrary—some horizontal, some vertical, and some L-shaped.

And it repeats itself excessively. I can maybe understand that across pages, saving a person excessive navigation, but I found it repeating itself even on a single page.

Other than those issues, it's mostly functional. Even with Javascript off, it's viewable, even if the diagrams are missing and the contrast is low.

One aspect I did like is the links at the end of each section referring to the source. That's a nice touch.

So with that out of the way—the “documentation” itself.

Mostly correct. I have a bunch of small quibbles:

  1. examples of running it on the command line don't need the --config option if $BLOG_CONFIG is set;
  2. $BLOG_CONFIG isn't checked in main.c but in blog.c;
  3. mod_blog outputs RSS 0.91, not RSS 2.0;
  4. “The system is written entirely in C and does not have Perl, Python or other scripting dependencies for the core engine itself.” Perhaps true? I mean, I do use Lua, but only for the configuration file;
  5. missed out how SUID is used (not for root to run, but as the owner of the blog);
  6. the posthook script returning failure doesn't mean the entry wasn't added, it just changes the HTTP status code returned.

I also found two problematic bits of code when reviewing this “documentation”—one is an actual bug in the code (the file locking diagram, while accurate to the code, made a caching issue stand out) and another one where I used a literal constant instead of a defined constant. At least I'm glad for finding those two issues, even if they haven't been an actual exploitable bug yet (as I think I'm the only one using mod_blog).

In the grand scheme of things, not terrible for something that might have taken 10 minutes to generate (I'm not sure—I did other things waiting for the email to arrive).

But one repo does not a trend make. So I decided upon doing this again with a09, my 6809 assembler. It's a similar size (mod_blog is 7,400 lines, a09 is 9,500—same ballpark) but it's a bit more complicated in logic and hasn't had 26 years of successive refinement done on it. As such, I found way more serious issues:

  1. Errors aren't classified. Errors are created as needed, sequentially. I make no attempt to bunch error codes into fixed ranges.
  2. It missed a key element of the dead code detection—it only triggers if the following instruction doesn't have a label.
  3. The listing file isn't kept in the presence of errors.
  4. It also got the removal of generated output files incorrect—they're only deleted if an error was detected on pass 1 or 2, not if a test failed.
  5. It repeats the precedence table on the same page.
  6. I do not have “Unsupported markdown: blockquote” or “Unsupported markdown: list” unary operators.
  7. Oh my God! I can't say how bad this backend matrix table is. It's all sorts of wrong. It's not that it got the supported/non-supported markers backwards, it appears to have just made up the results! And the same information on another page is bad as well. Not as bad as the first, but that's like saying bronchitis is not as bad as pneumonia. Both are bad. And it uses a different format for both tables. Consistency for the win! Sheesh.
  8. The example of writing an instruction to the various formats is wrong for the RS-DOS version—the type and length should be two bytes each, not one.
  9. The output format for -t is incorrect—it doesn't show a trace of the code being run unless the TRON directives are in use.
  10. Every example of the .ASSERT directive is just wrong as it did not use the proper register references, and memory dereferences need a @ (8-bit) or @@ (16-bit) prefix.
  11. Where you can use the .TRON directive is wrong—it can be used anywhere; it's .OPT TEST TRON that can only be used inside a .TEST directive.

This, in my mind, is a much worse job than it did for mod_blog. I suspect it's due to the cyclomatic complexity being a bit higher in a09 than in mod_blog due to the cross-cutting nature of the code. And that probably causes the LLM to run up to, if not a bit over, its context window, thus causing the confabulations.

I fear that it is meant to be used for legacy code with little or no documentation, and if it does this poorly on a moderately complex but small code base, I don't want to contemplate what it would do for a larger, older, and gnarlier codebase. I'd be up to try it, and I have a code base of 155,000 lines of C code written in the early 90s that's as gnarly as it gets, but I'm not familiar enough with the codebase to feel confident that I can spot all the glaring errors, much less the more subtle issues.

Another issue is updates to the repo. The site sells itself as a wiki, so I suppose another aspect to this is you spend the time going through the generated “documentation” and fixing the errors, and then keep it up to date as the code changes. It's not obvious to me if one can rerun this over a changed repo, and if so, are the updates merged into the existing documentation? Replaced outright and you have to go through fixing the documentation again? I suspect this generated “documentation” will end up worse than bad comments in the code itself.

mod_blog has changed drastically over the years, and while the storage format itself hasn't, how it works internally has. There were at least three to four major revisions to the code base over the years. How major? One was nearly a complete rewrite, about 18 years ago, to replace a custom I/O layer I had with C's FILE *-style I/O. Another one was removal of all global variables about three years ago. And for the past year, I've been removing features that I don't use. That's a lot of documentation to rewrite every few years.

Overall, this was less obnoxious than having the LLMs write code, but I feel it's still too inaccurate to be let loose on unfamiliar codebases, which I suspect is the selling point.


Sunday, November 30, 2025

Semantic versioning is hard; let's go build a rocket

Wow! I found another bug from the depths of time in mod_blog, or rather, CGILib, which mod_blog uses. And again, this goes back right to when I first wrote the code, possibly back in the late 1990s. And again, it's amazing that it took me this long (less than an hour ago) to trigger it!

When I made my previous post, the title came out incorrectly as “Error ID10T” when it should have been “Error ID10T: PEBKAC”. Somehow, any text after the colon was being swallowed up somewhere. It didn't take long to find the culprit in the function PairNew() and this bit of code:

  src  = *psrc;
  p    = todelim(src,&sname,delim,eos);
  name = malloc(sname + 1);

  if (name == NULL)
    return NULL;

  memcpy(name,src,sname);
  name[sname] = '\0';

  if (*p == delim)
  {
    src = p + 1;
    p   = todelim(src,&svalue,delim,eos); // WRONG!
  }
  else
    svalue = 0;

This function is used to parse a header line like Title: Error ID10T: PEBKAC and turn it into two values, TITLE and Error ID10T: PEBKAC. It's the line labeled “wrong” that points to the problem—I was checking for another occurrence of delim (in this case, it's a colon) and ignoring anything past that when it shouldn't.
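To make the intended behavior concrete, here is a minimal sketch of splitting a header at the first delimiter only. It is not the CGILib code: split_header() is a hypothetical stand-in, it uses strchr() rather than todelim(), and it skips details such as upper-casing the name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parsing step; NOT CGILib's PairNew().
 * Splits "Name: value" at the FIRST delim only, so any later delim
 * characters remain part of the value.  Returns 0 on success, -1 if
 * memory runs out.  The caller frees *pname and *pvalue. */
int split_header(const char *line,char delim,char **pname,char **pvalue)
{
  const char *p = strchr(line,delim);
  const char *v;
  size_t      sname;
  char       *name;
  char       *value;

  if (p != NULL && *p == '\0')  /* a NUL delim matches the terminator */
    p = NULL;

  sname = (p != NULL) ? (size_t)(p - line) : strlen(line);
  v     = (p != NULL) ? p + 1 : line + sname;

  while (*v == ' ' || *v == '\t')  /* skip blanks after the delimiter */
    v++;

  name  = malloc(sname + 1);
  value = malloc(strlen(v) + 1);

  if (name == NULL || value == NULL)
  {
    free(name);
    free(value);
    return -1;
  }

  memcpy(name,line,sname);
  name[sname] = '\0';
  strcpy(value,v);  /* everything else, colons included */

  *pname  = name;
  *pvalue = value;
  return 0;
}
```

Given the line Title: Error ID10T: PEBKAC and a colon as the delimiter, this yields the name Title and the value Error ID10T: PEBKAC, with the second colon left intact.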

Sigh.

Like most bugs, the fix is easy, but what I do next is difficult. I (try to) use semantic versioning for CGILib, although there are two issues here—one, this is a bug so obviously the fixed version should be 8.0.8, but on the other hand, this introduces an incompatible change so the next version should be 9.0.0. On the gripping hand, it changes what I consider incorrect behavior and aligns the function to better reflect its name, so maybe version 8.0.8?

I think for CGILib this is largely academic, as I don't think anyone really uses the library except me. I shall have to think on this one.


Error ID10T: PEBKAC

Another errand Bunny and I did yesterday was to pick up a new black ink cartridge for the printer (which cost about half the price of the printer, but that's another rant for another time). We got home, and I replaced the cartridge. Later that night, Bunny attempted to print out The New York Times Crossword Puzzle, as she does every night.

Only it did not print properly. No black was being printed. We futzed with it for some time, and Bunny was able to get enough of the crossword puzzle to work with it, but overall, the black was problematic. I was fearful that we would have to buy a new printer.

After a night's rest, I had a thought—Bunny mentioned a problem with the printer yesterday, and when I checked, it stated it was out of black ink. I opened the printer up, and removed the black cartridge, but the existing cartridges we had were for the previous printer. So that's when we went out running several errands and eating dinner. There were several hours when the printer was open, sans an ink cartridge, so maybe whatever ink was left in the print head had dried up, blocking the ink from flowing. Maybe some denatured alcohol could clean that up.

I removed the new cartridge, cleaned the area where the ink goes into the print head, and when I was putting the cartridge back in, I noticed a small tag on the cartridge sticking out.

Don't tell me, I thought. Did I forget to remove the “remove me before installing” tag?

Yup.

D'oh!

The printer works fine now.

Sigh.

Saturday, November 29, 2025

This “Internet” camera is only 21 years old—it should still work, right? Right?

One of the errands we ran today was stopping by the storage unit to retrieve the Christmas decorations. While there, I saw a box containing a D-Link DCS-900W WiFi-enabled Internet camera. I have no recollection of having ever bought it, nor using it. No clue, but I thought it might be fun to play around with it. How hard could it be?

For starters, you need Windows to configure the device. The CD that came with the device only had two types of files on it—PDFs and Microsoft executables. Undaunted, I figured the device might also come with a web-based interface. Or at least, I hoped it did. Not only did it come with Wi-Fi (and the antenna had definitely seen better days—it was only later when putting it back in the box did it start disintegrating, but I'm getting ahead of myself here), but it came with an Ethernet jack. Easy enough to plug into the “wireless service unit.” Then I just had to find it on the network.

No go. Reading the manual, you can reset the device to factory settings by using a paperclip to hit the recessed reset button. That's easy enough. But then I read that the factory setting uses an IP address of 192.168.0.20.

That was going to be a problem.

At some point since this device was made (back in 2004 according to a date in the instruction manual) commercial network devices went from assuming a private network address of 192.168.0.0/24 back in the day to 192.168.1.0/24 in these more enlightened times. But I'm not going to renumber my entire network for a camera. Okay, think …

I pulled out of storage a laptop with an Ethernet port, and a small, five port Ethernet switch. It would be easier to set up a small network of three devices. I power up the laptop, log in, and find that the space bar doesn't work. Every other key works, except for the space bar. Okay, think …

First, I try Deoxit, assuming maybe the spacebar switch has oxidized. That failed. Then I tried using some denatured alcohol to clean the actual contacts. That failed. It was not going to be easy to type commands into the xterm to reconfigure—wait a second! I have an xterm running. There's a space character on the terminal! I can copy the space character into the clipboard and use that!

It wasn't pretty, but it meant I could type

ip<paste from clipboard>addr<paste from clipboard>add<paste from clipboard>192.168.0.1<paste from clipboard>dev<paste from clipboard>eth0

and continue on with the increasing weirdness of the situation.
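For anyone wondering why the camera's factory address could not simply be reached from the normal network, here is a small sketch of the subnet arithmetic. The helper names ipv4() and same_subnet() are made up for illustration, and 192.168.1.10 is a hypothetical host on a 192.168.1.0/24 network; only 192.168.0.20 and 192.168.0.1 come from the story above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack four octets into one 32-bit IPv4 address (host byte order). */
uint32_t ipv4(unsigned a,unsigned b,unsigned c,unsigned d)
{
  return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* True if both addresses share the same network bits for the given
 * prefix length (0 to 32).  Hosts on different subnets cannot talk
 * to each other directly without a router in between. */
bool same_subnet(uint32_t x,uint32_t y,unsigned prefix)
{
  uint32_t mask = (prefix == 0) ? 0 : (uint32_t)(UINT32_MAX << (32 - prefix));
  return (x & mask) == (y & mask);
}
```

With a /24 prefix, 192.168.0.20 and the hypothetical 192.168.1.10 land on different networks, while 192.168.0.20 and the laptop's 192.168.0.1 share one, which is why adding that address to eth0 was enough.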

I was correct that one could configure the device from the web. I set it up to obtain an IP address via DHCP so I no longer needed the ad-hoc network and a spacebar-less laptop (this only took three attempts to get correct).

I could now reach it from the normal network, only I never saw a picture. I can't use the Active-X link because I'm not running Windows (and is Active-X even a thing in 2025? I honestly don't know). The Java applet didn't run because I'm not up to date on Java (I don't use the language for anything) and the links to update it are long gone for my OS. And the static image from just the web doesn't work because the link the camera generates doesn't exist on the camera!

Well then … so much for an Internet camera. Back into the box it goes.


Extreme Enlightened tree, Minecraft edition

Bunny and I were running some errands when we decided to stop and get a bite to eat at a local restaurant. Outside I saw the most amazing enlightened tree I've seen in several years—it was like something straight out of Minecraft:

[A tree with white Christmas lights covering the trunk and green Christmas lights in the canopy, looking very much like a Minecraft tree] But unlike Minecraft, this tree canopy is a cylinder and not blocks.

Unfortunately, Jack Black wasn't anywhere nearby with Lava Chicken. Ah well …

Tuesday, November 18, 2025

“Thanks for calling that out. That was bad advice on my part.”

For reasons, I noticed that ClaudeBot (and I'd link to the information page for it, but, and this is my shocked face, they forgot to include a proper link!) was repeatedly requesting the same file over and over again—/sitemap.xml. My bad! I neglected to configure my web server to return “410 Gone” for that page. I went to the main site of the bot and found a “call to sign up!” but sadly, no obvious link to stopping their bot.

So I figured, I might as well try this LLM thing once more. The last few times I tried, it didn't go well. Maybe third time's a charm?

Maybe?

Sean

How do I stop your XXXXX­XX webbot from crawling my website?

Claude

Thinks hard for a few minutes.

I can help you block Anthropic's web crawler from accessing your website.

You'll want to add rules to your robots.txt file. Here's what to add:

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

Put this in your robots.txt file at the root of your domain (e.g., https://yourdomain.com/robots.txt). [Apparently Claude doesn't know that there's a domain set aside specifically for examples, namely, EXAMPLE.COM. Claude also doesn't realize that yourdomain.com actually exists! Way to go, Claude! slow clap]

The two user-agent strings cover Anthropic's crawlers:

  • anthropic-ai - the main crawler identifier
  • Claude-Web - an alternative identifier that may be used

If you want to block specific sections instead of your entire site, you can replace / with specific paths:

User-agent: anthropic-ai
Disallow: /private/
Disallow: /admin/

The crawler should respect these rules within a reasonable timeframe after you update your robots.txt file.

If you're still seeing crawling activity after adding these rules, or if you need immediate blocking, you could also block by IP address in your server configuration or firewall, though the specific IPs may change over time.

Sean

Finds the page that describes Anthropic's web bots, not from a link on the Anthropic web site, but a normal web search.

What about Claude-User? ClaudeBot? Claude-SearchBot? Did you XXXXX­XX forget those? Or did your overlords tell you to conveniently “forget” those as well?

Claude

You're absolutely right - I should have given you the complete list. Here are all the Anthropic crawler user-agents you should block:

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

User-agent: Claude-User
Disallow: /

I didn't intentionally leave those out - I gave you an incomplete answer, which wasn't helpful when you're trying to actually block the crawlers. Thanks for calling that out.

If you want to be thorough, you could also add a blanket rule that catches any variations:

User-agent: *Claude*
Disallow: /

User-agent: *anthropic*
Disallow: /

Though the specific user-agent blocking is more precise if you want to maintain control over what else can access your site.

Sean

Are you XXXXX­XX sure that's legal syntax? That doesn't look kosher to me. Again, are you trying to get around any bans?

Claude

You're right to call that out - wildcards with asterisks like *Claude* are NOT valid robots.txt syntax. That was bad advice on my part.

The correct approach is to list each user-agent explicitly:

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

User-agent: Claude-User
Disallow: /

That's the proper syntax. Each User-agent directive needs to be spelled out exactly, and wildcards don't work in the User-agent field in standard robots.txt files.

I'm not trying to get around anything - I just gave you incorrect technical information, which is worse than useless when you're trying to actually implement this. My apologies for the confusion.

Oh wait! I know! I'm holding it wrong, and I'm using an LLM that is older than 20 minutes. My mistake. And before you say I should have asked politely, no. This is a computer! It's not human! Stop anthropomorphizing it!

And of course, at the bottom of the chat page, grey-on-slightly-darker-grey, is the text “Claude can make mistakes. Please double-check responses.”

How anybody finds this garbage useful is beyond me.


“Our kids can't play and we will not be able to walk due to traffic”

Last week Bunny found a flyer on our door:

NOTICE OF PUBLIC HEARING
THE COMMUNITY APPEARANCE BOARD

City of Boca Raton, Florida hereby gives notice:
6:00 P.M. on November 18, 2025
Boca Raton Community Center–Annex Building
260 Crawford Boulevard

A phased master plan amendment and site plan approvals to authorize the following on a 77-acre site, generally located at XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX, with approximately:

That 77-acre site (which is nothing but dense tropical trees) is right behind Chez Boca (and 32 other homes). This is not new—it's been something that the City of Boca has been threatening us with for the better part of fifteen years. And then yesterday, again on our door, we found another flyer, with largely the same information but with “Please attend it is urgent” and “Our kids can't play and we will not be able to walk due to traffic” scrawled in the margins.

Well, how can I pass that up?

At 6:00 P.M. I found myself sitting in the Boca Raton Community Center Annex Building with about 50 other people. As the meeting began, it became apparent that this will happen. Maybe not next year, but eventually. Also, this meeting was to approve the architecture and landscaping only! No mention of traffic or noise would be brooked, nor would questions during the presentation of the architectural and landscaping firm. Afterwards, yes. But not during.

During the presentation, it was mentioned that the development was based loosely upon Mizner Park, an upscale mixed-zoning development near downtown Boca Raton. They were going for a similar mixed-zoning but not with the same architectural style as Mizner Park. The plans aren't as dense as they're zoned for, so it's not quite the 150,000 square feet of retail mentioned, which is nice. And they aren't developing the entire 77 acres—there's a large section in the north-east that will be left largely alone for now.

But one of my bigger concerns going into the meeting was ingress and egress. There are four potential spots for this—the one to the north would be a nightmare as it would be between the I-95 exit and residential homes. The one to the south is better—it's near the I-95 exit but not right on top of it. The final two spots lead directly into our neighborhood, at both ends of the street I live on!

Turns out there are only two points of ingress and egress. The main one on the south, and the other one off the street I live on (at the south end) to only be used by emergency vehicles like fire and ambulances; it will be fenced off from any traffic (car or foot) otherwise.

We were also assured that there will be a 100 foot (30 m) buffer between homes along the eastern edge and any development, with none of the trees removed in said buffer space (with the exception of invasive species, which will be replaced with native trees).

I'm not thrilled with the prospect, but if the current plans are followed, it's not horrible; it could be a lot worse. I also thought of asking when this development was supposed to actually start only after the meeting, but given how long this has been threatened, I'm not expecting any ground breaking any time soon.

I hope.

Tuesday, November 11, 2025

Extending the syntax when calling assembly language subroutines for Color BASIC

A few days ago on the Color Computer mailing list, Allen Huffman asked:

The BASIC ROM has the USR function:

DEF USR0=&H3F00

A=USR0(42)

It accepts one parameter.

Since it jumps from BASIC into the USR assembly, couldn’t that code just parse a “,” and more numbers, allowing it to accept whatever needed to be passed in?

A=USR0(1,2,3,4)

[Coco] USRx() and adding more parameters?

Since we're talking about a 45 year old computer and with zero chance of a newer version of BASIC coming out any time soon, the answer is “yes,” if you don't mind digging through the Unravelled Series (a series of books giving a disassembly of the Color Computer BASIC ROMs) and calling a bunch of nearly undocumented routines.

We'll start with the following code that implements a 16-bit version of PEEK, using some include files I wrote:

	include	"basic.i"
	.opt	basic defusr0 peekw
	org	$7F00

peekw	jsr	INTCVT	; convert parameter to integer
	tfr	d,x	; transfer into an index register
	ldd	,x	; read 16 bits from memory
	jmp	GIVABF	; return value to BASIC
	end

Now, a 16-bit version of PEEK is nice, but it would also be nice to have a 16-bit version of POKE. But USRn only takes one parameter; if we want an additional value, we're going to have to extend the BASIC parser. Fortunately, due to the way Color BASIC works, this is easy and there are quite a few routines we can call. The first call is to a routine I'm calling CB.evalcomma (as in the Unravelled Series it goes by the name of LB26D). The comma is required; without it, BASIC returns a syntax error. The next call we need is to a routine I'm calling CB.eval, which just evaluates an expression. And that's pretty much all that is needed to parse the additional syntax needed for this. The code looks like:

	include	"basic.i"
	include	"basic-internal.i"
	.opt	basic defusr1 pokew
	org	$7F00

pokew	jsr	INTCVT		; convert parameter to integer
	pshs	d		; save the address
	jsr	CB.evalcomma	; parse past comma
	jsr	CB.eval		; parse expression
	jsr	INTCVT		; convert expression to integer
	puls	x		; get address
	std	,x		; write data into address
	rts			; return to BASIC

And to use this:

DUMMY=USR1(&H76),1234

As it was pointed out, the code that handles USRn parses an expression in parentheses, so a modified USRn that parses would have to look something like A=USR0(1),2,3,4.

It's unfortunate that we can't just call USR1() without using the result, but that's a limitation of ColorBASIC which requires the results of USRn to be used. Other than that, we have successfully extended BASIC to modify how USRn works. This can also work with EXEC, the other way assembly code is called in BASIC (I'll leave that as an exercise for the reader).

But there are issues:

DUMMY=USR0(&H76),&HA000
?FC ERROR
OK

The issue—INTCVT checks if the value being converted is between -32,768 and 32,767 (a signed 16-bit value). The value of &HA000 is 40,960 (which still fits in 16-bits, but is unsigned) and thus, we get the ?FC ERROR from BASIC. This issue also affects the PEEKW function as we can't easily peek ROM addresses. If we wanted to peek the 16-bits at address 40,960, we would need to pass in the value of -24,576. That will work, but the value returned would be -24,117, which is $A1CB, the address that is stored at address 40,960 (or $A000 in hexadecimal).
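The arithmetic behind those numbers is just two's complement reinterpretation of a 16-bit value. A small sketch, with helper names of my own choosing (to_signed16() and to_unsigned16() are not anything from the BASIC ROM):

```c
/* View a 16-bit pattern as a signed value, which is how a signed
 * range check of -32,768 to 32,767 sees it. */
long to_signed16(unsigned long value)
{
  value &= 0xFFFFUL;
  return (value >= 0x8000UL) ? (long)value - 0x10000L : (long)value;
}

/* View a signed value in that range as the unsigned 16-bit pattern
 * that ends up in the D register. */
unsigned long to_unsigned16(long value)
{
  return (unsigned long)value & 0xFFFFUL;
}
```

So to_signed16(0xA000) gives -24,576, the value that has to be passed in to peek address 40,960, and to_signed16(0xA1CB) gives -24,117, the value BASIC reports back.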

It would be nice if we could correct these issues.

In looking through the Unravelled Series, I did find two routines that help. The first is CB.uintcvt. Like INTCVT this returns a 16-bit value in the D register, but doesn't signal an error if the value is outside -32,768 and 32,767. The other routine is CB.addrcvt, which returns the 16-bit value in the X register. We can thus rewrite our POKEW function as:

	include	"basic.i"
	include	"basic-internal.i"
	.opt	basic defusr1 pokew
	org	$7F00

pokew	jsr	CB.addrcvt	; convert parameter to an address
	pshs	x		; save the address
	jsr	CB.evalcomma	; parse past comma
	jsr	CB.eval		; parse expression
	jsr	CB.uintcvt	; convert expression to integer
	puls	x		; get address
	std	,x		; write data into address
	rts			; return to BASIC

And it all works as expected.

But that still leaves the PEEKW function returning negative values. Again, it would be nice if we could return an unsigned result to BASIC. I scanned the Unravelled Series but I could not find a routine to call. Perhaps I didn't look hard enough, but I did come up with a workaround. It's not pretty, but it works.

	include	"basic.i"
	.opt	basic defusr0 peekw
	org	$7F00

peekw	jsr	INTCVT	; convert parameter to integer
	tfr	d,x	; transfer into an index register
	ldd	,x	; read 16 bits from memory
	bmi	.neg
	jmp	GIVABF	; return value to BASIC
.neg	std	FP0	; return negative value as unsigned
	lda	#$90	; to BASIC
	sta	FP0EXP
	clr	FP0+2
	clr	FP0+3
	clr	FP0+4
	rts
	end

For positive values, I still use GIVABF. For negative values, I construct the floating point value “by hand” since all integer values from 32,768 to 65,535 use the same floating point exponent. All I can say is “it works.” And given how seldom I've wanted to return an unsigned value, I can't say that it's a bad solution for this one case.
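Why a fixed exponent of $90 works can be checked with a little arithmetic. The sketch below assumes the usual Microsoft BASIC layout for the unpacked accumulator: an exponent byte biased by 128 and a 32-bit mantissa read as a fraction in [0.5,1) when normalized. The function names are hypothetical, and this is a model of the number format, not of the ROM code.

```c
#include <stdint.h>

/* Decode an exponent byte and 32-bit mantissa as
 * (mantissa / 2^32) * 2^(exp - 128).  ASSUMED format, see above. */
double unpacked_to_double(uint8_t exp,uint32_t mantissa)
{
  double v = (double)mantissa / 4294967296.0;  /* fraction in [0,1) */
  int    e = (int)exp - 128;

  if (exp == 0)  /* a zero exponent means the value is zero */
    return 0.0;

  for ( ; e > 0 ; e--) v *= 2.0;
  for ( ; e < 0 ; e++) v /= 2.0;
  return v;
}

/* What the .neg path stores: D in the top two mantissa bytes, the
 * lower two cleared, and the exponent forced to $90. */
double unsigned16_as_float(uint16_t d)
{
  return unpacked_to_double(0x90,(uint32_t)d << 16);
}
```

An exponent of $90 is 128 + 16, so the value is the fraction times 2^16, which is exactly D. The mantissa is only normalized (top bit set) when D is 32,768 or more, which fits with using this trick just for the “negative” case and leaving everything else to GIVABF.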

Then I got to thinking—could I combine the two routines into one? One function that can either peek or poke a 16-bit value? Where X=USR0(&H76) would read a 16-bit value, and X=USR0(&H76),&HA000 would write a 16-bit value while returning the original 16-bit value? I mean, why not make the return value for poking memory do something? But this means peeking ahead for a comma and deciding what to do.

Fortunately, there are enough examples of this in ColorBASIC that it's relatively straightforward—the next character in the input can be found by the pointer stored at address CB.charad, and wouldn't you know it, the MC6809 can read through a pointer at an address to get it:

	ldb	[CB.charad] ; CB.charad contains an address
	cmpb	#','	; is it a comma?

So now we can combine the two routines:

		include	"dp.i"
		include	"basic.i"
		include	"basic-internal.i"

		.opt	basic defusr0 peekpokew
		org	$7FD0

peekpokew	jsr	CB.addrcvt	; return parameter in X
		ldd	,x		; read 16-bit at address
		pshs	x,d		; save address and data
		ldb	#','		; check for comma
		cmpb	[CB.charad]
		bne	.return		; if none, just a peek
		jsr	CB.evalcomma	; else parse the comma
		jsr	CB.eval		; evaluate the expression
		jsr	CB.uintcvt	; convert to 16-bit unsigned
		std	[2,s]		; store expr in address
.return		puls	x,d		; restore data (and stack)
		tsta			; test data < 0
		bmi	.neg		; if so, handle
		jmp	GIVABF		; return pos via BASIC
.neg		std	FP0		; return negative as unsigned
		lda	#$90
		sta	FP0EXP
		clr	FP0+2
		clr	FP0+3
		clr	FP0+4
		rts

		.pcle	$8000
		end

The PCLE directive stands for “PC register Less Than or Equal to” and is there to ensure our code doesn't run into ROM. If it does, it generates an error:

pw.asm:30: error: E0106: PC 8030 exceeds given limit 8000

and one can adjust the origin appropriately.

There are also some other functions I found: CB.evalopar, which parses an open parenthesis, and CB.evalcpar, which parses a close parenthesis. Then there's ECB.evalpoint, which parses two expressions in parentheses, (x,y), and places the first result in the variable ECB.horbeg (horizontal beginning) and the second in ECB.verbeg (vertical beginning)—nice for parsing X,Y coordinates for graphics. And there's ECB.evalrect, which parses a pair of points, (x1,y1)-(x2,y2), and places the second set of values into ECB.horend and ECB.verend. This can lead to some weird looking BASIC code:

X=USR3(4),(X1,Y1)-(X2,Y2)

but it works.

Monday, November 03, 2025

Limitations of a two-pass assembler

I've come to realize that supporting forward references in a two-pass assembler isn't always easy. The simple case of forward references I support:

	lda	#alpha
alpha	equ	5

On pass 1, alpha isn't defined, but by pass two, we have its value—5.

With this code, however:

	lda	#alpha
alpha	equ	bravo
bravo	equ	5

alpha is undefined on line 1, and it remains undefined even on line 2 because we haven't defined bravo yet. Thus when we end pass 1, alpha is still undefined. That it took me two years to even stumble across this issue is a bit surprising to me. I just haven't written 6809 assembly code like this.

Can I fix this? If I add another pass, probably. If I don't want to add another pass … I don't know. I would have to track expressions that aren't fully defined in pass 1, which could be a lot of work for an issue that might not come up all that often (if my own code is anything to go by). I mean, things can get quite pathological:

		lda	#Alpha
Alpha		equ	Bravo+1
Bravo		equ	Charlie+1
Charlie		equ	Delta+1
Delta		equ	Echo+1
Echo		equ	Foxtrot+1
Foxtrot		equ	Golf+1
Golf		equ	Hotel+1
Hotel		equ	India+1
India		equ	Juliet+1
Juliet		equ	Kilo+1
Kilo		equ	Lima+1
Lima		equ	Mike+1
Mike		equ	November+1
November	equ	Oscar+1
Oscar		equ	Papa+1
Papa		equ	Quebec+1
Quebec		equ	Romeo+1
Romeo		equ	Sierra+1
Sierra		equ	Tango+1
Tango		equ	Uniform+1
Uniform		equ	Victor+1
Victor		equ	Whiskey+1
Whiskey		equ	Xray+1
Xray		equ	Yankee+1
Yankee		equ	Zulu+1
Zulu		equ	1

lwasm (part of LWTools) does properly handle this pathological case but it does six passes, not two. The other 6809 assembler I have, an older one written back in the 90s, doesn't and issues deceptive error messages, so it's not like I'm the only one to not handle this properly.

As of now, I just issue an error and let the programmer deal with it.
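To illustrate what “adding another pass” could look like, here is a rough sketch of resolving EQU chains by repeating passes until nothing is left undefined. This is not how a09 or lwasm work internally; struct equ, find_symbol() and resolve() are hypothetical, and each symbol is limited to the form NAME equ REF+ADDEND (or a plain constant).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol: NAME equ REF+ADDEND, or NAME equ ADDEND when
 * ref is NULL. */
struct equ
{
  const char *name;
  const char *ref;
  long        addend;
  long        value;
  int         defined;
};

struct equ *find_symbol(struct equ *syms,size_t n,const char *name)
{
  for (size_t i = 0 ; i < n ; i++)
    if (strcmp(syms[i].name,name) == 0)
      return &syms[i];
  return NULL;
}

/* Walk the symbols in source order, pass after pass, until every
 * symbol is defined.  Returns the number of passes used, or -1 if a
 * pass makes no progress (missing or circular reference). */
int resolve(struct equ *syms,size_t n)
{
  for (int passes = 1 ; ; passes++)
  {
    size_t resolved = 0;
    size_t pending  = 0;

    for (size_t i = 0 ; i < n ; i++)
    {
      struct equ *r;

      if (syms[i].defined)
        continue;

      r = (syms[i].ref != NULL) ? find_symbol(syms,n,syms[i].ref) : NULL;

      if (syms[i].ref == NULL || (r != NULL && r->defined))
      {
        syms[i].value   = syms[i].addend + ((r != NULL) ? r->value : 0);
        syms[i].defined = 1;
        resolved++;
      }
      else
        pending++;
    }

    if (pending  == 0) return passes;
    if (resolved == 0) return -1;
  }
}
```

For the alpha/bravo example, this takes two passes over the symbols before alpha has a value, and only then could a code-generating pass run. With this naive source-order walk, the Alpha-to-Zulu chain resolves just one symbol per pass, so a smarter scheme (tracking the pending expressions, say) would be needed to keep the pass count down.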

Obligatory Picture

[Self-portrait with a Christmas Tree] Oh Christmas Tree!  My Christmas Tree!  Rise up and hear the bells!

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2025 by Sean Conner. All Rights Reserved.