The Boston Diaries

Thursday, June 05, 2025

Avoiding Roko's basilisk, part II

The other day I came across this comment on Lobsters:

On a personal level I have helped various people get value out of AI tools where they initially did not understand how to use it properly. But that setting is more of a 1:1 for a specific situation. For generic how to use agentic tools, there are so many articles already. Peter Steinberger has a multi hour talk online of him using an army of agents to write on his project.

If someone has a specific situation where they failed using an agent, ideally with some open source code, I would be happy to have a look at it. It’s just hard to engage on abstract “does not work for me” posts.

Comment on “AI Changes Everything”

I failed using an agent a few months ago. It was on an open source project of mine. Perhaps mitsuhiko would be happy to have a look at it. So I replied.

And mitsuhiko was happy to look at it.

Or rather, spend a few minutes telling his “coding agent” to look at the code and let it do its thing. So I took a look.

Development was done on a Mac, which doesn't have the vm86() system call, so his agent, “Claude,” started writing an 8086 emulator. It also came up with a few tests, and once those tests were working, it stopped.

When I tried the code, attempting to run RACTER.EXE, it just sat there, turning my computer into a space heater. Looking a bit further, I saw there was an option for debug output (though it has to go at the end of the command line rather than right after the command name, unlike every other command on Unix). Then I saw line after line of

...
Execute: 2010:0020: 8B
Unhandled opcode at 2010:0020: 8B
Execute: 2010:0021: EC
Unhandled opcode at 2010:0021: EC
Execute: 2010:0022: 81
Unhandled opcode at 2010:0022: 81
Execute: 2010:0023: EC
Unhandled opcode at 2010:0023: EC
Execute: 2010:0024: 02
Unhandled opcode at 2010:0024: 02
Execute: 2010:0025: 00
Unhandled opcode at 2010:0025: 00
Execute: 2010:0026: 9A
Unhandled opcode at 2010:0026: 9A
Execute: 2010:0027: C2
Unhandled opcode at 2010:0027: C2
Execute: 2010:0028: 10
Unhandled opcode at 2010:0028: 10
Execute: 2010:0029: 52
Unhandled opcode at 2010:0029: 52
Execute: 2010:002A: 24
Unhandled opcode at 2010:002A: 24
Execute: 2010:002B: 9A
Unhandled opcode at 2010:002B: 9A
Execute: 2010:002C: A2
Unhandled opcode at 2010:002C: A2
Execute: 2010:002D: 19
Unhandled opcode at 2010:002D: 19
Execute: 2010:002E: 52
Unhandled opcode at 2010:002E: 52
...

To say I was underwhelmed is an understatement.
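For anyone who doesn't read 8086 machine code, the bytes in that log are about as ordinary as x86 code gets. Note that the log advances one byte per "instruction," so operand bytes like EC and 02 were being fed back in as opcodes too. Below is a minimal Python sketch, not the code Claude produced and not the original msdos program, that decodes just the bytes at 2010:0020 through 2010:002E. The names `decode`, `modrm` and `LOG_BYTES` are hypothetical, and the sketch deliberately handles only the three opcodes that appear in the log, register-to-register forms only.

```python
# Hypothetical sketch: decode only the 8086 opcodes seen in the debug log
# (8B = MOV r16,r/m16; 81 /5 = SUB r/m16,imm16; 9A = CALL far ptr16:16).
# Register-to-register ModRM forms only; this is not a general decoder.

REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

# Bytes at 2010:0020 through 2010:002E, copied from the log.
LOG_BYTES = bytes.fromhex("8B EC 81 EC 02 00 9A C2 10 52 24 9A A2 19 52")


def modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def decode(code, base=0):
    """Return ([(address, text), ...], stop_address).

    Decoding stops at the first unsupported opcode or form, or at an
    instruction whose operand bytes run past the end of `code`.
    """
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x8B and i + 2 <= len(code):
            mod, reg, rm = modrm(code[i + 1])
            if mod != 3:
                break  # memory operands are not handled in this sketch
            out.append((base + i, f"MOV {REG16[reg]},{REG16[rm]}"))
            i += 2
        elif op == 0x81 and i + 4 <= len(code):
            mod, reg, rm = modrm(code[i + 1])
            if mod != 3 or reg != 5:
                break  # in opcode group 81, a reg field of /5 means SUB
            imm = int.from_bytes(code[i + 2:i + 4], "little")
            out.append((base + i, f"SUB {REG16[rm]},{imm:04X}"))
            i += 4
        elif op == 0x9A and i + 5 <= len(code):
            # Far call: 16-bit offset first, then 16-bit segment, little-endian.
            offset = int.from_bytes(code[i + 1:i + 3], "little")
            segment = int.from_bytes(code[i + 3:i + 5], "little")
            out.append((base + i, f"CALL {segment:04X}:{offset:04X}"))
            i += 5
        else:
            break  # unknown opcode, or instruction truncated by end of input
    return out, base + i


if __name__ == "__main__":
    insns, stop = decode(LOG_BYTES, base=0x20)
    for addr, text in insns:
        print(f"2010:{addr:04X}  {text}")
    print(f"stopped at 2010:{stop:04X}")
```

Run against those bytes, it yields MOV BP,SP, then SUB SP,0002, then CALL 2452:10C2 (the usual stack-frame setup followed by a far call), and it stops at 2010:002B because the log ends before the second far call's segment is complete.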

The thread somewhat petered out.

I noticed today that mitsuhiko gave it another attempt. He put the whole thing into Docker so he could run it under a Linux VM, and the code could now run enough of RACTER.EXE to display the banner:

[spc]lucy:/tmp/racter>/tmp/NaNoGenMo-2015/C/msdos RACTER.EXE



          .-----------------------------------------------------,
          |                                                     |
          |            A CONVERSATION WITH RACTER               |
          |                                                     |
          |       COPYRIGHTED BY INRAC CORPORATION, 1984        |
          | PORTIONS COPYRIGHTED BY MICROSOFT CORPORATION, 1982 |
          |                   ...........                       |
          `-----------------------------------------------------'




Hello, I'm Racter.  You are?  
>Sean
Sean

But that's it. It's still chugging along, turning my computer into a space heater. I'm still unimpressed.

This isn't to fault mitsuhiko. I'm sure he finds value in AI agents coding for him, but I think this was way out of his bailiwick, which is why he didn't bother to understand what I was attempting. “Claude” got to the point of printing the banner from RACTER.EXE and stopped, I think because that's all it was instructed to do, aside from attempting to buffer the input.

I'll close this out with the last few comments in the thread:

Sean: What type of programming do you do? Or rather, what type of programming do you have Claude do for you? Because I am still unconvinced it will be any benefit to the programming I do.
mitsuhiko: Right now I’m building a backend for a prototype of the next project I’m working on. That is a rather complex web application using both Python and Rust. Over the last year or so I used it quite a bit to extend minijinja (but that wasn’t agentic yet).
Sean: Ah, stuff that is definitely over-represented in the training sets. Gotcha.
mitsuhiko: Considering that I’m doing a very fringe thing I’m not so sure that this is a very accurate assessment :)
Sean: Python, Rust and web applications are over-represented in the training sets. The 6809, RACTER.EXE and ANS Forth aren’t. What you are doing might be novel, but the tech being used isn’t. The stuff I described isn’t novel (well, maybe having RACTER and Eliza chat, but I was riffing on an article written in the 80s about doing that) but using tech that (in my opinion) is novel (that is, not mainstream). There’s a difference.

I do appreciate the attempt though.

Update on Friday, June 6th, 2025 at 3:06 AM

One last comment from mitsuhiko in the thread: “I had excellent results with completely niche technology too. For as long as you have a way for the machine to validate it’s [sic] outputs it can even program in languages that you just invented.”

I think I'll have to keep this in mind for next time.