Ten years later and the prospect of another 24-hour drive to Motown just doesn't sound appealing anymore, my hatred of flying notwithstanding. So when I was invited to another cousin's wedding a few weeks ago, I decided perhaps it was best to fly this time. And since it has been sixteen years since I last flew, my flying budget was pretty large. It was an easy decision to opt for comfort over price—direct flight, first class, and whatever hoops I had to jump through to speed through Security Theater.
And that's how I found myself at 10:30 am sitting at gate D-2 at the Ft. Lauderdale-Hollywood International Airport, watching the line for Starbucks snake clear across the terminal, everybody heads down staring at their smartphones, or looking straight ahead talking into the air. Bunny and I waited to board the plane for an hour and a half, and that line for Starbucks never got any shorter.
The flight itself was uneventful. I used to fly the Ft. Lauderdale-Detroit flight every year as a kid to spend the summer with my paternal grandparents. Flying back then was way different than today. First off—this was just prior to airline deregulation, so the various airlines had to compete on service, not price. Everybody was allowed to go to the gate, so family and friends could wait with you there, and family and friends could meet you at the gate on the other side. I do think there was some security—at least a metal detector, but nothing like the security theater of today. But it wasn't cheap—the cost of flying my 10-year-old self, adjusted for inflation, is what I paid for a first class ticket to fly myself this year.
I don't think I want to know the price of a first class ticket back in the day.
The service on the flight? It was okay. I mean, a snack box of cheese, crackers and some gummi bears pales next to the actual meals I remember eating, but the entertainment center in the seat? A wide selection of movies, music, and live flight tracking? That would have blown my 10-year-old mind back in the day.
On the down side, the pre-flight safety dance was not performed by the crew, but instead shown on said entertainment center, after two mandatory airline commercials. Seriously.
I also found it amusing that airplanes still have no-smoking indicators. How long has it been since one could smoke on a plane? Thirty years? Forty years? I wonder at what point they'll finally be removed.
Another change I noticed—now that each seat has a built-in entertainment center, there's no longer an in-flight magazine or Sharper Image catalog.
The idea of the scrum framework is to organize a development process to move through the different project cycles faster. But does it always incentivize the right behaviours in doing so? Many of the users who joined the debate around the question on Stack Overflow have similar stories of how developers take shortcuts, get distracted by their ticket high score, or even feign productivity for managers. How can one avoid these pitfalls?
That the question has been migrated from our workplace exchange to the software engineering one shows that developers consider concerns about scrum and its effectiveness to be larger than the standard software development lifecycle; they feel its effect on their workplace as a whole. User Qiulang makes a bold claim in their question: Scrum is turning good developers into average ones.
Could that be true?
“99 failing tests in the queue! 99 failing tests! Check one out, grind it out, 98 failing tests in the queue!”
So I'm facing this nearly twenty-hour-long regression test, and I had this idea—instead of having the regression test query the mocked endpoint to see if it did its thing or not, have the mocked endpoint itself check whether it did its thing or not.
I now send the mocked endpoint the testcase itself (along with a query flag indicating whether a query is expected). The mocked endpoint saves this information, and sets a queried flag for this testcase to false. If it is queried, it updates the queried flag for the given request. At the end of the regression test (and I pause for a few extra seconds to let any pending requests hopefully finish), the mocked endpoint then goes through the list of all the testcases it was told about, and checks to see if the query flag and queried flag match—if not, it logs an error.
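Roughly, the idea looks like this (a minimal sketch in Python, with a made-up HTTP interface; the real mock speaks whatever protocol the code under test expects):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class MockEndpoint(BaseHTTPRequestHandler):
    # testcase id -> {"expected": was a query expected?, "queried": did one arrive?}
    cases = {}

    def do_POST(self):
        # The regression test registers a testcase and whether a query is expected,
        # e.g. a body of "testcase-0042,true".
        length = int(self.headers.get("Content-Length", 0))
        testcase, expected = self.rfile.read(length).decode().split(",")
        self.cases[testcase] = {"expected": expected == "true", "queried": False}
        self.send_response(204)
        self.end_headers()

    def do_GET(self):
        # The code under test queries the endpoint; record that it happened.
        testcase = self.path.lstrip("/")
        if testcase in self.cases:
            self.cases[testcase]["queried"] = True
        self.send_response(200)
        self.end_headers()

    @classmethod
    def report(cls):
        # Called once at the end of the run: log every mismatch.
        for testcase, flags in cls.cases.items():
            if flags["expected"] != flags["queried"]:
                print(f"ERROR {testcase}: expected={flags['expected']} queried={flags['queried']}")
```

The registration call, the query path, and report() are all hypothetical names; the point is just that the mock itself holds the expected and actual flags and does the comparison once the run (plus those few extra seconds of settling time) is done.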
Sure, now we have two error logs to check, but I think that's better than waiting nearly twenty hours for results.
I got a baseline time for the subset of the regression test without the mock checks—35 seconds. I'm trying to beat a time of 4 hours, 30 minutes with the mock checks.
The new method ran the subset in 40 seconds. The entirety of the regression test, 15,852 tests, took just a few minutes—about the same as before.
I can live with that.
Now all that's left is to write the validation logic—I still don't have it down yet.
It's 2:17 am as I'm typing this, sitting on a phone bridge during a deployment of the new “Project: Lumbergh,” and I'm glad to know that I'm not the only one with a clicky keyboard. My comment about it brought forth a brief conversation about mechanical key switches, but it was short-lived as the deployment kept rolling out. It's sounding a bit like mission control at NASA. So while I'm waiting until I need to answer a question (it's happened a few times so far), I thought I might go into some detail about my recent rants about testing.
It's not that I'm against testing, or even writing test cases. I think I'm still coming to grips with the (to me) recent hard push for testing über alles in our department. The code was never set up for unit testing, and for some of the harder tests, like database B returning results before database A, we did manual testing, because it's hard to set such a test up automatically. I mean, who programs an option to delay responses in a database?
It's especially hard because “Project: Lumbergh” maintains a heartbeat (a known query) to ensure the database is still online. Introducing a delay via the network will trip the heartbeat monitor, taking that particular database out of query rotation and thus, defeating the purpose of the test! I did end up writing my own database endpoint (the databases in question talk DNS) and added an option to delay the non-heartbeat queries. But to support automatic testing, I now have to add some way to dynamically tell the mocked database endpoint to delay this query, but not that query. And in keeping with the theme, that's yet more testing, for something that customers will never see!
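The shape of that mocked database endpoint is simple enough to sketch. Here's a rough, hypothetical Python version of the idea: answer anything containing the known heartbeat name right away so the database stays in rotation, and sleep before answering everything else. The heartbeat name, the delay, and build_response() are placeholders; the real thing parses and answers actual DNS, and the dynamic per-query control would sit on top of this.

```python
import socket
import time

HEARTBEAT_NAME = b"heartbeat.example."   # hypothetical known heartbeat query
DELAY_SECONDS = 2.0                      # delay applied to non-heartbeat queries

def build_response(query: bytes) -> bytes:
    # Placeholder; a real mock would craft a proper DNS answer here.
    return query

def serve(host="127.0.0.1", port=5353):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        query, addr = sock.recvfrom(4096)
        if HEARTBEAT_NAME not in query:
            # Only non-heartbeat traffic gets delayed, so the heartbeat
            # monitor keeps this "database" in query rotation during the test.
            time.sleep(DELAY_SECONDS)
        sock.sendto(build_response(query), addr)

if __name__ == "__main__":
    serve()
```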
Then there's the whole “checking to ensure something that shouldn't happen, didn't happen” thing. To me, it feels like proving a negative. How long do we wait until we're sure it didn't happen? Is such activity worth the engineering effort? I suspect the answer from management is “yes” given the push to Test All The Things™, but at times it feels as if the tests themselves are more important than the product.
I'm also skeptical about TDD in general. There's this series on using TDD to write a sudoku solver:
- OK, Sudoku
- Moving On With Sudoku
- Sudoku: Learning, Thinking and Doing Something About It
- Sudoku 4: Disaster Narrowly Averted
- Sudoku 5: Objects Begin to Emerge
Reading through it, it does appear to be a rather weak attempt at satire of TDD that just ends after five entries. But NO!—this is from Ron Jeffries, one of the founders of Extreme Programming and an original signer of the Manifesto for Agile Software Development. If he gave up on TDD for this example, why is TDD still a thing? In fact, in looking over the Manifesto for Agile Software Development, the first tenet is: Individuals and interactions over processes and tools. But this “testing über alles” appears to be nothing but processes and tools. Am I missing something?
And the deployment goes on …
So last night I ran a subset of the regression test in 4½ hours and got a few errors where something that shouldn't happen, happened (and it's this checking that an event did not happen that takes the time). Well, it wasn't a bug in the code being tested, but a bug in the regression test (Surprise! Surprise! Surprise! Only not really). I think that says more about our business logic than it does about CZ or me; both of us attempted to validate this part of the business logic in the regression test, and we both got it wrong.
And about parallelizing the regression test—yes, it's possible. But doing so on the spot isn't. The easy solution is to run the regression test on multiple machines—nice if you have them. The other option is to parallelize the run on a single machine and the code just isn't set up to do that. I'm not saying it's impossible, but it will take engineering effort, and more importantly, testing! Funny how testing your test cases isn't talked about that much.
The slowdown of the regression test is due to “proving a negative”—that is, checking that something that's not supposed to happen did not happen. And in a distributed system like ours, that's not easy to test—a check could happen before the event for any number of reasons, and how long do you wait to ensure that what shouldn't happen didn't happen?
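One way to frame it, as a sketch: poll for the forbidden event over a fixed quiet period and fail the test if it ever shows up. The catch is that every test that (correctly) does nothing still pays for the full quiet period, which is exactly where those extra seconds per test come from. The helper below is illustrative only; event_occurred() stands in for however the test actually observes the event, say by asking a mock.

```python
import time

def assert_never_happens(event_occurred, quiet_period=3.0, interval=0.1):
    """event_occurred is a zero-argument callable returning True once the
    forbidden event has been observed (for example, by querying a mock)."""
    deadline = time.monotonic() + quiet_period
    while time.monotonic() < deadline:
        if event_occurred():
            raise AssertionError("forbidden event happened")
        time.sleep(interval)
    # Reaching here means the event never showed up during the quiet period,
    # which is the best "proof" of a negative this approach can give.
```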
The other reason it takes so long to run is just the sheer number of tests. My “retiring any day now” manager has never been happy with the “shotgun” approach I took to generating the tests—I basically generate thousands of combinations of conditions, most of which “should” never appear in production. But one of those “should never happen” things did happen about seven years ago and well, the less said about that the better. So at least my “shotgun” approach does have the effect of testing for a lot of “I don't know” conditions (most of which are misconfigurations of data from provisioning). And each new condition we add could potentially double the number of test cases. I'm sure there's a way to reduce the number of test cases, but to the TDD acolytes out there (and the new management team does appear to follow TDD tenets), “one does not simply reduce the number of test cases.”
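For the curious, the “shotgun” approach in miniature looks something like this (the condition names and values are made up for illustration): cross every value of every condition, even the combinations that “should” never appear in production. Add one more two-valued condition and the count doubles.

```python
from itertools import product

conditions = {
    "caller_type":   ["subscriber", "unknown", "blocked"],
    "db_a_response": ["ok", "timeout", "garbage"],
    "db_b_response": ["ok", "timeout", "garbage"],
    "provisioning":  ["valid", "missing", "inconsistent"],
}

# One testcase per combination of condition values.
testcases = [dict(zip(conditions, values)) for values in product(*conditions.values())]
print(len(testcases))  # 3 * 3 * 3 * 3 = 81; one more binary condition makes it 162
```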
And the regression test rolls on …
Yesterday, I said the full regression test might take over 13 hours. In light of the results of running just a partial test, it turns out the full regression test will take over 19 hours! The joke's on me though—when I said it would "be fun reporting at the next meeting" I wasn't expecting my new manager to double down on the regression test. Seriously, he asked “Can you run it in parallel?”
So I wonder—are there unit tests for the various unit test frameworks out there? How are they tested?
Still writing tests. I added checks to see if something that's not supposed to happen didn't happen, and I'm only running them over the tests that may actually do the thing that's not supposed to happen (3,456 tests). Doing so adds three seconds of overhead to each test … multiply … carry the one … and hey! Only three hours to run this test run! And oh look! It found that something that wasn't supposed to happen did happen!
It's going to be fun reporting at the next meeting that running the full regression test will now take over 13 hours. I'm thinking I'll be tasked with coming up with a different approach to the regression test.
Remember kids! Tests are more important than the program! Testing! Über alles!
Update on Wednesday, June 9th, 2021
Hmmm … it took 259 minutes, 57.5 seconds (4 hours, 20 minutes) to run through the 3,456 tests, so each test took around 4½ seconds, not 3 seconds. Things aren't looking so good for this regression test …
There is definitely a culture clash at work with regard to testing. I was involved with a four-hour discussion about our current regression test with the newer members of the department. The new manager, AG, was saying the regression test should test all the paths in the code. I was saying, yes, that is possible, but many of the new tests would slow the regression test down to a crawl. Then CZ chimed in, saying it should be possible to test without slowing down to a crawl. Both AG and I countered that some of the tests involve a timeout, because how would one otherwise test a negative? CZ said we could check the logs, since the logs are written in a particular order, but I countered that our previous regression test (which runs with the Protocol Stack From Hell) takes over five hours because it checks the logs to ensure we don't miss a log entry. Using the logs to check whether something happened isn't deterministic, because the check might come before the action is even logged, and you are still trying to test a negative (in this case, that something that should not happen, did not happen, across a distributed, multi-process system), and that slows down the regression test.
We had to leave it with an agreement to disagree on the details for the time being.
Also discussed was a reversion of code in “Project: Lumbergh.” I felt the reversion reverted too much and involved parts that weren't part of “Project: Lumbergh.” CZ felt that was fine, because the changes to the other parts were a part of “Project: Lumbergh,” if only indirectly. I countered that, in that case, he should have also reverted the changes in the other repository, the one the regression test lives in, because that too indirectly relates to “Project: Lumbergh.” CZ said no, it's in a different repository and thus shouldn't be reverted. I retorted that the changes he reverted weren't part of “Project: Lumbergh” directly, just as the regression test isn't part of “Project: Lumbergh” directly, and it's only a historical artifact of how our stuff was developed that “Project: Lumbergh,” “Project: Wolowizard,” and “Project: Clean-Socks” (plus a few others I haven't mentioned) all share a single repository, and that our other repository contains “Project: Sippy-Cup,” “Project: Cleese,” “Project: Seymore” and our current regression test.
We had to leave it with an agreement to disagree on the details for the time being.
Also related to the repositories, we had a discussion about the versioning software we use. I mentioned I prefer git but we're stuck with SVN for the moment, and the only thing I like about SVN is the ability to check out a subdirectory of a repository—it's certainly not because branches are easy, or because SVN tags make any sense to me. CZ likes SVN for its ability to easily branch and tag, and hates git's ability to stash files (we still haven't moved from SVN to git).
We had to leave it with an agreement to disagree on the details for the time being.
I have a feeling this is going to be a rough few months.
Wlofie, via MeLinkedInstaMyFaceBookWeInGramSpace, sent me to A Mind is Born, a 2½-minute demo program (a program that displays some mind-blowing graphics and music) done on the Commodore 64. The impressive thing here isn't that it's done on the C-64, nor the graphics, nor the music, but that he did all that in less than 256 bytes!
For reference, the above paragraph is 351 bytes in size (or 521 if you include the HTML markup). It's inconceivable to me that one can even do something like that with so little.
I've been meaning to post this for quite a while, but I've been lax in doing so. Anyway, for my D&D buddies, a few videos: one that asks the age-old question, “can you go to the toilet in medieval armor?” The second one covers misconceptions about inns, accommodations and taverns in medieval times. These are interesting, but I'm not sure I'm going to go into that level of detail when running my “every-two-week” D&D game.
This last video is mind blowing (at least to me)—fast food in ancient Rome. It's odd to think that “fast food” is as old as Rome—MCROMVLVS anyone? But yes, Roman Empire fast food.
As much as I wanted the Optimus Maximus, the price of $1,600.00 kept me from buying one, which according to this review was a Good Thing™ (and that YouTube channel is nothing but keyboard reviews—amazing!).
The point of writing software seems to be writing tests for the software, not running the software
I've been stuck in testing hell the past few weeks, what with the new crew being heavy into the whole “unit test über alles” religion.
I understand the reason behind testing, and unit testing in particular. The few times I've tried unit testing, it has been helpful, but with some caveats:
- I think it works best when you start out with unit testing in mind;
- it works best with a module or library;
- you test at the boundary of the module or library and not the implementation;
- you do have tests for your tests, right?
That last bit might sound a bit glib, but it is a real concern. Even Knuth is quoted as saying, “beware of bugs in the above code; I have only proved it correct, not tried it.” I even joked about having to test our tests with both my current (but soon-to-retire) manager and his (soon-to-be-my) manager, because our current regression test has 15,852 individual tests (not all of them might be valid—hard to say, since the tests are automatically generated).
“Project: Lumbergh” was not written with unit tests in mind. Or more specifically, it is a unit. The whole thing. And it's complicated because not only does it implement our business logic (which over ten years has gotten quite complex) but because it has to query multiple databases at the same time across the network (“at the same time” because it has to do its job as a phone call is being made). How does one deterministically test delayed (and/or dropped) queries?
Another issue—“Project: Lumbergh” sends a message to “Project: Cleese” in some circumstances (“Project: Cleese” handles an HTTP notification on behalf of “Project: Lumbergh” because trying to add a TCP connection in a program that deals with UDP would have taken too much engineering time) and we need to check if “Project: Cleese” was notified. I solved that one by mocking the HTTP endpoint that “Project: Cleese” talks to with two interfaces, one for “Project: Cleese” and one for the regression test to query. The issue there is the regression test might ask the mock if it received a request from “Project: Cleese” before “Project: Cleese” gets the notification from “Project: Lumbergh” (a classic race condition). I got it working reliably, but now the regression test takes over twenty minutes to run, instead of the two-plus minutes it used to (even on the happy path, and I'm not entirely sure why).
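Making that reliable comes down to polling the mock's query interface with a deadline instead of asking once; something like the hypothetical helper below, where notification_received() stands in for whatever the mock actually exposes.

```python
import time

def wait_for_notification(notification_received, timeout=5.0, interval=0.05):
    """Poll until the mock reports the notification arrived, or give up.
    notification_received is a zero-argument callable returning True/False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if notification_received():
            return True
        time.sleep(interval)
    return False
```

It works, but every test on the notification path now waits on a polling loop, which is part of why the run time ballooned.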
It seems like management is more concerned about the tests rather than the product.