The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Thursday, August 19, 2021

The case of the regression test regression

So the SVN-git conversion is back on the table. When I worked on this last year, I stopped when it became clear that a critical build server didn't support git. I duly reported the issue, and it's only been fixed in the past few months. Last year, it was delegated solely to me. This year, it's more of a team effort.

For our team, we currently have two main SVN repositories with some duplicate third party packages checked in. We went through the lists to remove the redundencies (a Good Thing™) and I felt that it might be a good time to ensure they're up to date. I spent a week doing so and ran into an odd issue—the old regression test now runs slower! Not thirteen (or nineteen) hours slow but slow enough to be seriously annoying. I was able to isolate the slowdown to some Lua modules (that I wrote cough cough—I didn't think the changes were that bad).

We do development primarily on Macs. My first thought was to either confirm or deny that the Mac was the issue. I ran tests on a Linux laptop I have for work and nope—the slowdown exists on that. I then tried to profile the code (to no conclusive result). I compared the code between the two versions (only five lines changed in the regression test, and all of those were small API changes like fsys.redirect(stdin,fsys.STDIN) becoming fsys.redirect(stdin,io.stdin)). I just could not figure out why the new code was slower.

So then I decided to just time things. I added code to time how long it took to run each test (and to limit the number of tests run—enough to show the issue, but not the entire 15,852 tests) and plot the results.

[Graph of the time to run each test from the old regression test and the new regression test.  The old one is faster.]

The X-axis is the test case number, the Y-axis is time in microseconds (and plotted on a logarithmic scale, otherwise, you wouldn't be able to make out any detail along the lower portion). The new regression test is in green, the old one in red. It's clear that there are a few dozen tests that take a full second to run in the new program, whereas the average (of both, aside from the outliers) seems to be around 5ms. So good, the problem clearly shows up.

Then I timed each section of running a test to see which section is sucking up the time, and eventually, I get this graph:

[Graph of the small segment of the test that is taking all the time.]

[The sharper eyed of you might have noticed the X-axis change—each test is run twice (for reasons). This means the second graph has twice as many entries. Yes, that means the full 15,852 tests are each run twice, for 31,704 requests. —Sean]

This represents the timing of this bit of code:

-- --------------------------------
-- Receive and process NOTIFY
-- --------------------------------

local remote,data = sipsock:recv(10)
if not data then
  return false,"no NOTIFY"
end

The only program that's different is the regression test, not the programs being tested. And it's not like I updated the version of Lua either—that didn't change. The third party code I'm using, yes, to some degree. But the underlying code to sipsock:recv() did not change. Yet it's this bit of code that is consistently slow for a consistent set of tests.

To say I'm a bit perplexed is a bit of an understatement.

Obligatory Picture

An abstract representation of where you're coming from]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.