I thought that as long as I'm going to such lengths to get “push-button testing” implemented, I might as well mention some of the techniques I've used just in the off chance that it might help someone out there. The techniques I use are probably only relevant to the stuff I work on and it may not apply elsewhere, but it certainly can't hurt to mention it.
So I don't have unit tests (whatever they are) per se, but I do have what is referred to as a “regression test,” which tests “Project: Sippy-Cup,” “Project: Lumbergh” and “Project: Cleese.” The reason is that taken individually, each of those projects can be considered a “unit,” but to, say, test “Project: Lumbergh” alone would require something to act like “Project: Sippy-Cup” (which feed requests into “Project: Lumbergh”) and “Project: Cleese” (which is notified by “Project: Lumbergh” in some circumstances), so why not run those as well? “Project: Lumbergh” also talks to two different DNS servers for various information about a phone number, so when running it, I need something to respond back. I also need an endpoint for “Project: Cleese” to talk to, so what's one more process? Oh, “Project: Lumbergh” will also talk to cell phones, or at least expect a cell phone to request data in some circumstances, so I have a “simulated cell phone” running as well.
Each test case is now a separate file, which describes how to set up the data for the test (the two phone numbers, what names, what feature we're testing, etc) as well as what the expected results are (we get a name, or the reputation, or a different phone number, depending upon what's being tested). This way, we can have the regression test run one test, some of the tests, or “all the things!” The regression test will read in all the test cases and generate all the data required to run them. It then will start the seven programs with configurations generated on the fly, and start feeding SIP messages into the maelstrom, recording what goes on and checking the results as each test runs. And when a test fails, the test case information is recorded in an output file for later analysis.
So far, nothing out of the ordinary. That's pretty much how the previous regression test worked, except it generated all 15,852 test cases. But it's how I test some of the wierder border cases that I want to talk about.
First up—ensuring something that's not supposed to happen didn't happen. In some circumstances, “Project: Lumbergh” will notify “Project: Cleese,” and I have to make sure it happens when it's supposed to, and not when it's not supposed to. I've already mentioned part of my solution to this, but the updated version of that is: the regression test has a side channel to the fake endpoint that “Project: Cleese” talks to. Before each test, the regression test will send that component the test data and whether it should expect a request or not. The fake endpoint will simply record this data for later use. If a request is made for that particular test case, it will also be noted for later. Once the regression test has finished running all the tests and waited a few seconds for any last second requests to clear, it “runs” one more test—it queries the fake endpoint for a count of requests it actually received and compares it to the number the regression test thinks should have happened (and output success or failure). Upon termination of the regression test (as everything is being shut down), the fake endpoint will then go through its list of tests it received from the regression test, and record any discrpancies (a query that was supposed to happen didn't, or a query that wasn't supposed to happen, did). This is recorded in another file for later analysis (which is why I send over all the data to the fake endpoint—just to make it easier to see the conditions of the test in one place).
Second—“Project: Lumbergh” talking to multiple DNS servers. It will generally send out both requests at once, given the rather demanding timing constraints on us, so we have to support reply A coming before reply B, reply B coming before reply A, reply A timing out but getting reply B, and reply B tming out but getting reply A. How to test for those nightmare scenarios automatically? Oh, “Project: Lumbergh” also maintains a continuous “heartbeat” to these services, and if those replies don't get though, the servers will be taken out of rotation by “Project: Lumbergh” and once the last one is gone, “Project: Lumbergh” effectively shuts down. The nightmare just got worse.
Well, again, I have written my own fake endpoints for these services (not terribly hard as the data is fixed, and it's not like I'm going for speed here). And again, I added a side channel for the regression test to communicate to the fake endpoints. After starting up the these fake endpoints, the regression test informs the endpoints what entry is considered the “heartbeat” so no delay what so ever will ever be applied to that query. Then before any test is run, the regression test will inform the endpoints how long to delay the response—all the way from “no delay” to “don't even respond” (except for the “heartbeat”—that will always happen) as it's part of the testing data.
Yes, it all works. Yes, it's a pain to write. Yes, it's a bunch of code to test. No, I don't have XXXXXXX unit tests or regression tests for the regression test—I'm only willing to go so far.
I just hope we don't have to implement 100% test code coverage, because I'm not looking forward to forcing system calls to fail.
“Would love to hear about your prior development method. Did adopting the new practices have any upsides?”
[The following is a comment I made on Lobsters when asked about our development methods. I think it's good enough to save, and what better place to save it than this here blog. So here it is.]
First off, our stuff is a collection of components that work together. There are two front-end pieces (one for SS7 traffic, one for SIP traffic) that then talk to the back-end (that implements the business logic). The back-end makes parallel DNS queries  to get the required information, muck with the data according to the business logic, then return data to the front-ends to ultimately return the information back to the Oligarchic Cell Phone Companies. Since this process happens as a call is being placed we are on the Oligarchic Cell Phone Companies network, and we have some pretty short time constraints. And due to this, not only do we have some pretty severe SLAs, but any updates have to be approved 10 business days before deployment by said Oligarchic Cell Phone Companies. As a result, we might get four deployments per year .
So, now that you have some background, our development process. We do trunk based development (all work done on one branch, for the most part). We do NOT have continuous deployment (as noted above). When working, we developers (which never numbered more than three) would do local testing, either with the regression test, or another tool that allows us to target a particular data configuration (based off the regression test, which starts eight programs, five of which are just needed for the components being tested). Why not test just the business logic? Said logic is spread throughout the back-end process, intermixed with all the I/O it does (it needs data from multiple sources, queried at the same time).
Anyway, code is written, committed (main line), tested, fixed, committed (main line), repeat, until we feel it's good. And the “tested” part not only includes us developers, but also QA at the same time. Once it's deemed working (using both regression testing and manual testing), we then officially pass it over to QA, who walks it down the line from the QA servers, staging servers and finally (once we get permission from the Oligarchic Cell Phone Companies) into production, where not only devops is involved, but QA and the developer who's code is being installed (at 2:00 am Eastern, Tuesday, Wednesday or Thursday, never Monday or Friday).
Due to the nature of what we are dealing with, testing at all is damn near impossible (or rather, hideously expensive, because getting actual cell phone traffic through the lab environment involves, well, being a phone company (which we aren't), very expensive and hard to get equipment, and a very expensive and hard to get laboratory setup (that will meet FCC regulations, blah blah yada yada)) so we do the best we can. We can inject messages as if they were coming from cell phones, but it's still not a real cell phone, so there is testing done during deployment into production.
It's been a 10 year process, and it has gotten better until this past December.
Now it's all Agile, scrum, stories, milestones, sprints, and unit testing
über alles! As I told my new manager, why bother with a two week sprint
when the Oligarchic Cell Phone Companies have a two year sprint? It's not
like we ever did continuous deployment. Could more testing be done
automatically? I'm sure, but there are aspects that are very difficult to
test automatically . Also, more branch development. I wouldn't mind so
much this, except we're using SVN (for reasons that are mostly historical at
this point) and branching is … um … not as easy as in
git.  And
the new developer sent me diffs to ensure his work passes the tests. When I
asked him why didn't he check the new code in, he said he was told by the
new manager not to, as it could “break the build.” But we've broken the
build before this—all we do is just fix code and check it in . But no,
no “breaking the build,” even though we don't do continuous integration, nor
continuous deployment, and what deployment process we do have locks the
build number from Jenkins of what does get pushed (or considered “gold”).
Is there any upside to the new regime? Well, I have rewritten the regression test (for the third time now) to include such features as “delay this response” and “did we not send a notification to this process.” I should note that is is code for us, not for our customer, which, need I remind people, is the Oligarchic Cell Phone Companies. If anyone is interested, I have spent June and July blogging about this (among other things).
- Looking up NAPTR records to convert phone numbers to names, and another set to return the “reputation” of the phone number.
- It took us five years to get one SIP header changed slightly by the Oligarchic Cell Phone Companies to add a bit more context to the call. Five years. Continuous deployment? What's that?
- The original development happened in 2010, and the only developer at the time was a) very conservative, b) didn't believe in unit tests. The code is not written in a way to make it easy to unit test, at least, as how I understand unit testing.
- A prototype I wrote to get my head around parsing SIP messages that got deployed to production without my knowing it by a previous manager who was convinced the company would go out of business if it wasn't. This was six years ago. We're still in business, and I don't think we're going out of business any time soon.
- As I mentioned, we have multiple outstanding requests to various data sources, and other components that are notified on a “fire and forget” mechanism (UDP, but it's all on the same segment) that the new regime want to ensure gets notified correctly. Think about that for a second, how do you prove a negative? That is, something that wasn't supposed to happen (like a component not getting notified) didn't happen?
- I think we're the only department left using SVN—the rest of the
company has switched to
git. Why are we still on SVN? 1) Because the Solaris  build servers aren't configured to pull from
gityet and 2) the only redeeming feature of SVN is the ability to checkout a subdirectory, which given the layout of our repository, and how devops want the build servers configured, is used extensively. I did look into using
gitsubmodules, but man, what a mess. It totally doesn't work for us.
- Oh, did I neglect to mention we're still using Solaris because of SLAs? Because we are.
- Usually, it's Jenkins that breaks the build, not the code we checked in. Sometimes, the Jenkins checkout fails. Devops has to fix the build server  and try the call again.