Wednesday, Debtember 21, 2022
Unit test this
Or, “What is a ‘unit test,’ part II”
I saw a decent answer to my question which makes sense for C. Another decent (if a bit vague) answer was:
So to answer Sean's question, a unit test is that which requires the least amount of work to setup but is able to reduce the need for coverage of the larger system. Whatever you want to consider a "unit" is up to you and the language you're using.
I left off my previous entry pointing to a function that I would love to have seen someone else “unit test,” but alas, no one did. But I always had plans on going all “The Martian” on the code and “unit test the XXXX out of it.”
So here's the code in question:
/*********************************************** * * Copyright 2021 by Sean Conner. All Rights Reserved. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version 2 * of the License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * * Comments, questions and criticisms can be sent to: sean@conman.org * ************************************************/ #include <stdbool.h> #include <stdlib.h> #include <string.h> #include <errno.h> #include <assert.h> #include <sys/types.h> #include <sys/wait.h> #include <sys/stat.h> #include <unistd.h> #include <fcntl.h> #include <syslog.h> #include <sysexits.h> /************************************************************************/ bool run_hook(char const *tag,char const *argv[]) { assert(tag != NULL); assert(argv != NULL); assert(argv[0] != NULL); pid_t child = fork(); if (child == -1) { syslog(LOG_ERR,"%s='%s' fork()='%s'",tag,argv[0],strerror(errno)); return false; } else if (child == 0) { extern char **environ; int devnull = open("/dev/null",O_RDWR); if (devnull == -1) _Exit(EX_UNAVAILABLE); if (dup2(devnull,STDIN_FILENO) == -1) _Exit(EX_OSERR); if (dup2(devnull,STDOUT_FILENO) == -1) _Exit(EX_OSERR); if (dup2(devnull,STDERR_FILENO) == -1) _Exit(EX_OSERR); for (int fh = STDERR_FILENO + 1 ; fh <= devnull ; fh++) if (close(fh) == -1) _Exit(EX_OSERR); execve((char *)argv[0],(char **)argv,environ); _Exit(EX_UNAVAILABLE); } else { int status; if (waitpid(child,&status,0) != child) { syslog(LOG_ERR,"%s='%s' waitpid()='%s'",tag,argv[0],strerror(errno)); return false; } if (WIFEXITED(status)) { if (WEXITSTATUS(status) != 0) { syslog(LOG_ERR,"%s='%s' status=%d",tag,argv[0],WEXITSTATUS(status)); return false; } } else { syslog(LOG_ERR,"%s='%s' terminated='%s'",tag,argv[0],strsignal(WTERMSIG(status))); return false; } } return true; } /************************************************************************/
As you can see, it's one function, in one file, with the only dependencies
being the operating system. So this should be the “perfect unit” to write
some “unit tests” for. The code does replicate a bit of the standard C
function system()
, so why not use system()
in the
first place? The answer comes from the manual page for Linux:
Do not use
system()
from a privileged program (a set-user-ID or set-group-ID program, or a program with capabilities) because strange values for some environment variables might be used to subvert system integrity. For example,PATH
could be manipulated so that an arbitrary program is executed with privilege. Use theexec(3)
family of functions instead, but notexeclp(3)
orexecvp(3)
(which also use thePATH
environment variable to search for an executable).
This function runs as part of a set-user-ID program (mod_blog
in particular,
for reasons beyond the scope of this entry) so no system()
for
me. Also, I avoid having to construct a command string that might have failed
to properly escape the filename to avoid complications with the shell's use
of certain characters. And it's not like the function was hard for me to
write. I've done functions like this before, and it worked the first time
without issue when I wrote it (and the small changes to it since have been a
simplification of the parameters, and changes to the logging messages). It's
also not a very long function (I'm sorry Ron Jefferies, but 14
lines of code isn't “a lot of code”).
The reason I wanted some unit test proponents to look at this code is that it involves quite a bit of interaction with the operating system in C, a not-very-popular programming language these days, and I was curious as to the level of “unit testing“ that would be done. No bites, but my gut feeling is that a “unit test proponent” faced with this code would just throw two scripts to it, one to return successfully:
int main(void) { return 0; }
and one to return failure:
int main(void) { return 1; }
and call it “battle tested.” The two test cases themselves are pretty easy to write:
#include <stdbool.h> #include <stdlib.h> #include <stdio.h> #include "tap14.h" extern bool run_hook (char const *,char const **); int main(void) { tap_plan(2,NULL); tap_assert( run_hook("script",(char const *[]){ "./success" , NULL }),"success script"); tap_assert(!run_hook("script",(char const *[]){ "./failure" , NULL }),"failure script"); return tap_done(); }
(I'm using my own testing framework based on TAP. I wrote my own to be as minimal as possible to get the job done—other TAP frameworks for C I looked at were too overblown for my tastes.)
An improvement would be to add a test script that terminates via a signal. It's again easy enough to write that script:
#include <signal.h> int main(void) { raise(SIGINT); return 1; }
and the appropriate test:
tap_assert(!run_hook("script",(char const *[]){ "./terminate" , NULL }),"terminate script");
But that only tests half the code. How do I know I didn't mess up the codepath in the child process before I execute the script in question? At The Enterprise, it was expected our tests cover about 70% of the code at least— I'm short of that target here. And as I say, I'm aiming to “unit test the XXXX out of this” and get 100% code coverage, because shouldn't that be the goal of “unit testing?”
But to achieve that target, I'm going to have to deal with “failing” a bunch of existing functions, and according to my interprestation of “A Set of Unit Testing Rules,” if I'm not mocking, I don't have a “unit test.” So I have to mock some system calls.
And here is where I hit a problem—to do so will invoke the dreaded
“undefined behavior of C.” Seriously–if I provide my own function for, say,
dup2()
, I am technically invoking undefined behavior of the C
machine (this incredibly long flame war on the Lua mailing list of
all places, goes into the gory details behind that statement). Now granted,
certain tool chains on certain operating systems allow one to override
functions, but you can't rely upon this behavior in general. Given that I'm
doing all of this on Linux, and Linux in general allows this, I can proceed
(carefully) with mocking system functions.
That should be straightforward enough. The mocked open()
function:
static int err_open; static int ret_open; int open(char const *pathname,int flags) { (void)pathname; (void)flags; if (err_open != 0) errno = err_open; return ret_open; // XXX had bug here }
This should be fine for my purposes as I don't actually need to read from the file. If I really needed to call into the original function, this might work:
static int err_open; int myopen(char const *pathname,int flags) { if (err_open == 0) return open(pathname,flags,0); errno = err_open; return -1; } #define open myopen
But as the “A Set of Unit Testing Rules” article states, “A test is not a
unit test if: it touches the file system.” So the above isn't a “true mock,”
and I shall continue with my “true mocked” function instead. I can continue
with similar implementations for the functions dup2()
,
close()
and waitpid()
. Unfortunately, there are
three functions that may present some interesting challenges:
fork()
, execve()
, and _Exit()
. The
first returns twice (kind of—if you squint and look sideways), the second
only returns if there's an error, and the third never returns.
Now looking over the implementation of the function I'm testing, and
thinking about things, I could do a similar implementation for
fork()
—the returning twice thing is where it returns once in the
parent process, and once in the child process, but I can treat that as just a
normal return, at least for purposes of testing. For execve()
, I
can only test the error path here as the script being “run” isn't being run.
That just leaves _Exit()
as the final function to mock. And for
that one, I wrap the entire call to run_hook()
(the function
being “unit tested”) around setjmp()
and longjmp()
to simulate the not-returning aspect of _Exit()
. So a test of
the close()
codepath would look like:
static bool X(char const *tag,char const *argv[]) { volatile int rc = setjmp(buf_exit); if (rc != 0) return false; return run_hook(tag,argv); } int main(void) { /* ... */ ret_open = 4; err_dup2 = 0; ret_dup2 = 0; bad_dup2 = -1; err_close = EIO; ret_close = -1; tap_assert(!X("script",(char const *[]){ "./success" , NULL }),"close() fail"); /* ... */ return tap_done(); }
I got all the test cases written up and all 11 tests pass:
TAP version 14 1..11 ok 1 - success script ok 2 - failure script ok 3 - terminate script ok 4 - fork() fail ok 5 - open() fail ok 6 - dup2(stdin) fail ok 7 - dup2(stdout) fail ok 8 - dup2(stderr) fail ok 9 - close() fail ok 10 - execve() fail ok 11 - waitpid() fail
A successful “unit test” with 100% code coverage. But I'm not happy with this. First off, I don't get the actual logging information for each test case. All I get is:
Dec 21 19:34:10 user err /dev/log test_run_hook script='./failure' status=1 Dec 21 19:34:10 user err /dev/log test_run_hook script='./terminate' terminated='Interrupt' Dec 21 19:34:10 user err /dev/log test_run_hook script='./success' fork()='Cannot allocate memory' Dec 21 19:34:10 user err /dev/log test_run_hook script='./success' waitpid()='No child processes'
and not
Dec 21 19:04:10 user err /dev/log test_run_hook script='./failure' status=1 Dec 21 19:04:10 user err /dev/log test_run_hook script='./terminate' terminated='Interrupt' Dec 21 19:04:10 user err /dev/log test_run_hook script='./success' fork()='Cannot allocate memory' Dec 21 19:04:10 user err /dev/log test_run_hook script='./success' status=69 Dec 21 19:04:10 user err /dev/log test_run_hook script='./success' status=71 Dec 21 19:04:10 user err /dev/log test_run_hook script='./success' status=71 Dec 21 19:04:10 user err /dev/log test_run_hook script='./success' status=71 Dec 21 19:04:10 user err /dev/log test_run_hook script='./success' status=71 Dec 21 19:04:10 user err /dev/log test_run_hook script='./success' status=69 Dec 21 19:04:10 user err /dev/log test_run_hook script='./success' waitpid()='No child processes'
(And no! I am not checking that
syslog()
got the right message in the test cases—been there,
done that and all I got was a stupid tee-shirt and emotional scars. It's easy
enough to just manually check after the test runs, at least for this
entry.)
It just doesn't feel right to me that I'm testing in a faked
environment. No, to get a better “unit test” I'm afraid I'm going to have to
keep invoking undefined C behavior that is allowed by Linux, and interpose
our functions by using LD_PRELOAD
to override the functions. And
I can set things up so that I can still call the original function when I
want it to succeed. So all that needs to be done is write a shared object
file with my versions of the functions, and include this function:
static pid_t (*___fork) (void); static int (*___open) (char const *,int,mode_t); static int (*___dup2) (int,int); static int (*___close) (int); static int (*___execve) (char const *,char *const [],char *const []); static pid_t (*___waitpid)(pid_t,int *,int); __attribute__((constructor)) void init(void) { ___fork = dlsym(RTLD_NEXT,"fork"); ___open = dlsym(RTLD_NEXT,"open"); ___dup2 = dlsym(RTLD_NEXT,"dup2"); ___close = dlsym(RTLD_NEXT,"close"); ___execve = dlsym(RTLD_NEXT,"execve"); ___waitpid = dlsym(RTLD_NEXT,"waitpid"); }
(I include all three parameters to open()
even though the
last one is optional—I don't want to have to deal with the variable argument
machinery with C—this should work “just fine on my machine”—I'm already into
territory that C formally forbids. I'm using triple leading underscores
because single and double leading underscores are reserved to the C compiler
and implementation, but nothing is mentioned about three leading
underscores.)
Now, how to get information to my replacement functions about when to
fail. I thought about it, and while there is a way to do it with global
variables, it gets complicated and I'd rather do this as simply as possible.
I figured I could sneak variables through to my replacement functions via
putenv()
, getenv()
and unsetenv()
.
This will make the close()
failed test case look like:
putenv((char *)"SPC_CLOSE_FAIL=5"); /* EIO */ tap_assert(!run_hook("script",(char const *[]){ "./success" , NULL }),"close() fail"); unsetenv("SPC_CLOSE_FAIL");
And the corresponding close()
function is:
int close(int fd) { char *fail = getenv("SPC_CLOSE_FAIL"); if (fail == NULL) return (*___close)(fd); errno = (int)strtoul(fail,NULL,10); return -1; }
The other functions work simularly, and when run:
TAP version 14 1..11 ok 1 - success script ok 2 - failure script ok 3 - terminate script ok 4 - fork() fail ok 5 - open() fail ok 6 - dup2(stdin) fail ok 7 - dup2(stdout) fail ok 8 - dup2(stderr) fail ok 9 - close() fail ok 10 - execve() fail ok 11 - waitpid() fail
More importantly, since the functions can actually function as intended when I don't want them to fail, I get the full output I expect in the system logs. But per the “A Set of Unit Testing Rules” article, this is no longer a “proper unit test.”
I don't know. The more I try to understand “unit testing,” the less sense it makes to me. There is no real consensus as to what a “unit” is, and it seems strange (or in my more honest opinion, stupid) that we as programmers are not trusted to write code without tests, yet we're trusted to write a ton of code untested as long as such code is testing code. As I kept trying to impart to my former manager at The Enterprise before I left, the test case results aren't to be trusted as gospel (and it always was by him) because I didn't fully understand what the code was supposed to do (because the business logic in “Project Lumbergh” has become a literal mess of contradictory logic and communication among the team seriously broke down).
So maybe we're not supposed to “unit test” functions that involve input, output, or system interactions. Maybe we're supposed to “unit test” more “pure functions” and leave messy real world details to, oh, I don't know, some other form of testing. Okay, I have one final function that should be perfect for “unit testing.”
We shall see …