Last month, Mark hired me on as a consultant to help him profile a program he's been writing. And while I can't describe the program (or the project the program is for), I can, however, describe the issues I've had with profiling this program.
Normally to profile a program, you compile the program using special
compiler options (with
gcc this is the
that instrument the program with special code to record not only how many
times each function is called, but how much time is spent running each
function. This information is saved when the program terminates, and you
run another special program to decode this output into an easy to read format. Then you use that
information to boost the performance of
It's pretty straightforward.
This project, on the other hand, wasn't so straightforward. The
primary issue—it's a multiprocess program; that is, it calls
fork() to create child processes that do the actual work. It's
the child processes that need to be profiled, but it's difficult to actually
get the profile information from the child processes due to the way
the profiler works (and both the GNU profiler (which I'm not allowed to use
due to licensing fears) and the Sun Studio 12 profiler (which is the
development platform for the project) work simularly, so this issue affects
Problem one—the output. The program runs and when it exits, the
accumulated data is written to a file. This file is named
gmon.out if I were using GNU). In this
case, the main program starts, creates several child processes. The ouput
file is only generated when a process ends, and the only time a process ends
in this project is when you explicitely stop the main program. This results
mon.out being overwritten by each child as they end, then
overwritten again when the main process ends. So all I end up with is
profile information for the main process, which tells me that the main
processing loop took 99% of the time with only 0.01 seconds of CPU time (in
other words—the parent process did nothing noteworthy). And there's no
option in either the compiler or at runtime, to change the output file.
Or is there?
mon.out is generated in the current working
directory of the running process. Change the working directory of the
process, and the file ends up in the new working directory. So I modify the
program such that each child process creates a new working directory based
on the child's process ID and try again.
I ended up with one
mon.out file in the working directory of
the main process, and a bunch of empty directories. This leads to the
second problem with profiling—you only get the output when the process
exit() (or returns from
_exit() (there is a
difference between the two).
And replacing the calls to
caused the program to hang (Mark even had a comment in the code about the C
fork() badly in the case of
So that pretty much killed using the compiler profiler options for this project.
Or did it?
The code is on a Sun server, which means it comes with DTrace, which is an incredible tracing facility that can be used to profile an application! Without compiling a special version of the program! Heck, you can profile any running process at any time!
It's a neat facility.
Just by using some sample scripts from the DTraceToolkit
and a few examples from the Solaris Dynamic Tracing
Guide, I was able to provide Mark with enough information to nearly
double the performance of the program (major culprit—a ton of pointless
strcpy() calls in a third party library being used, but that's
about all I can say about that).
I was fortunate in that DTrace existed; between samping the program
counter, recording the number of standard library calls made, and
selectively checking the call stack to a few questionable calls (for
ferror() if you can
believe it, and tracking down the few hundred thousand calls to
strcpy()) of selected child processes, I was able to profile
this multiprocess program (and each process is multithreaded—a fact I
didn't realize until later).
And if DTrace didn't exist? Well … there's
the profiling equivalent to
printf() debugging I could have