Wednesday, April 22, 2020
The trouble of finding a small memory leak
The last time I mentioned GLV-1.12556 it was in reference to a bug that prevented large files from being transferred. I neglected to mention that I fixed the bug back in November where I was improperly checking a return code. Code fixed, issue no more.
But a problem I am seeing now is the ever growing memory usage of the server.
I've written other servers that don't exhibit this issue so it's not Lua per se.
I use valgrind
to check and it does appear to be LibreSSL,
but the output from valgrind
isn't entirely helpful,
as you can see from this snippit:
==27306== 96 bytes in 8 blocks are indirectly lost in loss record 8 of 21 ==27306== at 0x4004405: malloc (vg_replace_malloc.c:149) ==27306== by 0x429E7FD: ??? ==27306== by 0x429E918: ??? ==27306== by 0x429F00A: ??? ==27306== by 0x435BF54: ??? ==27306== by 0x422D548: ??? ==27306== by 0x4236B14: ??? ==27306== by 0x420FD9C: ??? ==27306== by 0x421021B: ??? ==27306== by 0x420D3D0: ??? ==27306== by 0xD0808A: pthread_once (in /lib/tls/libpthread-2.3.4.so) ==27306== by 0x420671D: ???
Some functions are decoded by their address,
but not all.
It doesn't help that LibreSSL is loaded dynamically so the addresses change from run to run.
I want a stacktrace of each call to malloc()
(and related functions) but I'd rather not have to modify the code just to get this information.
Fortunately,
I run Linux,
and on Linux,
I can take advantage of LD_PRELOAD
and insert my own
hacked versions of malloc()
(and co.) to record the backtraces without having to rewlink everything.
The simplest thing that could work is just to print a message with the backtrace,
and so that's what I did.
Given this simple program:
#include <stdio.h> #include <stdlib.h> int main(void) { void *p = realloc(NULL,50); void *q = calloc(1,100); void *r = malloc(150); void *s = realloc(p,200); free(q); free(r); exit(0); }
I can now get the following output:
! (nil) 0x96dd008 50 ./y [0x8048464] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93] ./y [0x80483b5] + 0x96dd3c8 100 ./y [0x8048476] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93] ./y [0x80483b5] + 0x96dd430 150 ./y [0x8048489] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93] ./y [0x80483b5] ! 0x96dd008 0x96dd4d0 200 ./y [0x804849f] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93] ./y [0x80483b5] - 0x96dd3c8 - 0x96dd430
Allocations from malloc()
and calloc()
are signified by a “+” sign
(followed by the address, size and callstack);
allocations from realloc()
are signified by a “!” sign
(followed by the previous and new address, new size and callstack);
calls to free()
are signified by a “-” sign
(which just contains the address—I don't care about the callstack for this call).
Some post processing of this output can flag allocations that don't have a corresponding free call:
0x96dd4d0 200 ./y [0x804849f] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93] ./y [0x80483b5] Total memory 200 Total records 1
It's not perfect,
but I gives a bit more information than valgrind
does,
as we can see from what I think is the same call as the above valgrind
example showed:
0x98861f0 12 /home/spc/JAIL/lib/libcrypto.so.45(lh_insert+0xea) [0x380156] /home/spc/JAIL/lib/libcrypto.so.45(OBJ_NAME_add+0x70) [0x3854c0] /home/spc/JAIL/lib/libcrypto.so.45(EVP_add_cipher+0x2d) [0x371889] /home/spc/JAIL/lib/libcrypto.so.45 [0x366d3f] /lib/tls/libpthread.so.0(__pthread_once+0x8b) [0xd0808b] /home/spc/JAIL/lib/libcrypto.so.45 [0x2ff307] /lib/tls/libpthread.so.0(__pthread_once+0x8b) [0xd0808b] /home/spc/JAIL/lib/libssl.so.47(OPENSSL_init_ssl+0x4b) [0x148ebb] /home/spc/JAIL/lib/libtls.so.19 [0xfa63ba] /lib/tls/libpthread.so.0(__pthread_once+0x8b) [0xd0808b] /usr/local/lib/lua/5.3/org/conman/tls.so(luaopen_org_conman_tls+0x18) [0x21871e] lua [0x804ef6a] lua [0x804f264] lua [0x804f2be] lua(lua_callk+0x37) [0x804d0eb] lua [0x8068deb] lua [0x804ef6a] lua [0x8058ab5] lua [0x804f27d] lua [0x804f2be] lua(lua_callk+0x37) [0x804d0eb] lua [0x8068deb] lua [0x804ef6a] lua [0x8058ab5] lua [0x804f27d] lua [0x804f2be] lua [0x804d146] lua [0x804e8ac] lua [0x804f6ec] lua(lua_pcallk+0x60) [0x804d1a8] lua [0x804b0e4] lua [0x804baba] lua [0x804ef6a] lua [0x804f264] lua [0x804f2be] lua [0x804d146] lua [0x804e8ac] lua [0x804f6ec] lua(lua_pcallk+0x60) [0x804d1a8] lua(main+0x55) [0x804bb91] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93] lua [0x804ae99]
I can see that this particular piece of leaked memory was allocated by tls_init()
(by tracking down what the call at address luaopen_org_conman_tls+0x18
corresponds to).
But this leads to another issue with tracking down these leaks—I don't care about allocations durring initialization of the program.
Yes,
it's technically a memory leak,
but it happens once during program initialization.
It's the memory loss that happens as the program runs that is a larger concern to me.
So yes,
there's some 40K or so lost at program startup.
What's worse is that it's 40K over some 2,188 allocations!
I did see a further leak when I made several requests back to back—about 120 bytes over 8 more allocations,
and it's those that have me worried—a slow leak.
And given that the addresses of the heap and dynamically loaded functions change from run to run,
it makes it very difficult to filter out those 2,188 allocations from initialization to find the 8 other allocations that are leaking.
It would be easier to track down if I could LD_PRELOAD
the modified malloc()
et al. into the process
after intialization,
but alas,
that is way more work
(let's see—I need to write a program to stop the running process,
inject the modified malloc()
et al. into mapped but othersise unused executable memory,
then patch the malloc()
et al. vectors to point to the new code,
and resume the program; then possibly reverse said changes when you no longer want to record the calls—doable but a lot of work)
just to track down a bug in code that isn't even mine.
Sigh.