The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, April 22, 2020

The trouble of finding a small memory leak

The last time I mentioned GLV-1.12556 it was in reference to a bug that prevented large files from being transferred. I neglected to mention that I fixed the bug back in November where I was improperly checking a return code. Code fixed, issue no more.

But a problem I am seeing now is the ever growing memory usage of the server. I've written other servers that don't exhibit this issue so it's not Lua per se. I use valgrind to check and it does appear to be LibreSSL, but the output from valgrind isn't entirely helpful, as you can see from this snippit:

==27306== 96 bytes in 8 blocks are indirectly lost in loss record 8 of 21
==27306==    at 0x4004405: malloc (vg_replace_malloc.c:149)
==27306==    by 0x429E7FD: ???
==27306==    by 0x429E918: ???
==27306==    by 0x429F00A: ???
==27306==    by 0x435BF54: ???
==27306==    by 0x422D548: ???
==27306==    by 0x4236B14: ???
==27306==    by 0x420FD9C: ???
==27306==    by 0x421021B: ???
==27306==    by 0x420D3D0: ???
==27306==    by 0xD0808A: pthread_once (in /lib/tls/libpthread-2.3.4.so)
==27306==    by 0x420671D: ???

Some functions are decoded by their address, but not all. It doesn't help that LibreSSL is loaded dynamically so the addresses change from run to run. I want a stacktrace of each call to malloc() (and related functions) but I'd rather not have to modify the code just to get this information. Fortunately, I run Linux, and on Linux, I can take advantage of LD_PRELOAD and insert my own hacked versions of malloc() (and co.) to record the backtraces without having to rewlink everything. The simplest thing that could work is just to print a message with the backtrace, and so that's what I did. Given this simple program:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  void *p = realloc(NULL,50);
  void *q = calloc(1,100);
  void *r = malloc(150);
  void *s = realloc(p,200);
  
  free(q);
  free(r);
  exit(0);
}

I can now get the following output:

! (nil) 0x96dd008 50    ./y [0x8048464] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]   ./y [0x80483b5]
+ 0x96dd3c8 100 ./y [0x8048476] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]   ./y [0x80483b5]
+ 0x96dd430 150 ./y [0x8048489] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]   ./y [0x80483b5]
! 0x96dd008 0x96dd4d0 200       ./y [0x804849f] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]   ./y [0x80483b5]
- 0x96dd3c8
- 0x96dd430

Allocations from malloc() and calloc() are signified by a “+” sign (followed by the address, size and callstack); allocations from realloc() are signified by a “!” sign (followed by the previous and new address, new size and callstack); calls to free() are signified by a “-” sign (which just contains the address—I don't care about the callstack for this call). Some post processing of this output can flag allocations that don't have a corresponding free call:

0x96dd4d0       200
        ./y [0x804849f]
        /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]
        ./y [0x80483b5]

Total memory    200
Total records   1

It's not perfect, but I gives a bit more information than valgrind does, as we can see from what I think is the same call as the above valgrind example showed:

0x98861f0	12
	/home/spc/JAIL/lib/libcrypto.so.45(lh_insert+0xea) [0x380156]
	/home/spc/JAIL/lib/libcrypto.so.45(OBJ_NAME_add+0x70) [0x3854c0]
	/home/spc/JAIL/lib/libcrypto.so.45(EVP_add_cipher+0x2d) [0x371889]
	/home/spc/JAIL/lib/libcrypto.so.45 [0x366d3f]
	/lib/tls/libpthread.so.0(__pthread_once+0x8b) [0xd0808b]
	/home/spc/JAIL/lib/libcrypto.so.45 [0x2ff307]
	/lib/tls/libpthread.so.0(__pthread_once+0x8b) [0xd0808b]
	/home/spc/JAIL/lib/libssl.so.47(OPENSSL_init_ssl+0x4b) [0x148ebb]
	/home/spc/JAIL/lib/libtls.so.19 [0xfa63ba]
	/lib/tls/libpthread.so.0(__pthread_once+0x8b) [0xd0808b]
	/usr/local/lib/lua/5.3/org/conman/tls.so(luaopen_org_conman_tls+0x18) [0x21871e]
	lua [0x804ef6a]
	lua [0x804f264]
	lua [0x804f2be]
	lua(lua_callk+0x37) [0x804d0eb]
	lua [0x8068deb]
	lua [0x804ef6a]
	lua [0x8058ab5]
	lua [0x804f27d]
	lua [0x804f2be]
	lua(lua_callk+0x37) [0x804d0eb]
	lua [0x8068deb]
	lua [0x804ef6a]
	lua [0x8058ab5]
	lua [0x804f27d]
	lua [0x804f2be]
	lua [0x804d146]
	lua [0x804e8ac]
	lua [0x804f6ec]
	lua(lua_pcallk+0x60) [0x804d1a8]
	lua [0x804b0e4]
	lua [0x804baba]
	lua [0x804ef6a]
	lua [0x804f264]
	lua [0x804f2be]
	lua [0x804d146]
	lua [0x804e8ac]
	lua [0x804f6ec]
	lua(lua_pcallk+0x60) [0x804d1a8]
	lua(main+0x55) [0x804bb91]
	/lib/tls/libc.so.6(__libc_start_main+0xd3) [0xba4e93]
	lua [0x804ae99]

I can see that this particular piece of leaked memory was allocated by tls_init() (by tracking down what the call at address luaopen_org_conman_tls+0x18 corresponds to). But this leads to another issue with tracking down these leaks—I don't care about allocations durring initialization of the program. Yes, it's technically a memory leak, but it happens once during program initialization. It's the memory loss that happens as the program runs that is a larger concern to me.

So yes, there's some 40K or so lost at program startup. What's worse is that it's 40K over some 2,188 allocations! I did see a further leak when I made several requests back to back—about 120 bytes over 8 more allocations, and it's those that have me worried—a slow leak. And given that the addresses of the heap and dynamically loaded functions change from run to run, it makes it very difficult to filter out those 2,188 allocations from initialization to find the 8 other allocations that are leaking. It would be easier to track down if I could LD_PRELOAD the modified malloc() et al. into the process after intialization, but alas, that is way more work (let's see—I need to write a program to stop the running process, inject the modified malloc() et al. into mapped but othersise unused executable memory, then patch the malloc() et al. vectors to point to the new code, and resume the program; then possibly reverse said changes when you no longer want to record the calls—doable but a lot of work) just to track down a bug in code that isn't even mine.

Sigh.

Update on Thursday, April 23RD, 2020

I think I may have found the leak.

Obligatory Picture

An abstract representation of where you're coming from]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.