Tuesday, October 30, 2007
More thoughts on optimizing a greylist daemon
I ran the updated stress test on a faster (2.6GHz) machine and managed to get some impressive results.
There were three different ways I ran the test. One option had the stress program send a request and wait for a reply. This was by far the slowest of the tests, but the most reliable (in terms of actually processing every request), with the greylist daemon handling between 4,000 and 6,300 tuples per second. Another option had a separate process waiting for the replies; that goes faster, between 11,000 and 17,000 tuples per second, but drops a ton of requests (on the order of 70%). The last option doesn't even bother with replies. This does both the best and the worst: 30,000 tuples per second, but it drops something like 90%.
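To put the three runs on the same footing, here is a small sketch that converts a raw rate and a drop percentage into requests actually handled. It assumes the quoted tuples-per-second figures for the second and third runs are offered (sent) rates rather than handled rates, which the numbers above do not state outright; `handled_per_second` is a made-up helper name, not something from the daemon.

```c
/* Hypothetical helper: requests per second that survive a given drop rate.
 * Assumes offered_per_second is the send rate and drop_fraction is the
 * share of requests that never got processed (0.70 for 70%). */
double handled_per_second(double offered_per_second, double drop_fraction)
{
    return offered_per_second * (1.0 - drop_fraction);
}
```

Under that assumption, 30,000 per second with 90% dropped works out to about 3,000 handled, and 11,000 to 17,000 per second with 70% dropped works out to about 3,300 to 5,100, which lands in the same range as the slow-but-reliable run.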
So, the program can easily handle about 5,000 requests per second on a nice server, which is probably way more than most SMTP servers can handle (and it's much nicer than the 130/second I thought it could handle).
I profiled the program again, and this time, got actual results I could use:
% time | cumulative seconds | self seconds | calls | self Ts/call | total Ts/call | name |
---|---|---|---|---|---|---|
21.24 | 0.48 | 0.48 | 2260060 | 0.00 | 0.00 | crc32 |
14.38 | 0.81 | 0.33 | 443203 | 0.00 | 0.00 | tuple_search |
11.51 | 1.07 | 0.26 | 565012 | 0.00 | 0.00 | ip_match |
8.85 | 1.27 | 0.20 | 565012 | 0.00 | 0.00 | type_graylist |
7.97 | 1.45 | 0.18 | 1 | 0.18 | 2.20 | mainloop |
6.64 | 1.60 | 0.15 | 565015 | 0.00 | 0.00 | send_packet |
4.87 | 1.71 | 0.11 | 7648182 | 0.00 | 0.00 | tuple_cmp_ift |
4.87 | 1.82 | 0.11 | 565012 | 0.00 | 0.00 | graylist_sanitize_req |
3.98 | 1.91 | 0.09 | 1761756 | 0.00 | 0.00 | edomain_search |
3.54 | 1.99 | 0.08 | 2637054 | 0.00 | 0.00 | edomain_cmp |
3.10 | 2.06 | 0.07 | 421359 | 0.00 | 0.00 | tuple_add |
2.21 | 2.11 | 0.05 | 565012 | 0.00 | 0.00 | send_reply |
2.21 | 2.16 | 0.05 | 1 | 0.05 | 0.05 | whitelist_dump_stream |
0.89 | 2.18 | 0.02 | 565127 | 0.00 | 0.00 | ipv4 |
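The per-call columns all round to 0.00, so it can help to redo the division in microseconds and to look at calls per request. The sketch below does that for a few rows of the table. It assumes the run handled 565,012 requests (the call count shared by `ip_match`, `type_graylist` and `send_reply`); the struct and function names are made up for illustration, and sampled profile times this small are only rough estimates.

```c
#include <stddef.h>
#include <string.h>

/* A few rows copied from the profile above: name, self seconds, calls. */
struct prof_row {
    const char   *name;
    double        self_seconds;
    unsigned long calls;
};

static const struct prof_row profile[] = {
    { "crc32",          0.48, 2260060UL },
    { "tuple_search",   0.33,  443203UL },
    { "ip_match",       0.26,  565012UL },
    { "type_graylist",  0.20,  565012UL },
    { "tuple_cmp_ift",  0.11, 7648182UL },
    { "edomain_search", 0.09, 1761756UL },
    { "edomain_cmp",    0.08, 2637054UL },
};

/* Assumption: one type_graylist call per request, so 565,012 requests. */
#define REQUESTS 565012.0

static const struct prof_row *find_row(const char *name)
{
    for (size_t i = 0; i < sizeof profile / sizeof profile[0]; i++)
        if (strcmp(profile[i].name, name) == 0)
            return &profile[i];
    return NULL;
}

/* Self time per call in microseconds, or -1.0 if the name is unknown. */
double self_usec_per_call(const char *name)
{
    const struct prof_row *r = find_row(name);
    if (r == NULL || r->calls == 0)
        return -1.0;
    return r->self_seconds * 1e6 / (double)r->calls;
}

/* Average number of calls made per request, or -1.0 if unknown. */
double calls_per_request(const char *name)
{
    const struct prof_row *r = find_row(name);
    if (r == NULL)
        return -1.0;
    return (double)r->calls / REQUESTS;
}
```

With these figures, `crc32` comes out at roughly four calls per request and about 0.2 microseconds per call, so it tops the list mostly because of how often it is called.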
Again, nothing terribly surprising here, except for the code `gcc` generated for the `crc32()` function (two lines of C code, one of which is `while(size--)`), but I used the default compiler settings; if it really bothers me, I can up the compiler settings and see what I get.
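For reference, a table-driven CRC-32 whose inner loop is a `while(size--)` plus one line looks roughly like the sketch below. This is not the daemon's actual `crc32()`; it is the common reflected CRC-32 (polynomial 0xEDB88320), and the name `crc32_sketch`, the lazy table setup and the zlib-style chaining argument are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];
static int      crc_table_ready = 0;

/* Build the 256-entry lookup table for the reflected polynomial. */
static void crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc_table[i] = c;
    }
    crc_table_ready = 1;
}

/* Pass 0 as crc to start; pass a previous result to continue a stream. */
uint32_t crc32_sketch(uint32_t crc, const void *data, size_t size)
{
    const uint8_t *p = data;

    if (!crc_table_ready)
        crc32_init();

    crc = ~crc;
    while (size--)  /* one table lookup per input byte */
        crc = crc_table[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}
```

Building this once with the default settings and once with `gcc -O2`, and comparing the assembly from `gcc -S`, is one way to see how much of the cost is the compiler's doing rather than the algorithm's.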