The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Saturday, January 21, 2006

Are you tired of network tarpits yet?

You would not believe how hard it was to write a binary search that returned the correct index for a missing record in an array.

Some notes on a binary search implementation

A week later, and I finally have it working.

One technique used to debug a program is to have another program that does the same thing, but implemented using a different method or language or both. And I did. I ran the Perl program I had over the 1.1G log file, then ran ltpstat over the same log file and got two different results.

Not good.

ltpstat returned 2% more connections than the Perl script. Getting a dump from the currently running version on the LaBrea system and cleaning the output showed a 2% difference again.

So I spent the past week trying to track down the problem. It was obvious that ltpstat was storing duplicate records, but why was a different matter. My testing sample of about 1,100 connections is apparently too small to completely test the program, so I had to test using the 1.1G log file which has approximately 230,000 connections.

To help debug this problem, I wrote a linear search and would call it as well as the binary search. If both agreed, then I would return the information, otherwise, I would log the discrepency, do the search again, then exit. The reason for doing the search a second time? So I could set a breakpoint there, and let the program run for a couple of hours until it triggered. Then I could step through both searches to see where the problem was.

Yup, each run took several hours to trigger the bug.

I ended up testing four different binary search routines (including the original one I thought worked, plus one I modified from The Standard C Library, plus two other versions I wrote) before sitting down and working through things on paper.

And I still missed corner cases.

But finally, I tested my final version it against the Perl script and only had 122 discrepencies out of some 230,000 records (or 5% of 5%—too small for me to worry about after spending a week on this).


I took a snapshot of the currently running version (which had been running for a bit over three days now), cleansed the output of duplicates, and the final tally was 416,230 connections from 12,911 unique IPs. Again, nothing surprising about the ports being attacked:

Top 10 ports captured by Labrea in the past 3 days
Port # Port description # connections
Port # Port description # connections
139 NetBIOS Session Service 160,799
135 Microsoft-RPC service 108,958
445 Microsoft-DS Service 67,506
80 Hypertext Transfer Protocol 23,921
4899 Remote Administration 9,225
22 Secure Shell Login 7,253
1433 Microsoft SQL Server 6,503
8080 Hypertext Transfer Protocol—typical alternative port 3,717
3128 Squid HTTP Proxy 3,329
1080 W32.Mydoom.F@mm worm 3,150

And again, the Microsoft specific ports account for 81% of the scans. I'll need to discuss with Smirk about blocking those ports in the core router. If nothing else, LaBrea is giving me an indication of which ports to block.

Obligatory Picture

[The future's so bright, I gotta wear shades]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.