Saturday, January 21, 2006
Are you tired of network tarpits yet?
You would not believe how hard it was to write a binary search that returned the correct index for a missing record in an array.
Some notes on a binary search implementation
A week later, and I finally have it working.
One technique used to debug a program is to have another program that does
the same thing, but implemented using a different method or language or both.
And I did. I ran the Perl program I had over the 1.1G log file, then ran ltpstat over the same log file and got two different results.
Not good.
ltpstat returned 2% more connections than the Perl script.
A dump from the version currently running on the LaBrea system, once cleaned up, showed the same 2% difference.
So I spent the past week trying to track down the problem. It was obvious that ltpstat was storing duplicate records, but why was a different matter. My testing sample of about 1,100 connections is apparently too small to completely test the program, so I had to test using the 1.1G log file, which has approximately 230,000 connections.
To help debug this problem, I wrote a linear search and would call it as well as the binary search. If both agreed, then I would return the information; otherwise, I would log the discrepancy, do the search again, then exit. The reason for doing the search a second time? So I could set a breakpoint there, and let the program run for a couple of hours until it triggered. Then I could step through both searches to see where the problem was.
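The cross-check described above might look something like the following sketch. The names `linear_search`, `binary_search` and `checked_search` are made up for illustration, and giving both searches the same "first element not less than the key" contract is an assumption about what ltpstat needed. On a mismatch it logs, repeats the binary search on a line meant for a debugger breakpoint, then exits, following the steps in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Reference implementation: slow but obviously correct.
   Returns the index of the first element >= key (n if there is none). */
static size_t linear_search(const unsigned long *a, size_t n, unsigned long key)
{
  size_t i = 0;

  while (i < n && a[i] < key)
    i++;
  return i;
}

/* Implementation under test: same contract, O(log n). */
static size_t binary_search(const unsigned long *a, size_t n, unsigned long key)
{
  size_t lo = 0;
  size_t hi = n;

  while (lo < hi)
  {
    size_t mid = lo + (hi - lo) / 2;

    if (a[mid] < key)
      lo = mid + 1;
    else
      hi = mid;
  }
  return lo;
}

/* Run both searches and return the answer only if they agree. */
size_t checked_search(const unsigned long *a, size_t n, unsigned long key)
{
  size_t fast;
  size_t slow;

  assert(a != NULL || n == 0);
  fast = binary_search(a, n, key);
  slow = linear_search(a, n, key);

  if (fast != slow)
  {
    fprintf(stderr, "discrepancy: key=%lu binary=%zu linear=%zu n=%zu\n",
            key, fast, slow, n);
    /* Repeat the search so a debugger breakpoint can be set on this
       line and the failing case stepped through. */
    (void)binary_search(a, n, key);
    exit(EXIT_FAILURE);
  }
  return fast;
}
```

The wrapper costs O(n) per lookup, so it only makes sense while hunting the bug; normal builds would call the binary search directly.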
Yup, each run took several hours to trigger the bug.
I ended up testing four different binary search routines (including the original one I thought worked, plus one I modified from The Standard C Library, plus two other versions I wrote) before sitting down and working through things on paper.
And I still missed corner cases.
But finally, I tested my last version against the Perl script and had only 122 discrepancies out of some 230,000 records (about 0.05%, too small for me to worry about after spending a week on this).
I took a snapshot of the currently running version (which has been running for a bit over three days now), cleansed the output of duplicates, and the final tally was 416,230 connections from 12,911 unique IPs. Again, nothing surprising about the ports being attacked:
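The post doesn't say how the output was cleaned. One straightforward approach, assuming the problem is exact duplicate records and using a hypothetical `struct conn` layout (ltpstat's real output format isn't shown), is to sort with `qsort()` so that identical records, and all records from the same source IP, end up adjacent, then make one pass that drops repeats and counts distinct IPs. `conn_cmp` and `dedup_conns` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical connection record, not ltpstat's actual format. */
struct conn
{
  uint32_t src;     /* attacking IP address */
  uint32_t dst;     /* tarpitted address */
  uint16_t sport;
  uint16_t dport;
};

/* Order by source IP first so that all records from one IP are adjacent. */
static int conn_cmp(const void *l, const void *r)
{
  const struct conn *a = l;
  const struct conn *b = r;

  if (a->src   != b->src)   return a->src   < b->src   ? -1 : 1;
  if (a->dst   != b->dst)   return a->dst   < b->dst   ? -1 : 1;
  if (a->sport != b->sport) return a->sport < b->sport ? -1 : 1;
  if (a->dport != b->dport) return a->dport < b->dport ? -1 : 1;
  return 0;
}

/* Sort c[0..n), drop exact duplicates in place, and count distinct
   source IPs. Returns the number of records kept; the IP count is
   stored in *unique_ips. */
size_t dedup_conns(struct conn *c, size_t n, size_t *unique_ips)
{
  size_t kept = 0;
  size_t ips  = 0;

  assert(unique_ips != NULL);
  qsort(c, n, sizeof c[0], conn_cmp);

  for (size_t i = 0; i < n; i++)
  {
    if (kept > 0 && conn_cmp(&c[kept - 1], &c[i]) == 0)
      continue;                       /* exact duplicate: skip it */
    if (kept == 0 || c[kept - 1].src != c[i].src)
      ips++;                          /* first record from a new IP */
    c[kept++] = c[i];
  }
  *unique_ips = ips;
  return kept;
}
```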
Port # | Port description | # connections |
---|---|---|
139 | NetBIOS Session Service | 160,799 |
135 | Microsoft-RPC service | 108,958 |
445 | Microsoft-DS Service | 67,506 |
80 | Hypertext Transfer Protocol | 23,921 |
4899 | Remote Administration | 9,225 |
22 | Secure Shell Login | 7,253 |
1433 | Microsoft SQL Server | 6,503 |
8080 | Hypertext Transfer Protocol—typical alternative port | 3,717 |
3128 | Squid HTTP Proxy | 3,329 |
1080 | W32.Mydoom.F@mm worm | 3,150 |
And again, the Microsoft-specific ports (139, 135 and 445) account for 81% of the scans. I'll need to talk to Smirk about blocking those ports at the core router. If nothing else, LaBrea is giving me an indication of which ports to block.
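For the record, the 81% figure checks out against the table: the three Microsoft ports at the top (139, 135 and 445) over the 416,230 total connections:

```latex
\frac{160{,}799 + 108{,}958 + 67{,}506}{416{,}230}
  = \frac{337{,}263}{416{,}230} \approx 0.810 \approx 81\%
```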