Monday, June 15, 2009
The only thing you can't do is move windows between the screens
Synergy lets you easily share a single mouse and keyboard between multiple computers with different operating systems, each with its own display, without special hardware. It's intended for users with multiple computers on their desk since each system uses its own monitor(s).
Redirecting the mouse and keyboard is as simple as moving the mouse off the edge of your screen. Synergy also merges the clipboards of all the systems into one, allowing cut-and-paste between systems. Furthermore, it synchronizes screen savers so they all start and stop together and, if screen locking is enabled, only one screen requires a password to unlock them all. Learn more about how it works.
Synergy is a neat program if you have multiple computers at your desk (plus they need to be networked).
It's also easy to configure. All I needed to configure Synergy (which is run on each computer) was:
#
# See that picture above? On the left is my laptop,
# blackbox. To its right is the monitor for my main
# Linux desktop computer, lucy. And to the right of
# lucy is the monitor for marvin, my Mac Mini.
#
# This configuration maps the spatial layout of the
# computers to allow Synergy to move the keyboard/mouse
# about the three computers.
#
section: screens
    lucy:
    blackbox:
    marvin:
end
section: links
    lucy:
        left = blackbox
        right = marvin
    blackbox:
        right = lucy
    marvin:
        left = lucy
end
section: aliases
    lucy:
        lucy.roswell.conman.org
    blackbox:
        blackbox.roswell.area51
    marvin:
        marvin.roswell.area51
        Sean-Conners-Computer.local
end
There's more that I could configure (even remap the keys on the keyboard) but what I have above works just fine, and man, it sure beats having to hit ScrollLock-ScrollLock-0-1 or ScrollLock-ScrollLock-0-2 to switch computers. And the cut-and-paste between systems is just icing on the cake.
Wednesday, June 03, 2009
Waist deep in emails
I'm having a lot of fun writing the email indexing program, despite having to code around a few broken mbox files. I've also been surprised at what I've found so far (not in the “oh, I forgot about that email!” way but more in the “What the—?” way).
At first, I assumed that no email header would be longer than 64K, but that isn't big enough. It turns out I have an email with a header that is 81,162 bytes in size, and it has enough email addresses (in the Cc: header) to populate a small mass-mailing list (and yes, it's spam).
I'm also tracking unique sets of headers and unique message bodies (via the SHA1 hashing function). There are 118 messages with the same body but with different headers, and the amusing bit is that the emails in question weren't spam! They're from a mailing list I used to run years ago where one of the members apparently changed his email address, and for a period of time each message that went out caused his automated system to send an update to the list.
And of course, he didn't unsubscribe his old email address.
Heh.
The tracking was done to keep from indexing duplicate emails (since my testing corpus is 1,600 mbox files, some of which may be backups—I don't know which ones though, which is part of the reason I'm writing this program) so in the end I should end up with a set of unique headers.
I got down to 16 emails with duplicate headers, but unique bodies.
That scared me.
A small digression: at this point, the program pulls each email out of the mbox file, and writes the headers into one file (the original, plus a few I add during processing, like the SHA1 hash results) and the body of the email into another file (my dad likes to send me photos and videos in email, so the bodies of those messages tend to be rather large, and I'm concentrating on the headers at the moment). I currently end up with about 50M of headers and almost a gigabyte-worth of email bodies. Now, continuing on …
I pick one of the duplicate hashes, scan for it, and then check the messages:
>find header_raw/ | xargs grep FFCC3E0BCBF960EBBEA583E77E51CE0CEB59E04D
./000008069:X-SHA1-Header: FFCC3E0BCBF960EBBEA583E77E51CE0CEB59E04D
./000026823:X-SHA1-Header: FFCC3E0BCBF960EBBEA583E77E51CE0CEB59E04D
>grep X-SHA1 header_raw/000008069 header_raw/000026823
header_raw/000008069:X-SHA1-Header: FFCC3E0BCBF960EBBEA583E77E51CE0CEB59E04D
header_raw/000008069:X-SHA1-Body: 5C823DD92D3DCDC5AD43953D72B1D60017A134D6
header_raw/000026823:X-SHA1-Header: FFCC3E0BCBF960EBBEA583E77E51CE0CEB59E04D
header_raw/000026823:X-SHA1-Body: 85584F0167666BAA506E41A3D9ED927227F0FEF0
>
(Note: I can't just grep PATTERN * because there are simply
too many files (over 45,000) which exceeds the command line limit—that's
why I use find and xargs).
Okay, same headers, different body. Just what is going on here? I check the bodies:
>more body/000008069
Status: RO
Accept All Major Credit Cards!!!
Don't be fooled by the copycats. We are one of the original company's offering merchant credit card services for all kinds of business's. [sic]
This isn't looking good—it looks like my header parsing code is missing a header. What about the other email?
>more body/000026823
Status: RO
Content-Length: 2815
Lines: 104
Accept All Major Credit Cards!!!
Don't be fooled by the copycats. We are one of the original company's offering merchant credit card services for all kinds of business's. [sic]
Okay, check the mbox files to see what's messing up the header parsing. What I find actually reassures me:
From cherylg1582@msn.com Wed Dec 12 14:13:00 2001
Return-Path: <cherylg1582@msn.com>
Received: from gig.armigeron.com ([204.29.162.10])
        by conman.org (8.8.7/8.8.7) with ESMTP id OAA06543
        for <spc.wopr@conman.org>; Wed, 12 Dec 2001 14:12:59 -0500
Received: from mercury.aibusiness.net (emi.net [208.10.128.2]
        (may be forged))
        by gig.armigeron.com (8.11.0/8.11.0) with ESMTP id fBCJ8Aa31356
        for <spc@armigeron.com>; Wed, 12 Dec 2001 14:08:10 -0500
Received: from domainmail.ionet.net (domainmail.ionet.net [206.41.128.18])
        by mercury.aibusiness.net (8.9.3/8.9.3) with ESMTP id NAA19835
        for <spc@emi.net>; Wed, 12 Dec 2001 13:52:26 -0500
Received: from kqyfqkpby.motor.com (r145h250.afo.net [209.149.145.250]
        (may be forged))
        by domainmail.ionet.net (8.9.1a/8.7.3) with SMTP id MAA02841;
        Wed, 12 Dec 2001 12:38:11 -0600 (CST)
Date: Wed, 12 Dec 2001 12:38:11 -0600 (CST)
Message-Id: <200112121838.MAA02841@domainmail.ionet.net>
From: "griffin" <griffinfpzwrhlllngc@aol.com>
Subject: No fee! Accept Credit Cards for the Holidays! (bbjlm)
Reply-To: elicasabona1787@mailexcel.com
MIME-Version: 1.0
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD NSCPCD47 (Win98; I)
Content-Type: text/plain

Status: RO
Accept All Major Credit Cards!!!
It wasn't my code (thank God! The parsing code is getting a bit convoluted at this point), but some clueless spammer trying to add additional headers in the body of the message (the other one was the same). So I'll assume the other 14 “duplicates” are similar in nature—spammers trying to be clever.
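The rule that makes those extra lines part of the body is simple: the headers end at the first empty line, and everything after it is body, no matter how header-like it looks. Here is a minimal sketch of that split in C. The function name `find_body` is hypothetical and this is not the indexing program's actual parser; it assumes the whole message is already in memory as a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the first byte of the body of a single message,
 * or NULL if there is no empty line (the message is headers only).
 * Both bare LF and CRLF line endings are accepted.
 */
const char *find_body(const char *msg)
{
    assert(msg != NULL);

    /* A message that starts with an empty line has no headers at all. */
    if (msg[0] == '\n')
        return msg + 1;
    if (msg[0] == '\r' && msg[1] == '\n')
        return msg + 2;

    /* Otherwise look for an end-of-line immediately followed by another. */
    for (const char *p = msg; (p = strchr(p, '\n')) != NULL; p++) {
        if (p[1] == '\n')
            return p + 2;
        if (p[1] == '\r' && p[2] == '\n')
            return p + 3;
    }
    return NULL;
}
```

Fed the spam above, the returned pointer lands on `Status: RO`, which matches what ended up in the body file.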
And now, back to coding …
Monday, June 01, 2009
And here I thought dating was easy …
I'm building on the work of indexing my filesystem by indexing all of my email. I have a ton of it spread across various directories, and whenever I have to search for something (such as the time I flamed an entire department at FAU on a public mailing list; ah, those were the days), it's a long, drawn-out ordeal to find it.
Initial stab at the problem is to just index a few email headers, like
From:, To: (and the related Cc:),
Date: and Subject:—the primary headers one would
be interested in.
I decided to tackle one of the harder fields to process first, From:. While the format is specified in RFC-822 and RFC-2822, there's still enough variance in the format to be annoying.
I was able to squish 23 different formats into four cases:
- email address and real name aren't delimited, in which case, the only thing to parse is the email address;
- email isn't delimited, but the real name is (between parentheses, or quotes), so extract the real name from between the delimiters, and anything that isn't delimited is the email address;
- email is delimited (between angle brackets or square brackets), but the real name isn't, so extract the email address, and anything that isn't delimited is the real name;
- both the email address and real name are delimited, so it's trivial to extract both.
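The four cases above can be sketched in C roughly as follows. This is an illustration under assumptions, not the indexer's real code: `parse_from` and its helpers are hypothetical names, only the first delimited span of each kind is considered, and the "anything that isn't delimited" part is taken from whichever side of the delimited span is non-empty.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* White space, quotes and parentheses get stripped from either end. */
static int is_wrapper(char c)
{
    return c != '\0' && (isspace((unsigned char)c) || strchr("\"()", c) != NULL);
}

/* Copy [start, end) into dst, minus any wrapper characters at the ends. */
static void copy_trimmed(char *dst, size_t dstsz, const char *start, const char *end)
{
    while (start < end && is_wrapper(*start)) start++;
    while (end > start && is_wrapper(end[-1])) end--;
    size_t len = (size_t)(end - start);
    if (len >= dstsz) len = dstsz - 1;      /* truncate rather than overflow */
    memcpy(dst, start, len);
    dst[len] = '\0';
}

/* Find the first `open` and the first `close` after it; 1 on success. */
static int find_pair(const char *s, char open, char close,
                     const char **o, const char **c)
{
    *o = strchr(s, open);
    if (*o == NULL) return 0;
    *c = strchr(*o + 1, close);
    return *c != NULL;
}

/* Copy the text outside the delimited span [o, c]: before it, else after it. */
static void copy_outside(char *dst, size_t dstsz, const char *s,
                         const char *o, const char *c)
{
    copy_trimmed(dst, dstsz, s, o);
    if (dst[0] == '\0')
        copy_trimmed(dst, dstsz, c + 1, s + strlen(s));
}

void parse_from(const char *field, char *name, size_t namesz,
                char *addr, size_t addrsz)
{
    const char *o, *c;

    assert(field != NULL && namesz > 0 && addrsz > 0);
    if (find_pair(field, '<', '>', &o, &c) || find_pair(field, '[', ']', &o, &c)) {
        /* Address is delimited; the rest (quoted or not) is the real name. */
        copy_trimmed(addr, addrsz, o + 1, c);
        copy_outside(name, namesz, field, o, c);
    } else if (find_pair(field, '(', ')', &o, &c) || find_pair(field, '"', '"', &o, &c)) {
        /* Only the real name is delimited; the rest is the address. */
        copy_trimmed(name, namesz, o + 1, c);
        copy_outside(addr, addrsz, field, o, c);
    } else {
        /* Nothing is delimited; treat the whole field as the address. */
        name[0] = '\0';
        copy_trimmed(addr, addrsz, field, field + strlen(field));
    }
}
```

Checking for a delimited address first means the "both delimited" case falls out of the third case for free, since the quotes or parentheses around the name are stripped as wrappers.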
Then, I decided to parse the Date: header. Now, this
is specified, quite plainly:
5. DATE AND TIME SPECIFICATION
5.1. SYNTAX
date-time   =  [ day "," ] date time        ; dd mm yy
                                            ;  hh:mm:ss zzz
day         =  "Mon"  / "Tue" /  "Wed"  / "Thu"
            /  "Fri"  / "Sat" /  "Sun"
date        =  1*2DIGIT month 2DIGIT        ; day month year
                                            ;  e.g. 20 Jun 82
month       =  "Jan"  /  "Feb" /  "Mar"  /  "Apr"
            /  "May"  /  "Jun" /  "Jul"  /  "Aug"
            /  "Sep"  /  "Oct" /  "Nov"  /  "Dec"
time        =  hour zone                    ; ANSI and Military
hour        =  2DIGIT ":" 2DIGIT [":" 2DIGIT]
                                            ; 00:00:00 - 23:59:59
zone        =  "UT"  / "GMT"                ; Universal Time
                                            ; North American : UT
            /  "EST" / "EDT"                ;  Eastern:  - 5/ - 4
            /  "CST" / "CDT"                ;  Central:  - 6/ - 5
            /  "MST" / "MDT"                ;  Mountain: - 7/ - 6
            /  "PST" / "PDT"                ;  Pacific:  - 8/ - 7
            /  1ALPHA                       ; Military: Z = UT;
                                            ;  A:-1; (J not used)
                                            ;  M:-12; N:+1; Y:+12
            / ( ("+" / "-") 4DIGIT )        ; Local differential
                                            ;  hours+min. (HHMM)
Okay, clear if you're into such things. And from the most recent specification:
date-time       =       [ day-of-week "," ] date FWS time [CFWS]
day-of-week     =       ([FWS] day-name) / obs-day-of-week
day-name        =       "Mon" / "Tue" / "Wed" / "Thu" /
                        "Fri" / "Sat" / "Sun"
date            =       day month year
year            =       4*DIGIT / obs-year
month           =       (FWS month-name FWS) / obs-month
month-name      =       "Jan" / "Feb" / "Mar" / "Apr" /
                        "May" / "Jun" / "Jul" / "Aug" /
                        "Sep" / "Oct" / "Nov" / "Dec"
day             =       ([FWS] 1*2DIGIT) / obs-day
time            =       time-of-day FWS zone
time-of-day     =       hour ":" minute [ ":" second ]
hour            =       2DIGIT / obs-hour
minute          =       2DIGIT / obs-minute
second          =       2DIGIT / obs-second
zone            =       (( "+" / "-" ) 4DIGIT) / obs-zone
Really, the only things this does are mandate that the year be four digits long, move to a numeric-only timezone format, and clarify a bit where white space appears; otherwise, it's pretty much the same as the older spec.
So, if I ignore the timezone for now (because the Standard C library has such piss-poor support for it, but that's a rant for another time), the only real issue is handling two or four digit years.
And in poking around the man pages for the various Standard C library
routines, I came across strptime(), which is the functional
opposite of strftime()—instead of converting the time to a
human representation, it'll take a human representation and convert it to a
time value. It isn't a Standard C call, but hey, why not use it for
now?
And it appears that the two-digit/four-digit year isn't a problem for
strptime():
When a century is not otherwise specified, values in the range [69,99] shall refer to years 1969 to 1999 inclusive, and values in the range [00,68] shall refer to years 2000 to 2068 inclusive; leading zeros shall be permitted but shall not be required.
man page for
strptime()
Sounds perfect!
Only it blew up when it encountered Wen, 2 Mar 2005 01:39:42 +0000.
Sigh.
Okay, make sure I start parsing past the optional day of the week. It
then blew up on Sat Mar 5 18:58:36 2005.
What the—? That's not even a standard format! And then there
was Wed,19 十二月 2001 20:23:05 (I originally replaced the month with question marks because I couldn't determine the character set that was used for it, and nothing in that particular email even hinted at what language it might be. I have since found out which language: Chinese. Figures).
And let's not forget 9/8/99 1:01:12 AM Pacific Daylight Time
(lovely) or Fri Jun 28 10:07:44 PDT 2002 or even
Wed 8-Jan-2003 08:24:20.
Oh, and we mustn't forget Tue, 23 May 100 22:18:56
-0400.
Double sigh.
I found it amazing—one of the more strictly defined fields in an email and yet there still was an amazing amount of garbage to be found (although to be fair, these anomalies account for less than one per cent of all the emails scanned, but when you have thousands of emails, it can still add up).
(And one more interesting note—I did not see one email use the military time zone format.)
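Two of the problems above lend themselves to small pre-processing helpers, sketched here in C. Both function names are hypothetical and neither is taken from the indexing program. `skip_day_name` steps over an optional (and possibly misspelled) day of the week so it never reaches the date parser. `normalize_year` applies the two-digit rule quoted from the strptime() man page and, as an explicit assumption, treats a three-digit year as a leaked "years since 1900" count, so that 100 becomes 2000 (23 May 2000 did fall on a Tuesday, which fits the header above).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Skip an optional leading day name such as "Wed," or "Wen, " and any
 * white space after it. If the text starts with a digit it is returned
 * unchanged. No attempt is made to validate the name, so a nonstandard
 * date that begins with a month name would lose that word as well.
 */
const char *skip_day_name(const char *s)
{
    assert(s != NULL);
    while (isspace((unsigned char)*s)) s++;
    if (!isalpha((unsigned char)*s))
        return s;                           /* no day name present */
    while (isalpha((unsigned char)*s)) s++;
    if (*s == ',') s++;
    while (isspace((unsigned char)*s)) s++;
    return s;
}

/*
 * Map a parsed year to a four-digit year:
 *   0..68   -> 2000..2068  (rule quoted from the strptime() man page)
 *   69..99  -> 1969..1999  (same rule)
 *   100..999 -> 1900 + year (ASSUMPTION: "years since 1900" leaked out)
 * Anything else is returned as-is.
 */
int normalize_year(int year)
{
    if (year >= 0 && year <= 68)
        return 2000 + year;
    if (year >= 69 && year <= 999)
        return 1900 + year;
    return year;
}
```

Formats like Sat Mar 5 18:58:36 2005 or 9/8/99 1:01:12 AM Pacific Daylight Time would still need their own handling; these helpers only cover the misspelled day name and the odd year widths.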
Tuesday, May 26, 2009
Rolling, rolling, rolling
There's been some contention at The Sunday Game™ about the use of computers to generate random numbers. Usually, dice are used, but there is a small minority who prefers the use of computers over the use of physical dice when random numbers are called for.
But this, I think, is an excellent compromise.
I had a soft target of a machine capable of 200,000 rolls a day, as site traffic is growing. However, any automation project worth doing is worth over doing, and I way overshot the mark. The result is what you see here: a machine that can belch a continuous river of dice down a spiraling ramp, then elevate, photograph, process and upload almost a million and a half rolls to the server a day. I may not get nominated for a Nobel prize, but the deep rumbling vibration you feel more than hear when two rooms away is quite impressive.
It's a dice rolling machine! A computer controlled machine to roll dice—the computer uses a camera to read the results. Now this is a computer generated random number I can trust.
Saturday, May 23, 2009
Some bugs can lurk for years
I was shocked to learn that the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of Programming Pearls contains a bug. Once I tell you what it is, you will understand why it escaped detection for two decades. Lest you think I'm picking on Bentley, let me tell you how I discovered the bug: The version of binary search that I wrote for the JDK contained the same bug. It was reported to Sun recently when it broke someone's program, after lying in wait for nine years or so.
Back in November I was working on a program where I needed a binary search that not only reported whether it found something, but also where it was, or where it would be if it were there. I had written such code for my greylist daemon so I lifted the code from there. As I was reusing the code, I realized that there could indeed be a problem with it, in that a calculation of a certain value could overflow and cause unpredictable behavior.
But while I recognized the problem, I neglected to fix it in the greylist daemon. And I completely forgot about it until I came across the blog post “Official Google Research Blog: Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken.”
Oops.
Anyway, the patch was a one line change, from
mid = (low + high) / 2;
to
mid = low + ((high - low) / 2);
The old code certainly worked, but if the indices low and high were large enough, their sum could overflow and cause undefined behavior. Truth be told, low and high would have to add up to more than 2,147,483,647 (on a typical system with 32-bit ints) before you might even get bit by this bug.
But still, the potential exists, so why not fix it when the fix is this easy?
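For illustration, here is a minimal sketch of a binary search that reports both whether the key was found and where it is (or where it would go), using the overflow-safe midpoint. It is not the greylist daemon's code: the name `bsearch_pos`, the use of an `int` array, and the half-open `[low, high)` range are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Search the sorted array list[0..count) for key.
 * Returns 1 and stores the index of a matching element in *pos, or
 * returns 0 and stores the index at which key would have to be
 * inserted to keep the array sorted.
 */
int bsearch_pos(const int *list, size_t count, int key, size_t *pos)
{
    size_t low  = 0;
    size_t high = count;                    /* one past the last candidate */

    assert(pos != NULL && (list != NULL || count == 0));
    while (low < high) {
        /* high - low cannot overflow, unlike low + high */
        size_t mid = low + (high - low) / 2;

        if (list[mid] < key) {
            low = mid + 1;
        } else if (list[mid] > key) {
            high = mid;
        } else {
            *pos = mid;
            return 1;
        }
    }
    *pos = low;                             /* insertion point */
    return 0;
}
```

Searching `{10, 20, 30, 40}` for 30 returns 1 with position 2; searching it for 25 returns 0, also with position 2, which is where 25 would be inserted.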
And all this is to announce the latest version of the greylist daemon (the “Beefier Bsearch” version if you will).
Friday, May 22, 2009
Selling out
- From
- XXXXXX XXXXXXXXX <XXXXXXXXXXXXXXXXXXXXXXXXXXX>
- To
- sean@conman.org
- Subject
- Interested in Purchasing Text Link
- Date
- Thu, 7 May 2009 08:36:32 +0500
Interested in Purchasing Text Link
I am interested in purchasing textlink advertising on several pages of your website
http://boston.conman.org/. Let me know if you are interested and we can discuss further details.I can make a good offer.
Best Regards,
XXXXXX XXXXXXX
I get emails like this from time to time and sometimes, I'll follow up on it, curious as to what the actual deal will be, and for this particular email, I was curious.
The deal basically was a one-time payment for a permanent placement of a paragraph on seven particular entries. And the seven selected entries were a pretty eclectic collection of posts for text advertisements for educational, exam, certification and internet related websites. But hey, it's their money I'll be receiving …
The only thing that did bother me was the “permanent placement” part, especially for the amount of money being offered. I replied with: “I'm interested, but the price given for perpetual ads seems too low. A year is fine though.” Hey, it can't hurt to haggle a bit.
- From
- XXXXXX XXXXXXX <XXXXXXXXXXXXXXXXXXXXXXXX>
- To
- Sean Conner <sean@conman.org>
- Subject
- Re: Interested in Purchasing Text Link
- Date
- Sat, 9 May 2009 14:56:00 +0500
Hi,
I can understand your concern for a higher fee but you have to consider the fact that i am advertising on the internal pages. Also i am advertising at the bottom of each page which doesn't have much real estate value.
Most advertisers look to advertise on the top of the page but at the bottom of each page the value has to be lower.
See basically there is no opportunity cost there, plus you can continue to advertise on the top of your page which is really your premier advertising spot.
Even then, after reconsidering my budget, I am willing to [increase the amount by 50%] for the 5 years deal.
I hope, now, it will also OK to you.
Okay, if the check clears, why not?
So I received the ads today, and placed them on the given entries and am currently awaiting approval before the check is sent. And I figure that if I hear nothing in two weeks, I can pull the ads.
But if all goes well, I'll have some additional fundage in the bank account. And don't be surprised if you come across something like:
— Paid Advertisement —
Well, not actually. This is just a sample. You too, can purchase this spot on any entry here in this blog. Just write and ask for details …
in the random entry or two …
Thursday, May 21, 2009
Enlightening Clouds II, Part II
I just had the pictures I took last night developed, and I must say, I'm impressed with the results, given it was a spur of the moment idea:
Honestly, I didn't even see the actual lightning bolts when taking this picture; I just remember seeing flashes of light from the night sky.
And next time, I'll try to move to where any crossing power/telephone lines aren't in the picture.
