The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, February 13, 2008

Another one of those leaky abstractions

It was something that should have been easy.

Earlier this week, some spammer found a PHP script on one of our servers that allowed him unrestricted access to send spam. Two times our server had maxed out at 100Mbps sustained output, and it was after this second attempt that I learned that the problem could be easily solved by adding the mail() function to the disable_functions directive in the php.ini file. This has the nice benefit of not allowing any PHP script to send mail. Unfortunately, our customers don't see this as a nice benefit, so it's not a long-term solution.

So we need to allow such PHP scripts to run. But the problem we (okay, I) were (was) having was locating the PHP script (or scripts) being abused. When you have scores of sites on the server, isolating the one or two problem scripts is not a trivial problem.

But P found another directive in the php.ini file: sendmail_path. So a simple program (ha!) could be written to log some critical information and pass execution along to sendmail, and thus we could finally locate the problematic PHP scripts.

After thinking about the problem for a bit, I came up with the basics of the script (in pseudocode):

main()
{
  string input = STDIN;

  extract To:, Cc:, Bcc: headers from input;
  extract HOSTNAME environment variable;
  extract PWD environment variable;

  log To, Cc, Bcc, hostname, pwd;

  in,out = pipe(); /* create a unidirectional data pipe */
  fork();	   /* creates a new process */

  if (parent-process)
  {
    write(out,input);
    waitfor(child);
    exit;
  }

  if (child-process)
  {
    set STDIN to in;
    exec(sendmail);
  }
}
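
The logging half of the pseudocode above could look something like the following in Python. This is only a sketch, not the released program: the function name summarize and the log-line layout are made up for illustration, and it assumes the message arrives as ordinary RFC 822 style text.

```python
import os
from email import message_from_string

# Hypothetical helper: "summarize" and the log format are illustrative only.
def summarize(raw_message, environ=os.environ):
    """Return one log line with the recipients and the caller's environment."""
    msg = message_from_string(raw_message)
    fields = []
    for name in ("To", "Cc", "Bcc"):
        # get_all() returns every header with that name, or [] if there is none.
        values = msg.get_all(name, [])
        fields.append(f"{name}={','.join(values)}")
    for var in ("HOSTNAME", "PWD"):
        # A missing variable is logged as an empty value rather than an error.
        fields.append(f"{var}={environ.get(var, '')}")
    return " ".join(fields)
```

Passing the environment in as a parameter is just a convenience so the function can be tried without touching the real environment.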

When I tested the program on my workstation, it worked.

So I installed the program on the server in question.

It didn't work.

Oh, it worked when I tested a sample PHP script from the command line, but it failed when executed from the webserver.

Now, the major differences between my workstation and the server are:

  1. My workstation is a virtual server. The server is not.
  2. My workstation runs Postfix. The server runs Sendmail.
  3. My workstation does not have a control panel. The server does.

Any one of those could be the culprit.

Okay, so let's make a simpler program. Over the course of an hour, I ended up with:

main()
{
  exec(sendmail);
}

And that still wasn't working through the webserver when P asked a rather stupid question: “Is it a permissions problem?”

The answer was even stupider—yes—it was a permission problem. The location I had selected for the program wasn't accessible from the webserver.

Fix that problem, and now the program just hangs (but does log what I asked it to log).

Well, rather, sendmail was hanging.

And then major surgery on my program started.

Okay, maybe sendmail is attempting to write something and hanging there, so read anything sent back from sendmail—still hanging.

Okay, maybe sendmail is still expecting more input. I close my side of the pipe after writing—still hanging.

Okay, it looks like my program is hanging trying to read anything being sent by sendmail, so register a signal handler to catch SIGCHLD (a signal sent when a child process exits) so I can break out of the read() call and clean up—nope.

Maybe it's the code that's reading stdin—maybe I'm not handling that correctly—nope.

Run gdb on the spawned sendmail program (I was getting really desperate at this point). Hmm … it's stuck in the read() system call.

That shouldn't be happening. I'm closing my side of the data it's receiving. Unless it's not noticing that the pipe—

AH HAH!

Let me check something. PHP is invoking sendmail with the -i option:

-i
Ignore dots alone on lines by themselves in incoming messages. This should be set if you are reading data from a file.

sendmail manpage

Hmmm …

Pipes under Unix are not the same as files. Sure, they can be treated as files for the most part, but there are some instances where the abstraction breaks down, and I was hitting such a breaking point.

When reading a file (as in, a real file off a disk), the read() system call returns the number of bytes read, and at the end of the file it returns 0 to indicate there is no more data. A pipe doesn't quite work the same way. A pipe has two ends, a reading end and a writing end, and once it empties, the next call to read() makes the calling process wait for more data for as long as any process still holds the writing end open.
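
The difference is easy to see in a few lines of Python. This is a minimal sketch (the function name pipe_read_states is made up); it puts the reading end in non-blocking mode purely so the demonstration reports "would wait" instead of actually hanging.

```python
import os

# Hypothetical demo function; the name is illustrative only.
def pipe_read_states():
    """Show what read() sees on an empty pipe before and after the writer closes."""
    r, w = os.pipe()
    os.set_blocking(r, False)    # an empty pipe now raises instead of hanging

    os.write(w, b"hello")
    first = os.read(r, 1024)     # the data that was just written

    try:
        os.read(r, 1024)         # pipe is empty, but the writing end is still open
        would_wait = False
    except BlockingIOError:
        would_wait = True        # a blocking read() would sit here waiting

    os.close(w)                  # the last writer goes away
    after_close = os.read(r, 1024)   # only now does read() report end-of-file
    os.close(r)
    return first, would_wait, after_close
```

Calling pipe_read_states() returns the data, then a flag saying the second read would have waited, then the empty result that marks end-of-file once the writing end is closed.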

And for some reason, the fact that my wrapper program was closing its end of the pipe wasn't enough to signal to sendmail that there was no more data. A reader sees end-of-file only once every copy of the writing end has been closed, and a process created by fork() inherits its own copy, so a single descriptor left open anywhere keeps read() waiting. SIGPIPE is no help here: it is sent to a process that writes to a pipe with no readers, never to one that is reading.
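
One common way a reader ends up waiting forever is a forked child that still holds its own inherited copy of the pipe's writing end. Whether that is what happened on this server is an assumption, not something the logs confirmed. The sketch below (feed_child is a made-up name) shows the usual bookkeeping: each side closes the end it doesn't use.

```python
import os

# Hypothetical helper; "feed_child" is an illustrative name, not the released code.
def feed_child(data, argv):
    """Run argv with data on its stdin via a pipe; return the child's exit status."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Child: drop the inherited writing end. In C this matters, since a
            # child holding it never sees end-of-file on its own stdin. Python
            # also marks pipe descriptors close-on-exec, so this is belt and braces.
            os.close(w)
            os.dup2(r, 0)        # the reading end becomes the child's stdin
            os.close(r)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)        # reached only if exec failed
    # Parent: it only writes, so it closes the reading end first.
    os.close(r)
    with os.fdopen(w, "wb") as out:
        out.write(data)          # leaving the block closes the writing end
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For example, feed_child(b"hello\n", ["cat"]) should hand the text to cat and return once cat has seen end-of-file and exited.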

Regardless of what sendmail was doing, it was expecting more input from a pipe that was closed.

A change to the program:

main()
{
  copy STDIN to tempfile;

  extract To:, Cc:, Bcc: headers from tempfile;
  extract HOSTNAME environment variable;
  extract PWD environment variable;

  log To, Cc, Bcc, hostname, pwd;

  fork();
  if (parent-process)
  {
    waitfor(child);
    exit;
  }

  if (child-process)
  {
    set STDIN to tempfile;
    exec(sendmail);
  }
}

and it worked as expected.
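
For anyone wanting the shape of the tempfile version without the original code, here is a rough Python sketch. It is not the released program: run_via_tempfile is a made-up name, and it leans on the standard subprocess module to do the fork and exec steps.

```python
import subprocess
import tempfile

# Hypothetical helper; "run_via_tempfile" is an illustrative name only.
def run_via_tempfile(data, argv):
    """Spool data to a temporary file, then run argv with that file as stdin."""
    with tempfile.TemporaryFile() as spool:
        spool.write(data)
        spool.flush()
        spool.seek(0)            # rewind so the child starts reading at the top
        # A regular file has a definite end, so the child's read() returns 0 there
        # no matter which other processes have descriptors open.
        return subprocess.run(argv, stdin=spool).returncode
```

The trade-off is an extra copy of every message on disk, which is the price for giving the child a real file instead of a pipe.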

Sigh.

Anyway, if anyone else needs such a program, I've released the code.

Update on Monday, April 18th, 2022

I've since taken the code down.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.