The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Sunday, August 04, 2013

Programs from the past, Part II

I could have sworn I wrote a bit more about a program I wrote while at FAU; specifically, just how long it took to run back then vs now (or rather, a few years ago), but having failed to find any using Google or locally searching the entries on the server, I ended up spending all day yesterday reading through ten years of archives and still not finding anything, I guess I haven't.

Well. [Deep subject. —Editor] [Shut up you! —Sean]

So I wrote a series of programs to search through the domain of the following pair of equations:

xi+1 = (Ayi + B) xi (1 - xi)
yi+1 = (Cxi + D) yi (1 - yi)

The intent was to generate a large number of (x y) pairs and plot the results for a given set of values for A, B, C and D. Something like:

[Window showing a chaotic attractor]
[Control window for chaotic attractor]

The top window shows the resulting chaotic attractor when A is 2.4375, B is 1.5624, C is -0.8659 and D is 4.0 as you loop through the above two equations 15,000 times to generate 15,000 points (all between 0 and 1 for both the x and y axis). And the thing about a chaotic attractor is, the initial starting point (x0 and y0) is largely immaterial (but not quite, as the starting values must be between somewhere between 0 and 1). Any given value for x0 and y0 will result in the same figure being drawn for the given values of A, B, C and D.

The program shown allows you to change the settings of the four controlling variables A, B, C and D by sliding the named boxes horizontally (the botton one allows you to slide multiple variables at the same time—in this case, A and C will slide to the left if you slide the bottom box to the left).

That was fine, but my bosses at the time wanted another view generated—a “map” of the chaotic attractors, if you will. To generate this “map,” first you calculate how many points are generated before you settle into an attractor:

int mainloop(const double A,const double B,const double C,const double D)
{
  double xn,xn1;
  double yn,yn1;
  int    ix;
  int    iy;
  int    count;
  bool   pix[DIM][DIM];	/* have we seen this (xy) pair yet? */
  
  memset(pix,0,sizeof(pix));
  
  for (count = 0 , xn = yn = 0.5 ; count < MAXLOOP ; count++)
  {
    xn1 = ((A * yn) + B) * xn * (1.0 - xn);
    yn1 = ((C * xn) + D) * yn * (1.0 - yn);    
    xn  = xn1;
    yn  = yn1;
    
    /* exit if we're outside the bounds */
    
    if (xn <  0.0) return MAXLOOP;
    if (xn >= 1.0) return MAXLOOP;
    if (yn <  0.0) return MAXLOOP;
    if (yn >= 1.0) return MAXLOOP;
    
    ix  = (int)(xn * DIM);
    iy  = (int)(yn * DIM);
    
    if (pix[ix][iy])
      break;
    pix[ix][iy] = true;
  }
  
  return(count);
}

The assumption—if this returns MAXLOOP, there is no attractor for the given set of values for A, B, C and D; othersise it's an indication of how quickly (or how “showey”) we settle into the pattern.

Then it's a matter of going through a range of values and for the map images, since they're two dimensional, I only sweep through two of the four governing variables over their range:

for (A = -4.0 ; A <= 4.0 ; A += 0.016)
  for (B = -4.0 ; B <= 4.0 ; B += 0.016)
    plot(A,B,count_as_color(mainloop(A,B,-0.8659,4.0));

(count_as_color() just maps a count to a color)

And we end up with an image something like:

[Map of chaotic attractors]

In this image, A is the horizontal axis, going left to right from -4.0 to 4.0, and B is the veritcal axis, going, bottom to top, from -4.0 to 4.0; C and D are fixed, at -0.8659 and 4.0 respectively. The red plus in the upper right hand quadrant is the location of the “French horn” attractor above. The black area represents areas that have no chaotic attractor.

But my bosses didn't just want one map—no, they wanted a larger set. So, I generated a series of images as this, each image having a different value for C. And while the map may not change drastically—for instance, if we bump C up a notch to -0.8499:

[It's different; it's subtle, but it's different]

The resulting chaotic attractor is completely different:

[Window showing a slightly different chaotic attractor]
[Control window for the slightly different chaotic attractor]

And I generated a ton of maps by looping through values of C from -4.0 to 4.0—500 images in total.

Now, when I did this the first time, twenty years (or a bit more—maybe up to twenty-two years ago) I was doing this on a state of the art Unix workstation, an SGI workstation (perhaps you've seen their commerical?). It cost $30,000 and it took a year to generate the 500 images. Each individual image took some ten hours to generate.

Fast forward to today. On a laptop that is maybe two years old now. I reran the code and was able to generate an entire image (in fact, both of the map images above) in only 8 seconds.

With four cores, that means, on a two year old laptop that might have cost $1,500 (tops) would take 17 minutes what it took me a year to generate twenty years ago.

Oh, and about those eight seconds

Update on August 5th, 2013

Oops, I made a slight mistake

Update on Tuesday, August 6th, 2013

Mistakes were made alright.


About those eight seconds

In my previous post, I mentioned a program that took a year to run twenty-some years ago on a then state-of-the-art workstation could now be done in 17 minutes on a two year old laptop.

What I didn't mention is that the program twenty-some years ago was written in C and the program I ran today was written in Lua.

Yes, computers are so fast these days that a scripting language can out-perform computers from two decades ago.

“Okay,” you say. “But I won't want to wait 17 minutes for my data.”

Okay, fine. I see two options, and let's try the first option and one that most people would do—drop down to C. And yes, that does give us an improvement, an impressive improvement—only 2.5 seconds per frame, and across four cores that means you'll have the results in a little over five minutes. Not that bad.

The other option, and hear me out—is to take our Lua code and run it via LuaJIT, a (pretty much) drop in replacement for Lua that compiles down to native code. Even if it's a bit slower than C, it should still be faster than Lua with no code changes.

So how does LuaJIT fare?

Personally, I was expecting the C version (which I actually wrote first) to be faster, if only buy a little bit, but …

So, here's the C version (which generates a single image):

[spc]saltmine:~/source/play>time ./a.out >/dev/null

real    0m2.483s
user    0m2.470s
sys     0m0.000s

And now the LuaJIT version:

[spc]saltmine:~/source/play>time luajit amap.lua >/dev/null

real    0m0.849s
user    0m0.840s
sys     0m0.000s
[Yeah, that's what I did when I saw these results]

And no, that's not a mistake (belive me, I checked and rechecked)—here:

[spc]saltmine:~/source/play>time lua amap.lua >/dev/null

real    0m8.091s
user    0m8.060s
sys     0m0.000s
[spc]saltmine:~/source/play>time luajit amap.lua >/dev/null

real    0m0.856s
user    0m0.850s
sys     0m0.000s

So … um … I can have that data to you in two minutes? Is that fast enough?

On reflection, it makes sense that LuaJIT will outperform C in this case. It's heavily CPU bound and the fact that the main function:

function mainloop(A,B,C,D)
  local pix = {}
  local xn  = 0.5
  local yn  = 0.5
  
  for count = 1 , MAX do
    local xn1 = ((A * yn) + B) * xn * (1.0 - xn)
    local yn1 = ((C * xn) + D) * yn * (1.0 - yn)
    
    xn = xn1
    yn = yn1
    
    if xn <  0 then return MAX-1 end
    if xn >= 1 then return MAX-1 end
    if yn <  0 then return MAX-1 end
    if yn >= 1 then return MAX-1 end
    
    local ix = math.floor(xn * DIM)
    local iy = math.floor(yn * DIM)
    local f  = iy * DIM + ix -- Lua doesn't really do N-dimensional arrays
    
    if pix[f] then return count-1 end
    pix[f] = true
  end
end

can be recompiled per call to take advantage of the paramters. For instance, when A and B are both 0, the first expression then becomes:

local xn1 = xn * (1.0 - xn)

and given that I'm doing 32,768 interations of this (oh, did I fail to mention that? Yes, I'm doing 27,768 more interations than the code did twenty-some years ago) this does save quite a bit of time.

Update on August 5th, 2013

Oops, I made a slight mistake

Update on Tuesday, August 6th, 2013

Mistakes were made alright.

Monday, August 05, 2013

About those eight seconds, part II: mea culpa

The images presented in the previous two posts have been mostly reconstructions. The only piece of software I have from that project twenty-some years ago is the piece that drew the chaotic attractors, and that was specific to the SGI and had to be rewritten (extensively) for a modern system.

The other program—the one that generated the maps? I have no clue as to where that code went. Or rather, why I neglected to keep a copy of it. So I had to basically recreate that program from scratch. The original reconstruction started out as:

int mainloop(const double A,const double B,const double C,const double D)
{
  double xn,xn1;
  double yn,yn1;
  int    ix;
  int    iy;
  int    count;
  bool   pix[DIM][DIM];
  
  memset(pix,0,sizeof(pix));
  
  for (count = 0 , xn = yn = 0.5 ; count < MAXLOOP ; count++)
  {
    xn1 = ((A * yn) + B) * xn * (1.0 - xn);
    yn1 = ((C * xn) + D) * yn * (1.0 - yn);    
    xn  = xn1;
    yn  = yn1;

But since I was setting memory and not plotting points, I needed to be very careful about the values of xn and yn. I remembered that the values needed to be between 0 and 1, so attempt one I had:

assert(xn >= 0.0);
assert(xn <  1.0);
assert(yn >= 0.0);
assert(yn <  1.0);

But those assumptions were proven false rather quickly. Attempt two I decided to print the values of xn and yn and as it happened, very quickly (like on the first pass) the values hit +inf. I interpreted that to mean I could bail early, and thus finished the reconstruction with attempt three:

    if (xn <  0.0) return MAXLOOP;
    if (xn >= 1.0) return MAXLOOP;
    if (yn <  0.0) return MAXLOOP;
    if (yn >= 1.0) return MAXLOOP;
    
    ix  = (int)(xn * DIM);
    iy  = (int)(yn * DIM);
    
    if (pix[ix][iy])
      break;
    pix[ix][iy] = true;
  }
  
  return(count);
}

But the sharp edges in the maps were bothering me. So much so that I rewrote the code to continue the loop if the values of xn and yn were out of range, thinking that maybe they might come back into range.

And yes, they did:

[As odd as it sounds, that looks much better]

The downside—this revised code now takes a bit longer to run. The LuaJIT version is still faster than the C version, but the difference (on average 1 minute for the LuaJIT version; 1 minute, 30 seconds for the C version) isn't as jaw-droppingly great.

Sigh—it's not quite as fast as I made it out to be.

Update on Tuesday, August 6th, 2013

Yes, it is as fast as I think it was

Tuesday, August 06, 2013

Programs from the past, Part III: It's a numbers game

I'm still not done yet. When last we left off, the recreation of a project written twenty-some years ago wasn't as fast as I thought. But as I shall reveal, there's a bit more lurking here than I realized.

For a fair comparison, I actually have two versions of a program that generates those map files:

[Yeah, this again]

One version in C and one in Lua (previously, the recreated programs just spit out numbers that were then converted to an image with a second program). And here's how long it took to generate the results:

Timings to generate a map file
VersionTimings
Original program ~36,000s
Lua 1,096s
C 79s
LuaJIT 69s

(remember, the “Original program” was written twenty-some years ago and ran on a 32-bit 33MHz machine; the programs I just wrote are running on a 64-bit 2.8GHz machine)

My initial mistake in recreating this program was to toss out values that exceeded a range and from that, I got fantastic timings, but I was throwing out too much, as can be seen here:

[Trimming things a bit too close]

The “too-trimmed” version came about because I looked at some of the intermediate results and saw infinities (well, the IEEE 754's concept of infinity) and at that point, any operations done on “infinity” is “infinity.”

But my mistake was thinking that all values that fell out of range would end up being “infinite.” Not all did, and some even fell back into range.

But I was still troubled by the infinite results. So, why not explicitely check for “infinity?” In Lua/LuaJIT:

if xn == math.huge or xn == -math.huge then return MAX end
if yn == math.huge or yn == -math.huge then return MAX end

And in C:

if (xn ==  HUGE_VAL) return MAX;
if (xn == -HUGE_VAL) return MAX;
if (yn ==  HUGE_VAL) return MAX;
if (yn == -HUGE_VAL) return MAx;

(for those curious, HUGE_VAL is defined in math.h.)

Adding those lines makes all the difference:

Updated timings to generate a map file
Version Timings
Original program N/A
Lua 16.25s
C 2.60s
LuaJIT 1.25s

It's nice to see LuaJIT still beating the snot out of C, and yes, I was able to generate a full set of maps, just like I did twenty-some years ago, in under three minutes (remember, I'm running the code across four CPUs).

But now I'm upset, because checking for “infinity” was something I didn't do twenty-some years ago, and now I'm thinking, what if I had? Could that simple check, had I known about it, cut the run time of a year down to a month?

I can't blame the university for not offering a class on floating point arithmetic, because it did! And worse, I took the class! (when I entered FAU as a freshman, I knew that the first class for the Computer Science degree involved Fortran. I found the first class that had “Fortran” as part of the title and signed up for it, only to spend find the lectures spending more time on the horrors of floating point arithmetic and very little time on programming. It turned out I took the wrong class; what I signed up for was a Fortran class taught out of the Math Department (the very one I worked for when I wrote these programs twenty-some years ago) for mathematicians. When I discovered the mistake, I was able to get out of the class without issue, but only becuase I was actually doing quite well. In my defense, I was a freshman and didn't have my act together; but FAU didn't exactly have a Computer Science Department at the time, so it didn't have its act together either!). It never dawned on me to even check the intermediate results and bail early.

Sigh.

Thursday, August 08, 2013

World War II according to Hollywood

A wonderful explanation of World War II as done by Hollywood. Indiana Jones keeping the Ark of the Covenant away from the Nazis, or the Japanese forcing Obi-Wan to build a bridge—it's all there.


Programs from the past, Part IV: There Ain't No Such Thing As the Fastest Code

The point I want to make, though, is that the biggest optimization barrier that Terje faced was that he thought he had the fastest code possible. Once he opened up the possibility that there were faster approaches, and looked beyond the specific approach that he had so carefully optimized, he was able to come up with code that was a lot faster. Consider the incongruity of Terje's willingness to consider a 5 percent speedup significant in the light of his later near-doubling of performance.

Michael Abrash, Zen of Code Optimization

I'm probably belaboring the inanimate equus pleonastically, but this isn't the first time I've done so, so why stop now?

As was pointed out to me, I missed an obvious optimization in the map-generation program. That small change drastically changed the runtime of the C version of the program:

Latest timings to generate a map file
VersionNew TimePrevious Time
Lua 15.50s16.25s
LuaJIT 0.65s 1.25s
C 0.22s 2.60s

Some other testing revealed that I can now generate in 37 seconds (remember, four cores running a stupidly parallel problem) that twenty-some years ago took a year to do—generate 500 map files.

And speaking of 37 seconds … thanks to the Classic Computers Talk Mailing List, I was able to run this latest code on a nearly identical SGI machine that I used twenty-some years ago—a 33MHz R3000 running IRIX 4.0.5.

Back then, the generation of one map took around ten hours, but I had missed a few optimizations that could have help speed up the program. And yes, it's true—the optimizations I've since added, plus using a high optimization setting on the C compiler, resulted in a map file being generated in 37 seconds.

On a twenty-some year old 33MHz machine.

Head. Desk.

The entire program run, which took a year, could have been done in less than 5¼ hours!

I think it's time to stop belaboring this inanimate equus.

Thursday, August 15, 2013

Nobody expects the Sleep Paralysis Inquisition!

I had fallen asleep at the hotel (oh yes, Bunny and I had to abandon Chez Boca for the night because of plumbing issues—no damage, but no usage of either bathroom) when it felt like Bunny slid into bed behind me. I cracked open an eye, which was tough, as I was very tired, and yet, there she was, in the other bed!

But I swore I could have felt someone—

Nay! What Devilry is this? That was when I felt the bed moving unnaturally behind me, and lo, above my head, jutting out just far enough for me to see, was the metal leg of the bed looming. Further more, said Devilry had paralyzed me.

I attempted to scream. “BUNNY! BUNNY!” But all that was coming out was “Bunny. Bunny.”. But she was unmoving, still consumed by her book. From what I could see, all she was hearing was “Zzzzzzzzzzz.”

I just knew the bed was gyring and gimbling behind me, waiting, plotting, to suck me into the Abyss. I needed to flee. Struggling against paralysis, I was able to stand up, but the bed covers were smothering me, entangling me as I attempted to move. I fell flat, face first onto the floor, which surprisingly, did not hurt. But it did knock the breath out of me and I passed out.

When I came to, I was still in the bed. Bunny was still reading her book, having known not of my plight with the Hellmouth forming behind me, nor of my failed attempt to escape it.

It was then I realized the Devilry for what it was—sleep paralysis. I may have to rethink what triggers my sleep paralysis attacks, since I was not napping when this happened.

Update on Saturday, August 17th, 2013

In case you are curious as to the plumbing issue at Chez Boca, the main drainage pipe under the house was blocked up. The services of a plumber were required to clear the clog, and even with the expense of a hotel room for the night, it was still cheaper to have the plumber do the work the next day than to pay the “Emergency Rate” Wednesday night.

As Bunny says, “the joys of being a land baron!”


Apparently, I have products!

From
"Dr. Glidden Kennedy" <cucdtk.tp@hoabinh.gov.vn>
To
(Obviously not me) cucdtk.tp@hoabinh.gov.vn
Subject
Having gone through your listed products
Date
Thu, 15 Aug 2013 20:48:31 +0700 (ICT)

Good Day ,

Having gone through your listed products, we offer great interest to do a purchase agreement with your company. Please get back to us with the following information Please Quote.

1. Prices FOB
2. Payment terms
3. Delivery Period
[There is no 4. Spoons! –Sean]
5. Specified delivery date assuming from the Date of Order.

Please your quick reply will be highly appreciated to our private Email (ptamex412@yahoo.com)

Regards,
Mr. Woody Allen
Purchasing Operations Manager.
PT AMEX Corporation

I have listed products to offer?

Really? [Yes, your Amazon affiliate links. –Editor]

Oh, yeah.

But that aside, what's the angle here? I mean, random spamming of companies looking to buy unspecified items? Um … perhaps inserting themselves as middle men, advertising your products as theirs, passing on the orders they get to you to fulfill, and charging their “customers” more money?

Perhaps?

Either that, or Mr. Allen is really desperate for money.

And assuming he is, Mr. Allen, here are my answers:

  1. FOB EXW Boca Raton (Incoterms 2010).
  2. Cash up front, no credit. Payable as US dollars or an equivalent amount in gold or silver.
  3. Please allow four to six weeks for delivery.
  4. We don't talk about Flight Club (think happy thoughts—I'm expecting only four people to get this).
  5. Um, what part of “Please allow four to six weeks for delivery” did you not understand?

That should work.

Obligatory Picture

[The future's so bright, I gotta wear shades]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.