The Boston Diaries

Thursday, August 07, 2008

A primitive form of fine-grained revision control

Work continues on “Project: Leaflet” and when I last left off, I mentioned that git is nearly perfect for handling the fine-grained revision control.

I'm here to report—it is.

The ability to make changes to one version of “Project: Leaflet” (say, the MySQL version) and then selectively merge changes into the other version (in this case, the PostgreSQL version) isn't that bad with git.

I currently have three repositories for “Project: Leaflet”: the “master” repository with two branches, one for the MySQL version and one for the PostgreSQL version; a second that's my working MySQL repository; and a third that's my working PostgreSQL repository.

The workflow isn't that bad. I make changes on one of the work repositories, say, the MySQL version:

mysql-work> vi somefile.c # make changes, test, etc
mysql-work> git commit -a # have working version, commit changes

Then, when done there, I go to the master repository:

master> git checkout mysql
Switched to branch "mysql"
master> git pull server-path-to-mysql-work
 [ bunch of output ]
master> git log >/tmp/changes
master> git checkout postgresql
Switched to branch "postgresql"

I then view the changes made, and pick which commits I want to merge:

master> git cherry-pick f290b3e50e4cea1c3ee5e5265faa996943ef8542
 # that large value is the ID of the commit
 # I pick the ones that apply 
 [ bunch of output ]
master> git cherry-pick 574756ffaa10cdc8452b33bf3d0ab8b786395080
 [ bunch of output ]
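
Reading the whole log works, but git can also narrow the list down by itself. The following is a minimal sketch rather than part of the workflow above: `unported` is a hypothetical helper name, and it assumes the two branch names used here. It relies on `git cherry`, which compares commits by the patch they introduce and so still recognizes a commit after cherry-picking has given it a new ID.

```shell
# Hypothetical helper: list the commits on branch $2 that have no
# equivalent patch on branch $1. "git cherry -v" prefixes those
# with "+" and already-applied ones with "-"; keep only the "+" lines.
unported() {
    git cherry -v "$1" "$2" | sed -n 's/^+ //p'
}

# Example (run in the master repository):
#   unported postgresql mysql
```

Each line it prints is a commit ID followed by the subject line, which is enough to decide what to hand to `git cherry-pick`.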

Then go to the other work repository, and pull the now-merged changes:

postgresql-work> git pull server-path-to-master
 [ bunch of output ]
postgresql-work> vi somefile.c # make any non-portable changes, test, etc
postgresql-work> git commit -a # have working version, commit changes

And then back to the master to pull in the PostgreSQL changes, along with any non-specific merges that may have come up. I could probably make it smoother, as git is also a revision control toolkit, but so far it's not annoying enough to warrant the work.
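
For what it's worth, one way the smoothing might look is a small shell function that repeats the master-side steps shown earlier. This is a sketch only: the name `port_commits` and its argument order are assumptions made for illustration, and it stops at the first command that fails so that a conflicted cherry-pick can still be sorted out by hand.

```shell
# Hypothetical helper, run from inside the master repository.
# usage: port_commits <from-branch> <to-branch> <work-repo> <commit>...
port_commits() {
    from=$1
    to=$2
    work=$3
    shift 3

    # Bring the source branch up to date from its work repository,
    # then switch to the branch that will receive the commits.
    git checkout "$from" &&
    git pull "$work" &&
    git checkout "$to" || return 1

    # Apply the chosen commits one at a time, in the order given.
    for commit in "$@"; do
        git cherry-pick "$commit" || return 1
    done
}

# Example:
#   port_commits mysql postgresql server-path-to-mysql-work \
#       f290b3e50e4cea1c3ee5e5265faa996943ef8542 \
#       574756ffaa10cdc8452b33bf3d0ab8b786395080
```

Choosing which commits to pass in stays a manual step, since that is where the judgment about what applies to the other database lives.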

Still obsessing over stupid benchmarks …

The problem. The PHP implementation is a lot slower. Embarrassingly slower. Without any caching the Java version is able to do ~6000 queries per second. The PHP counterpart can push through ~850 queries. The implementations are the same. The stats provided by the author of the library are 8000 vs 1200. So about the same as my measurements.

Via reddit.com, Case study: Is PHP embarrasingly slower than Java?

In my ever-continuing obsession with stupid benchmarks and optimization, I decided to tackle this particular little problem like I did with Jumble: map everything into memory and avoid disk I/O altogether (well, explicit disk I/O; the system will page in the data implicitly as it's used). This time, the data maps down to an object file about 8½ megabytes in size (all constant data, so pages can be discarded, not paged out), and with that, I was able to get ~100,000 queries per second.

On a 120MHz machine!

It didn't even take all that long to write …

