Build template

I don’t like make.

The syntax is clunky, the semantics are unintuitive, it’s hard to get parallel builds to work well, and there are a million ways to shoot yourself in the foot.

Nevertheless, I stick with it, because it’s ubiquitous and it’s the simplest solution for what I do.

A simple Makefile might look like this:

all: main
OBJS = \
  Foo.o \
  Bar.o \
main: $(OBJS)
  $(CXX) $(LDFLAGS) $^ -o $@

This says main depends on Foo.o, Bar.o, and Baz.o; each of these is built using one of the builtin rules.

I often want some extra build flags for debugging or optimization, so I add:

CFLAGS += -ggdb
CXXFLAGS += -fno-inline

If I’m using GCC and I want automatic dependency generation, I add this:

DEP_FILES = $(patsubst %.o,%.d,$(OBJS))
-include $(DEP_FILES)

Now whenever I build Foo.o from Foo.cpp (or Foo.c), g++ will generate Foo.d at the same time. The next time I type make, it will read Foo.d to determine what Foo.o depends on.

And just in case something gets borked, I usually want a clean rule:

.PHONY: clean

Now none of this is intuitive (I learned it all from trial-and-error), and there’s a whole lot more that build systems need to do (building shared libraries, finding where libraries are located on a system, etc.), but this really does solve 90% of the problems I hit in small-to-medium projects. I’ve even used a setup like this in a medium-large project (> 200K lines of code) with success.

Please, tell me why I should switch? I despise make just as much as the next guy, but it’s simple, and it works.

Stupid vim trick

I decided yesterday to play with autocmd:

function EditFileLine(file_line)
  let file = substitute(a:file_line, '\(.*\):\(.*\)', '\1', '')
  let line = substitute(a:file_line, '\(.*\):\(.*\)', '\2', '')
  exe "ed" file
  normal gg
  exe +line
  hi CursorLine   cterm=NONE ctermbg=darkcyan ctermfg=black guibg=darkcyan guifg=black
  setlocal cursorline
  " au CursorMoved * setlocal nocursorline
au BufNewFile *:* call EditFileLine(expand("<afile>:p"))

I had gotten tired of highlighting the file:line output in a gcc error message, only to paste it and have backspace to remove the line number. This code will parse the file:line string, open the file, jump to the line, and lastly highlight the line that was jumped to.

Actually, it highlights the line under the cursor. I wanted to turn the highlight off whenever the cursor is moved, but I’ll settle for this. (My feeble attempts at using autocmd CursorMoved to disable the highlight caused the highlight to disappear altogether; I’m guessing that vim moves the cursor again at some point after the BufNewFile autocmd is complete).

Sample usage:

$ vim File.cpp +58 # before I had to type this
$ vim File.cpp:58  # now I can type this

It’s odd how few examples I could find on the web that 1) use the substitute() function with match groups, 2) reference function arguments (need to use the a: prefix), or 3) demonstrate how to use the execute and normal commands (the former is like typing a : in normal mode and then a command, while the latter is like playing back a macro from a function to simulate keypresses).

Some people might suggest that I learn how to use :make and :cope, and others might suggest that I use emacs. I stopped using an IDE over ten years ago when I stopped using Turbo C and RHIDE, so I’ve been using the command line for too long to be comfortable with either of those “solutions”.

Color wiring diagrams

Whenever I look at a car wiring diagram, I tend to get cross-eyed. The entire thing is usually black-and-white with labels to indicate wire color. When I see a loose wire in the engine bay, it often takes as much as twenty minutes to figure out where in the diagram that wire is located.

But what if the diagrams were in color? Would that not make it easier to locate the wire?

My first stab at this was to sit down with some colored pencils and start coloring away. This actually turns out to be quite relaxing (as one friend pointed out, it’s like kindergarten for adults). It’s not a good way to share the diagrams with others, though, and it’s time-consuming to recolor them if they get lost or damaged. (in my case, I left my binder of 1989 plymouth voyager turbo wiring diagrams on the top my my minivan while driving down a rural highway at 80mph).

This is why we invented computers. Some people may tell you it had to do with codebreaking or physical simulations or the internet, but the real reason they were invented was so we could have our wiring diagrams in color.

Step 1: Download wiring diagrams from here:

Step 2: Load the pdf into acrobat and print it to a ps file

Step 3: Convert the entire ps file to a series of pgm (greyscale) images. The greyscale takes a little more processing time, but the text labels turn out a bit nicer this way.

gs -sDEVICE=pgm -r300x300 -dNOPAUSE -sOutputFile=page%03d.pgm

this will convert every page of the ps file to a file called pageXXX.pgm at a resolution of 300dpi. It should be possible to instead use a bitmap (black and white) at a higher resolution, but I got better results with less processing time using greyscale.

I also tried using greyscale at a higher resolution (600dpi), but this yielded worse quality. Presumably this is because the resolution of the image exceeded the resolution of my input file.

Step 4: I considered writing a script for this step, but it turns out someone’s already done all the hard work of vectorizing the pgm input file. Enter potrace. It takes a bitmap as input and produces a vectorized output in xfig, svg, or a variety of other formats. Beautiful!

I’ll start with one page for now:

potrace -b xfig -o page001.xfig page001.pgm

Coloring it is as easy as loading it up into skencil:

Screenshot-page001.xfig - Skencil

First I select the objects and then ungroup (in the Arrange menu). To color an object, click on it, and select the color. The image above highlights one small problem: potrace decided that the wires that connect in a single dot are a single object. That makes colorizing them a bit trickier.

Two options from here: 1. find a better drawing tool that will let me manually split the polygon, or 2. figure out what options to pass to potrace or autotrace to generate separate objects in the first place.

Tags: ,

A new project

A difficult and unfortunate truth for any hobbyist is that there is only so much time in a single human lifespan. I have wanted to continue to play with JIT compilation, not only because it is useful and an interesting mental challenge, but because it’s a great way to disconnect from the world. Sadly, time does not permit this to happen.

I hope to one day again come back to the hobbyist programming world. It was a difficult decision this year to not attend RubyConf for the first time ever (until this year, I was one of four remaining nine-time attendees). But all good things must come to an end, and ends make way for new beginnings. In my case, new beginning include racing in the 24 Hours of Lemons, building a minivan drag car, and the possibility of entering medical school.

As for the minivan, I would like to make it ruby-powered. K-cars (including the caravan/voyager) are really simple machines, and for a few hundred dollars, every sensor in the vehicle can be datalogged. This is a must for building reliable power: if the engine is not running efficiently, then it’s not producing as much power as it potentially could; worse yet, at high boost it could damage the motor. (yes, you read that right — this minivan has a turbo).

Datalogging a drag-run should be straightforward. I’ve never done anything like this, so what sounds simple when I read about it may turn out to be a PITA. But once I get the datalogger hooked up, how much fun would it be to write a ruby library to intercept the data in real-time? And how much more fun would it be on top of that to visualize this data in real-time using an opengl frontend? Lastly, how slick would it be to take this opengl display and project it onto the windshield ala a heads-up-display?

I’ve wanted to get back into graphics programming for quite a while; now I have an excuse. All I need now is some spare time.

Tags: , , , , , ,

Converting stack-to-register

Like many other virtual machines, the YARV machine is stack-based.  Because most modern CPUs are register-based, to compile YARV bytecode to efficient native code means the stack operations need to be converted to register operations.

Ludicrous does this by walking the instruction sequence and pushing and popping value objects onto and off of a stack at compile-time.  This completely eliminates the YARV stack (the machine stack is still used by libjit as necessary), but has some limitations: all control paths must produce an identical stack.  A conditional that produces a different size stack in one path than in another will break this scheme.

Ludicrous does its best to detect this situation if it ever comes up and will refuse to compile bytecode that tries to do evil things like this.  However, I recently discovered a case that wasn’t covered:

irb(main):001:0> require 'internal/method'
=> true
irb(main):002:0> def foo; return true ? 1 : 2; end
=> nil
irb(main):003:0> puts method(:foo).body.body.disasm
== disasm: <RubyVM::InstructionSequence:foo@(irb)>======================
0000 trace            8                                               (   8)
0002 putobject        1
0004 jump             8
0006 putobject        2
0008 trace            16
0010 leave
=> nil

Here YARV has made an optimization — true is always true, so a conditional branch is unnecessary. However YARV wasn’t smart enough to remove the dead code that pushes ‘2’ onto the stack. When Ludicrous compiles this, it gets confused because at the leave instruction the stack contains [ 1, 2 ] (remember that Ludicrous evaluates the stack at compile time).

I suspect this is a bug in the YARV optimizer, but it illustrates the case — converting code written for a stack machine to run on a register machine is no trivial task. I’m not sure yet how to handle this case. All control paths (there is only one) do produce the same stack, but Ludicrous does not have a mechanism to detect that the second pushobject instruction is dead code and does not modify the stack.

Perhaps the best solution is to raise an exception and fall back on the interpreter for functions that do this.

Tags: , , , , , ,

Why libjit?

One question I’ve been asked quite a bit over the last two years is why I decided to use libjit for Ludcirous. Evan Phoenix had considered it for Rubinius a long time ago but rejected it partly because of the license. When I started Ludicrous, I decided to use libjit in spite of the GPL license (the license has since been changed to LGPL).

First of all, why is the GPL (v2) license problematic for a JIT compiler? Any JIT compiler using libjit must link with it, and any program run under this JIT compiler therefore ends up dynamically linking with libjit in the end. This puts the user in a precarious grey area in terms of the GPL (particularly since at the time the code had to be modified to run under Ludicrous). As I said, libjit is now under the LGPL (GPL plus an exception would have been sufficient as well), so this is no longer an issue.

Yet I picked libjit in spite of this because I felt it was the best tool for the job at the time. I evaluated a number of other alternatives:

  • GNU Lightning – much of the library is implemented with macros, making it difficult to use programmatically. It would be virtually impossible to wrap this library as a ruby module without some heavy hacking. It also requires the user to do his own register allocation, which isn’t something I was interested in figuring out.

  • DynASM – this is the “dynamic assembler” used by LuaJIT. It’s supposed to be very fast and lightweight. It’s basically a lua script that preprocesses your C source code, looking for platform-independant asm and replacing it with C calls to generate that asm at runtime. I didn’t look too closely at this because of its dependency on lua.

  • LLVM – somewhat heavyweight, much more than a JIT, but interesting because of its power. However, at the time I could not make heads or tails of their documentation, much less build a simple prototype that adds two numbers. I get the impression this has been addressed now (judging from the number of compilers using LLVM these days), but I’ve never looked back at LLVM.

  • NekoVM – had a working JIT for about two years before I started Ludicrous. Looks lightweight. I wasn’t interested, though, in compiling to a different VM; I wanted Ludicrous-compiled code to run under MRI (giving me full access to all the C extensions that are available).

  • Nanojit – I believe this was available at the time, but was too new to interest me.

  • Libjit – small, lightweight, but most importantly, well documented. It’s a register-based VM that emits code as functions are called, making it easy to wrap as a ruby extension. It does physical register allocation for me; I just pretend I have an infinite number of registers and let libjit do the work. I read through the tutorials in less than half an hour and had a prototype of ruby-libjit running the same afternoon. After that, as they say, it’s all history.

A faster stack

I’m back in the country now and starting to work on Ludicrous again. A number of YARV instructions have now been implemented and very simple methods can be executed.

Stack manipulation is tricky. As mentioned before, the first pass of JIT-compiling YARV bytecode actually allocated a Ruby array for every method invocation, calling rb_ary_push() and rb_ary_pop() to manipulate the stack. As you can imagine, this is quite slow.

The next pass tried to actually use the real YARV stack. This meant directly accessing Ruby’s thread structure to get a pointer to the stack. It was faster than using an array, but there were too many edge cases where the stack pointer wasn’t being manipulated properly.

The current solution uses a static stack instead of a dynamic stack, pushing and popping the stack at compile time rather than at run time. This means that stack operations can actually be optimized into register operations. It obviously doesn’t work with all possible YARV bytecode, but I don’t know of any cases where Ruby generates bytecode that needs a run-time stack (e.g. pushing a value onto the stack in a loop). In any case there are some rudimentary checks for such code, e.g. that the stack size is the same at the beginning and end of a detected loop.

The next challenge to tackle is to implement rescue and ensure. I’ve already got PUSH_TAG, POP_TAG, and EXEC_TAG implemented in ruby-jit (impossible on 1.8, since the current tag is private in eval.c), and the rest should come relatively easily.

Ludicrous on YARV

Last night I was tinkering with Ludicrous and I decided to see how hard it would be to get it to run on YARV. After modifying the extension to compile against ruby 1.9, it turned out to be only a handful of lines of code to compile a simple method in YARV:

class Foo
  def foo
    return 42


puts #=> 42

The primary difference between methods on 1.8 and methods on YARV is that methods in 1.8 are represented by SCOPE nodes that hold a reference to that method’s AST, while YARV methods are represented by a METHOD node that holds a reference to that method’s instruction sequence. The code for compiling METHOD nodes and SCOPE nodes is remarkably similar, the primary difference lying in the code that compiles the instruction sequence:

class VM
  class Instruction
    class PUTOBJECT
      def ludicrous_compile(function, env)
        value = function.const(JIT::Type::OBJECT, self.operands[0])

    class LEAVE
      def ludicrous_compile(function, env)
        retval = env.pop

Currently the stack is a simple Ruby array, which is probably going to run slower than YARV bytecode, but it shouldn’t be difficult to use a more highly optimized stack (perhaps the system stack).

Ludicrous release is on the way

A number of months ago I began working on a just-in-time compiler for Ruby 1.8. What started as an experiment is actually now a reality – Ludicrous is now an executable that can run ruby scripts just like MRI!

Based on some ideas Charles Nutter mentioned on his blog regarding JRuby, Ludicrous works by installing stubs for every method in a given class that jit-compile the methods when they are called. If a method can’t be jit-compiled, the stub is removed and Ludicrous reverts back to the interpreter version of the method.

Ludicrous-compiled methods and MRI-interpreted methods can thus live safely together inside the same interpreter. This has allowed Ludicrous to grow incrementally, since it was able to run all of Test::Unit from day one.

Benchmarks show it to be on par or better than YARV:

cout@bean:~/ludicrous/sample$ ruby simple_benchmark.rb
                           user     system      total        real
Ackermann function     0.600000   0.020000   0.620000 (  0.645012)
Array access           3.590000   0.010000   3.600000 (  3.648332)
Fibonacci numbers      1.030000   0.020000   1.050000 (  1.089943)
GCD (iterative)        2.440000   0.110000   2.550000 (  2.593122)
GCD (recursive)        0.820000   0.040000   0.860000 (  0.889390)
Hash access I          1.400000   0.050000   1.450000 (  1.452047)
Hash access II         2.990000   0.050000   3.040000 (  3.152132)
Lists                  0.880000   0.000000   0.880000 (  0.912412)
Nested loop            4.700000   0.010000   4.710000 (  5.104556)
Sieve of Eratosthenes  5.660000   0.040000   5.700000 (  6.328717)
Word Frequency         0.690000   0.020000   0.710000 (  0.779878)
cout@bean:~/ludicrous/sample$ ruby1.9 simple_benchmark.rb
                           user     system      total        real
Ackermann function     0.640000   0.000000   0.640000 (  0.691521)
Array access           3.280000   0.000000   3.280000 (  3.394318)
Fibonacci numbers      0.290000   0.000000   0.290000 (  0.528554)
GCD (iterative)        0.620000   0.010000   0.630000 (  0.660089)
GCD (recursive)        0.210000   0.000000   0.210000 (  0.238961)
Hash access I          2.320000   0.040000   2.360000 (  2.416732)
Hash access II         4.180000   0.010000   4.190000 (  4.312941)
Lists                  0.560000   0.000000   0.560000 (  0.558828)
Nested loop            2.580000   0.000000   2.580000 (  2.660648)
Sieve of Eratosthenes  5.590000   0.050000   5.640000 (  5.751475)
Word Frequency         1.170000   0.000000   1.170000 (  1.200573)
cout@bean:~/ludicrous/sample$ ruby -I../lib -I../ext simple_benchmark.rb --jit --skip=sieve
Compiling nested_loop
Compiling ack
Compiling gcd_recur
Compiling hash_access_II
Compiling array_access
Compiling lists
Compiling word_frequency
Compiling fib
Compiling hash_access_I
Compiling gcd_iter
                        user     system      total        real
Ackermann function  0.200000   0.000000   0.200000 (  0.210097)
Array access        1.860000   0.030000   1.890000 (  1.919088)
Fibonacci numbers   0.210000   0.000000   0.210000 (  0.212081)
GCD (iterative)     0.060000   0.000000   0.060000 (  0.059266)
GCD (recursive)     0.140000   0.000000   0.140000 (  0.142387)
Hash access I       1.260000   0.000000   1.260000 (  1.321544)
Hash access II      1.700000   0.010000   1.710000 (  1.765920)
Lists               0.870000   0.000000   0.870000 (  0.868467)
Nested loop         2.900000   0.010000   2.910000 (  2.995473)
Word Frequency      0.650000   0.000000   0.650000 (  0.646185)

Ludicrous performs clearly better in most of the benchmarks here. Most impressive is the iterative GCD benchmark; Ludicrous executes the function 11 times faster than YARV and almost 44 times faster than 1.8! The tests where it doesn’t do as well are ones that make heavy use of blocks, because Ludicrous can’t do block inlining like YARV does.

To be fair, Ludicrous doesn’t implement a lot of features that would slow it down, such as event callbacks (used for profiling), bignums, and checking for redefinition of primitive arithmetic methods (such as mathn.rb does). It does try to avoid certain optimizations that YARV does not do, to keep the performance comparisons sensible.

Ludicrous’s performance is not at the expense of correctness, either; it passes all of the bfts tests and most of the tests that come with Ruby 1.8 (there are some pesky tests in particular that it is impossible for Ludicrous to pass without patching the interpreter).

Expect a gem to be released within the next few weeks as I iron out a few more kinks.