When I first announced tinyrb there was no Float, no Module, no Proc or Block, no Array, Hash, IO, Range or Metaclass. Frankly, it was not really Ruby. But now most of those are in, except Float and IO, and Proc is halfway there (see the proc branch). Let’s just say that it’s running a lot more code.

New Grammar

The first big problem I hit was the parser. Initially written with Ragel and Lemon, it was not flexible enough, so I have since rewritten it using peg/leg. I’m having some trouble with {...} because of the ambiguity between Hash literals and blocks, but other than that, it works like magic. The downsides of using a PEG parser are poor error reporting and less safety, since it is prone to infinite recursion. The latter might become a problem, but maybe it’s just a bug in peg/leg.

Upvals, Cheap Procs

One of the problems with most Ruby implementations is that using a Proc might end up costing more memory than you think. The reason is that part of the stack frame (or whatever other structure stores the local variables) must be saved so that your Proc can later access local variables from the outer scope.

def add(x)
  proc { |y| x + y }
end
add3 = add(3)
add3.call(4) # => 7

You see, the context of add must be saved because the Proc accesses the local variable x, which is defined outside of it. If you create lots of Procs, that might become a problem, and it’s not very “tiny”.

Once again, the solution can be found in Lua, and it is called an upval! An upval (or upvalue) is a local variable from an outer scope, like x inside the Proc in the previous example. I’ll spare you the implementation details, but upvals have two advantages.

First, accessing an upval only requires dereferencing a pointer. By comparison, other VMs save the full stack frame and need to walk the frames to find where the local variable is defined each time it is accessed. (Let me know if I’m wrong, but I’ve checked the Rubinius (push_local_depth instruction) and YARV (getdynamic instruction) sources.)

Second, when the enclosing scope of a Proc ends, for example when we return from the add method in the previous example, we just copy the value of the local into the upval: the value field of the TrUpval struct is then pointed at its closed field. No need to copy and save a bunch of frames, just update a pointer.

typedef struct {
  OBJ *value; /* points to a local or closed */
  OBJ closed; /* value when closed */
} TrUpval;

Please note this is not fully implemented yet; I’m working on the “closing” of upvalues at the moment.

Compact bytecode

Another thing I ported from Lua is the way it stores local variables on the stack. For the VM, this means local variables are no different than any other values you pass around (return values, arguments, etc.). The advantage of this approach is that it makes the bytecode a lot smaller when using local variables. This is only possible in a register-based VM like Lua, tinyrb and Parrot.

For tinyrb’s VM, local variables are just named registers, so the code a = b = 1 compiles to the following:

; block definition: 0xc2f00 (level 0)
; 1 registers ; 0 nested blocks
; 0 args 
.local  b        ; 0
.local  a        ; 1
.value  1        ; 0
[000] loadk        0   0   0 ; b = 1
[001] move         1   0   0 ; a = b
[002] return       0   0   0
; block end

In YARV, it takes twice as many instructions:

0002 putobject        1
0004 dup              
0005 setlocal         b
0007 dup              
0008 setlocal         a
0010 leave

Well, that doesn’t mean tinyrb is faster, but it has to execute half as many instructions as YARV to do the same thing with local variables. The code also takes half the space in memory. So we’re on the right track.

FFI

Wayne Meissner, the creator of the FFI gem and of the FFI support in JRuby, has been working on FFI integration. This is probably the most exciting thing you can use tinyrb for at the moment. Wayne has also started implementing more of the core library in Ruby using FFI. I hope to use this approach to implement more of tinyrb in Ruby.

Garbage Collector

I’ve been using the Boehm GC as a way to move development along faster. I think it’s more important to get the core features in first, and I wasn’t sure which type of GC would fit tinyrb best. But since tinyrb aims to be “tiny” both code-wise and memory-wise, reference counting seems the best match. While refcounting is often viewed as a poor solution, as far as I know it’s the most efficient one in terms of memory consumption: as soon as an object is no longer used, it is freed. By contrast, VMs using mark-and-sweep algorithms must allocate a large quantity of memory up front and generally can’t give it back to the OS.

Luckily for me, some people mentioned on the IRC channel that they’ll be working on a GC for tinyrb as part of their master’s thesis. I hope this works out!

That Was A Lot of Text

At first, I wasn’t sure about the focus of the project; it was just a learning experience. But now I think there’s a need for a small and efficient Ruby VM that uses very little memory. Expect to see more development in that direction.