I remember, as a kid, being so excited when something in the house broke: a phone, the TV or the Nintendo. That meant I got to open the thing up and look at how it worked. Of course, I had to nag my parents for it, and they sometimes wondered whether I was the one who had broken it in the first place, simply to crack it open. Maybe once… but not more, I swear!

Later on I discovered programming and became fascinated by how easy it was to create things with it. It took me a few years, but I finally got the itch to crack it open and learn how programming languages are made.

Since then, I believe I have become a better programmer, simply because programming languages are the tools we use, and in any profession, art or science, understanding how your tools work is the best way to master your craft.

Overview of a language

Here’s a quick walkthrough of how a typical programming language is structured:

  • The lexer takes your code, splits it into tokens and tags them. If your code were in English, that would be like splitting each sentence into words and tagging each word as an adjective, a noun or a verb.
  • Then the parser takes those tokens and tries to derive meaning from them by matching them against a set of rules defined in a grammar. This is where we define what constitutes an expression, a method call, a local variable and so on in our language. The result of this parsing phase is a tree of nodes called an AST, for Abstract Syntax Tree.
  • Now, if our language uses a tree-walking interpreter (like Ruby before 1.9), the interpreter walks the nodes one by one and executes the action associated with each type of node. This is not very efficient, which is why most modern language implementations compile the AST to bytecode instead of keeping the nodes in memory and executing from them.
  • Bytecode sits between machine code and our source language: it is close to machine code, but also close to the language we are compiling from. The trick is to bring it as close to the machine as possible for performance, while keeping it close enough to our language that compiling to it stays easy. Once we have that bytecode, we run it through the virtual machine, which walks through it and executes the action associated with each instruction.
  • While executing, our VM or interpreter modifies the runtime. The runtime is where our program lives: the living world in which it executes. When you create objects or call methods, all of that happens in the runtime. Having a fast and memory-efficient runtime is crucial. The runtime is also where the garbage collector does its work.
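To make the first three steps concrete, here is a minimal Python sketch of a lexer, a recursive-descent parser and a tree-walking interpreter for a toy language that only knows integers, `+`, `*` and parentheses. The token tags, the grammar and the names `tokenize`, `Parser` and `evaluate` are all made up for this illustration; a real language has a far larger grammar.

```python
import re
from dataclasses import dataclass

# Lexer: split the source into tokens tagged with a (made-up) token type.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[+*]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),  # whitespace is matched but not emitted
]
TOKEN_RE = re.compile("|".join(f"(?P<{tag}>{pattern})" for tag, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (tag, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

# AST nodes.
@dataclass
class Number:
    value: int

@dataclass
class BinOp:
    op: str
    left: object  # a Number or a BinOp
    right: object

class Parser:
    """Recursive-descent parser for the toy grammar:

    expr   := term ("+" term)*
    term   := factor ("*" factor)*
    factor := NUMBER | "(" expr ")"
    """

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def expect(self, tag):
        found, text = self.peek()
        if found != tag:
            raise SyntaxError(f"expected {tag}, got {found}")
        self.pos += 1
        return text

    def parse(self):
        node = self.expr()
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.peek()}")
        return node

    def expr(self):
        node = self.term()
        while self.peek() == ("OP", "+"):
            self.pos += 1
            node = BinOp("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == ("OP", "*"):
            self.pos += 1
            node = BinOp("*", node, self.factor())
        return node

    def factor(self):
        if self.peek()[0] == "NUMBER":
            return Number(int(self.expect("NUMBER")))
        self.expect("LPAREN")
        node = self.expr()
        self.expect("RPAREN")
        return node

def evaluate(node):
    """Tree-walking interpreter: visit each node and run its action."""
    if isinstance(node, Number):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return left + right if node.op == "+" else left * right

ast = Parser(tokenize("2 * (3 + 4)")).parse()
print(evaluate(ast))  # 14
```

Each grammar rule (`expr`, `term`, `factor`) becomes one method, and the way the rules nest is what makes `*` bind tighter than `+`. `evaluate` is the tree walker: it recurses into child nodes on every run, which is part of the overhead that compiling to bytecode tries to avoid.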

How bytecode is actually executed

One of the most fascinating things I learned was how bytecode is interpreted by the virtual machine. Even though it’s a virtual machine, it works much like the actual physical machine, the processor. So understanding it goes a long way toward understanding how your whole computer works!
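A minimal sketch of that idea, assuming a made-up stack-based instruction set (the opcodes below are hypothetical, not those of any real VM or of the class): the `run` loop fetches the byte at the instruction pointer, decodes it, and executes it, much like a processor’s fetch-decode-execute cycle driven by its program counter.

```python
# Hypothetical opcodes for this sketch; every real VM defines its own set.
PUSH = 0   # operand: index into the literal table; pushes that literal
ADD = 1    # pops two values, pushes their sum
MUL = 2    # pops two values, pushes their product
PRINT = 3  # pops a value and records it as output
HALT = 4   # stops the machine


def run(bytecode, literals):
    """Fetch-decode-execute loop over a bytes object."""
    stack, output = [], []
    ip = 0  # instruction pointer, the VM's equivalent of a CPU's program counter
    while True:
        opcode = bytecode[ip]  # fetch
        ip += 1
        if opcode == PUSH:  # decode, then execute
            stack.append(literals[bytecode[ip]])
            ip += 1  # skip over the operand byte
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == PRINT:
            output.append(stack.pop())
        elif opcode == HALT:
            return output
        else:
            raise ValueError(f"unknown opcode {opcode} at offset {ip - 1}")


# print(2 * (3 + 4)), compiled by hand.
literals = [2, 3, 4]
bytecode = bytes([PUSH, 0, PUSH, 1, PUSH, 2, ADD, MUL, PRINT, HALT])
print(run(bytecode, literals))  # [14]
```

The `PUSH` operands are indexes into the `literals` list rather than the values themselves, which keeps every instruction one or two bytes long no matter how large the literal is.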

Here’s a small excerpt from my new class, The Programming Language Masterclass, explaining how an if statement is executed at the bytecode level.

A few notes before you watch:

  • The literal table is where we store hard-coded values that appear in our code, such as strings, numbers and method names.
  • A series of bytes in the bytecode forms an instruction, and each instruction tells the VM what to do.

All of this is properly introduced and explained during the class.
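As a rough companion to the notes above (not the code from the class), here is a minimal Python sketch of how an if/else might be compiled to bytecode and executed. It assumes a hypothetical instruction set with a conditional jump, `JUMP_UNLESS`, and an unconditional `JUMP`; for simplicity, the condition is itself a value stored in the literal table.

```python
# Hypothetical instruction set for this sketch (not the one used in the class).
PUSH_LITERAL = 0  # operand: index into the literal table
JUMP_UNLESS = 1   # operand: target offset; pops the condition, jumps if it is falsy
JUMP = 2          # operand: target offset; always jumps
PRINT = 3         # pops a value and records it as output
HALT = 4


def compile_if(condition, then_value, else_value):
    """Compile `if condition then print(then_value) else print(else_value)`."""
    literals = [condition, then_value, else_value]  # the literal table
    code = [PUSH_LITERAL, 0, JUMP_UNLESS, 0]  # 0 is a placeholder target
    jump_unless_operand = len(code) - 1
    code += [PUSH_LITERAL, 1, PRINT, JUMP, 0]  # then branch, then skip the else branch
    jump_operand = len(code) - 1
    code[jump_unless_operand] = len(code)  # backpatch: the else branch starts here
    code += [PUSH_LITERAL, 2, PRINT]  # else branch
    code[jump_operand] = len(code)  # backpatch: the end of the if statement
    code += [HALT]
    return bytes(code), literals


def run(bytecode, literals):
    stack, output, ip = [], [], 0
    while True:
        opcode = bytecode[ip]
        if opcode == PUSH_LITERAL:
            stack.append(literals[bytecode[ip + 1]])
            ip += 2
        elif opcode == JUMP_UNLESS:
            # The whole if statement comes down to this: pick the next ip.
            ip = ip + 2 if stack.pop() else bytecode[ip + 1]
        elif opcode == JUMP:
            ip = bytecode[ip + 1]
        elif opcode == PRINT:
            output.append(stack.pop())
            ip += 1
        elif opcode == HALT:
            return output
        else:
            raise ValueError(f"unknown opcode {opcode} at offset {ip}")


print(run(*compile_if(True, "yes", "no")))   # ['yes']
print(run(*compile_if(False, "yes", "no")))  # ['no']
```

When the compiler emits `JUMP_UNLESS`, it does not yet know where the else branch will start, so it writes a placeholder operand and patches it once the offset is known, a technique usually called backpatching. At run time, the if statement is nothing more than a change to the instruction pointer.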

I hope this clears up a few things. Leave a comment and let me know!