How to Write an Interpreter

How would I go about writing an interpreter in C?

A great way to get started writing an interpreter is to write a simple machine simulator. Here's a simple language you can write an interpreter for:

The language has a stack and 6 instructions:

push <num> # push a number on to the stack

pop # pop off the first number on the stack

add # pop off the top 2 items on the stack and push their sum on to the stack. (remember you can add negative numbers, so you have subtraction covered too). You can also get multiplication my creating a loop using some of the other instructions with this one.

ifeq <address> # examine the top of the stack, if it's 0, continue, else, jump to <address> where <address> is a line number

jump <address> # jump to a line number

print # print the value at the top of the stack

dup # push a copy of what's at the top of the stack back onto the stack.

Once you've written a program that can take these instructions and execute them, you've essentially created a very simple stack based virtual machine. Since this is a very low level language, you won't need to understand what an AST is, how to parse a grammar into an AST, and translate it to machine code, etc. That's too complicated for a tutorial project. Start with this, and once you've created this little VM, you can start thinking about how you can translate some common constructs into this machine. e.g. you might want to think about how you might translate a C if/else statement or while loop into this language.

Edit:

From the comments below, it sounds like you need a bit more experience with C before you can tackle this task.

What I would suggest is to first learn about the following topics:

scanf, printf, putchar, getchar - basic C IO functions
struct - the basic data structure in C
malloc - how to allocate memory, and the difference between stack memory and heap memory
linked lists - and how to implement a stack, then perhaps a binary tree (you'll need to
understand structs and malloc first)

Then it'll be good to learn a bit more about the string.h library as well
- strcmp, strdup - a couple useful string functions that will be useful.

In short, C has a much higher learning curve compared to python, just because it's a lower level language and you have to manage your own memory, so it's good to learn a few basic things about C first before trying to write an interpreter, even if you already know how to write one in python.

How to write an interpreter?

You will have to learn at least:

lexical analysis (grouping characters into tokens)
parsing (grouping tokens together into structure)
abstract syntax trees (representing program structure in a data structure)
data representation (assuming your language will have variables)
an evaluation loop that "runs" your program

An excellent introduction to some of these topics can be found in the introductory text Structure and Interpretation of Computer Programs. The language used in that book is Scheme, which is a robust, well-specified language that is ideally suited for your first interpreter implementation. Highly recommended.

What do I need to learn to build an interpreter?

Yes, you can use the techniques described in the dragon book to write an interpreter.

You need a lexical analyzer and a parser regardless.

As others have pointed out, you do need to write the code to do actual execution -- but for a simple interpreter, this can be essentially the same as the syntax-directed translation described in the dragon book.

Everything else is optional.

If you want to skip straight from the parser to execution, you can. That will leave you with a very simple language, which can be both good and bad -- look at Tcl for an example of such a language.

If you want to interpret each line as you parse it, you can do that, too; this is what most command-line interpreters (Unix shell scripts, Microsoft's cmd.com and PowerShell) do, as well as interactive "REPL's" (Read-Eval-Print-Loops) for languages like Python and Ruby.

"Semantic analyzer" seems vague to me, but sounds like it should include most kinds of load-time consistency checks. This is also optional, but there are advantages in an interpreter that won't take any old garbage and try to execute it as a program...

"Intermediate code" is also kind of vague, but it is arguably optional. If you aren't executing directly from the program string (as in Tcl), you need some kind of internal representation to store your code once you've read it in. One popular option is to execute from an internal tree structure, based more or less closely on your parse tree, which is arguably distinct from producing "intermediate code". On the other hand, if your "intermediate code" could be written out more or less directly from your internal tree structure, you might as well count the internal structure as your "intermediate code".

There are important issues that you haven't addressed; one that stands out is: how do you want to handle names? Presumably you will want the programmer to be able to define and use his own names (e.g., for variables, functions, and so forth), so you will need to implement some kind of mechanism for that.

Exactly how names are handled is a big design decision, with major implications for the usability and implementability of your language. The simplest option for implementation is to use a single, global hash map to implement a single, global namespace -- but note that this choice has well-known usability problems...

Writing an interpreter in OCaml

May be you should try Types and Programming Languages by B. Pierce which is probably a good reference on the topic (especially if your input language is functional).

If you want to go deeper into semantics I would advice Formal Semantics of Programming Languages by G. Winskel. But it won't help you with implementation.

Finally, if at one point you need to compile your language instead of interpreting it you should have a look at Modern Compiler Implementation in ML by A. Appel.

But these books are probably overkills for what you want to do (your lectures notes should probably be enough).

How can I write a quick and dirty interpreter?

If you want to write a a very simple interpretive language, you should take a look at FORTH. The lexer is trivial (tokens are space separated) and the interpreter is also very simple. And if FORTH is too retro, take a look at Scheme - you can build a tiny Scheme interpreter very quickly.

How would I write an interpreter for this in javascript?

OK, I'll actually try to tackle this question somewhat... although there is no way I could possibly distill everything you need to know into a few sentences or even paragraphs.

First, you should gain an understanding / familiarity with what's involved in building a compiler. You say you want to "interpret" the code - but, I think what you really want is to compile the code to Javascript (and in Javascript as well).

Wikipedia has a great page on the topic: http://en.wikipedia.org/wiki/Compiler

The gist of the thing is:

1.) Convert the text (source code) into some sort of in-memory data structure (abstract syntax tree - AST) that actually lets you reason about the structure of the program you've been given.

2.) Given that structure, produce your output (Javascript, in this case).

To break down step 1 a bit further - Define your grammar e.g.; what is valid syntax in this new language of yours, and what is not? Typically, it's best to reason about this sort of thing with BNF on paper (or whatever syntax the tools you use prefer - although (E)BNF is the standard). The challenging part about this step is not only doing the grunt work of parsing the source code - but also making sure you've come up with a grammar that is unambiguous and readily parsable. Those two requirements are actually somewhat more difficult to nail down than you might think.

I've built an LALR parser generator in C# - and, I can tell you, unless you've built one before, it's not a trivial task. Beyond that, there are so many good ones, that, unless you are really wanting to know how it works for the fun of it or because you're into that kind of thing, it makes a whole lot more sense to use a parser-generator someone else wrote. The great thing about a parser generator is that it will take that syntax definition you've come up with convert it into a program that will spit out an AST the other end. That's a HUGE amount of work that was just done for you. And, in fact, there are a few for Javascript:

http://www.google.com/search?q=javascript+parser+generator

PEG.js – Parser Generator for JavaScript

JS/CC Parser Generator Project Homepage

On to step 2. This step can be very basic for something like infix expressions - or it can get very complex. But, the idea is, given the AST, "convert" it into your output format (Javascript). Typically you need to check for things that aren't checked for by the "simple" syntax checking that occurs in the parser. For example, even in your sample code there is a whole number of things that could possibly go wrong. In the part where you say plus x what would happen if the developer never defined x? Should this be an error? Should x default to some value? This is where your language really comes to life. And, to back-track for a minute - your time needs to be spent on this step - not on the parser. Use a tool for that - seriously. You're talking about starting a large and challenging project - don't make it even harder for yourself. To add to all this - there is often a need to make multiple "passes" through the AST. For example, the first pass may look for and setup "module" definitions, the second pass may look for and setup "namespaces", another pass may setup classes, etc. These further refinements of the structure of the final application are used in later steps to determine if a reference to a particular class/variable/module/etc is valid (it actually exists or can be referenced).

There are a few really great books on compilers. The infamous "dragon book" is one.

Creating a language interpreter

Compiler creation is a complex topic (an interpreter can be seen as a special compiler).

You have to first parse it - try to understand the syntax and then create some internal representation (abstract syntax tree) and then create some execution logic.

Wikpedia suggests http://mcs.une.edu.au/~comp319/