References Needed for Implementing an Interpreter in C/C++

References Needed for Implementing an Interpreter in C/C++

Short answer:

The fundamental reading list for a lisp interpreter is SICP. I would not at all call it overkill, if you feel you are overqualified for the first parts of the book jump to chapter 4 and start interpreting away (although I feel this would be a loss since chapters 1-3 really are that good!).

Add LISP in Small Pieces (LISP from now on), chapters 1-3. Especially chapter 3 if you need to implement any non-trivial control forms.

See this post by Jens Axel Søgaard on a minimal self-hosting Scheme: http://www.scheme.dk/blog/2006/12/self-evaluating-evaluator.html .

A slightly longer answer:

It is hard to give advice without knowing what you require from your interpreter.

  • does it really really need to be an interpreter, or do you actually need to be able to execute lisp code?
  • does it need to be fast?
  • does it need standards compliance? Common Lisp? R5RS? R6RS? Any SFRIs you need?

If you need anything more fancy than a simple syntax tree walker I would strongly recommend embedding a fast scheme subsystem. Gambit scheme comes to mind: http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Main_Page .

If that is not an option chapter 5 in SICP and chapters 5-- in LISP target compilation for faster execution.

For faster interpretation I would take a look at the most recent JavaScript interpreters/compilers. There seem to be a lot of thought going into fast JavaScript execution, and you can probably learn from them. V8 cites two important papers: http://code.google.com/apis/v8/design.html and squirrelfish cites a couple: http://webkit.org/blog/189/announcing-squirrelfish/ .

There is also the canonical scheme papers: http://library.readscheme.org/page1.html for the RABBIT compiler.

If I engage in a bit of premature speculation, memory management might be the tough nut to crack. Nils M Holm has published a book "Scheme 9 from empty space" http://www.t3x.org/s9fes/ which includes a simple stop-the-world mark and sweep garbage collector. Source included.

John Rose (of newer JVM fame) has written a paper on integrating Scheme to C: http://library.readscheme.org/servlets/cite.ss?pattern=AcmDL-Ros-92 .

How would I go about writing an interpreter in C?

A great way to get started writing an interpreter is to write a simple machine simulator. Here's a simple language you can write an interpreter for:

The language has a stack and 6 instructions:

push <num> # push a number on to the stack

pop # pop off the first number on the stack

add # pop off the top 2 items on the stack and push their sum on to the stack. (remember you can add negative numbers, so you have subtraction covered too). You can also get multiplication my creating a loop using some of the other instructions with this one.

ifeq <address> # examine the top of the stack, if it's 0, continue, else, jump to <address> where <address> is a line number

jump <address> # jump to a line number

print # print the value at the top of the stack

dup # push a copy of what's at the top of the stack back onto the stack.

Once you've written a program that can take these instructions and execute them, you've essentially created a very simple stack based virtual machine. Since this is a very low level language, you won't need to understand what an AST is, how to parse a grammar into an AST, and translate it to machine code, etc. That's too complicated for a tutorial project. Start with this, and once you've created this little VM, you can start thinking about how you can translate some common constructs into this machine. e.g. you might want to think about how you might translate a C if/else statement or while loop into this language.

Edit:

From the comments below, it sounds like you need a bit more experience with C before you can tackle this task.

What I would suggest is to first learn about the following topics:

  • scanf, printf, putchar, getchar - basic C IO functions
  • struct - the basic data structure in C
  • malloc - how to allocate memory, and the difference between stack memory and heap memory
  • linked lists - and how to implement a stack, then perhaps a binary tree (you'll need to
    understand structs and malloc first)

Then it'll be good to learn a bit more about the string.h library as well
- strcmp, strdup - a couple useful string functions that will be useful.

In short, C has a much higher learning curve compared to python, just because it's a lower level language and you have to manage your own memory, so it's good to learn a few basic things about C first before trying to write an interpreter, even if you already know how to write one in python.

What do I need to learn to build an interpreter?

Yes, you can use the techniques described in the dragon book to write an interpreter.

You need a lexical analyzer and a parser regardless.

As others have pointed out, you do need to write the code to do actual execution -- but for a simple interpreter, this can be essentially the same as the syntax-directed translation described in the dragon book.

Everything else is optional.


If you want to skip straight from the parser to execution, you can. That will leave you with a very simple language, which can be both good and bad -- look at Tcl for an example of such a language.

If you want to interpret each line as you parse it, you can do that, too; this is what most command-line interpreters (Unix shell scripts, Microsoft's cmd.com and PowerShell) do, as well as interactive "REPL's" (Read-Eval-Print-Loops) for languages like Python and Ruby.

"Semantic analyzer" seems vague to me, but sounds like it should include most kinds of load-time consistency checks. This is also optional, but there are advantages in an interpreter that won't take any old garbage and try to execute it as a program...

"Intermediate code" is also kind of vague, but it is arguably optional. If you aren't executing directly from the program string (as in Tcl), you need some kind of internal representation to store your code once you've read it in. One popular option is to execute from an internal tree structure, based more or less closely on your parse tree, which is arguably distinct from producing "intermediate code". On the other hand, if your "intermediate code" could be written out more or less directly from your internal tree structure, you might as well count the internal structure as your "intermediate code".


There are important issues that you haven't addressed; one that stands out is: how do you want to handle names? Presumably you will want the programmer to be able to define and use his own names (e.g., for variables, functions, and so forth), so you will need to implement some kind of mechanism for that.

Exactly how names are handled is a big design decision, with major implications for the usability and implementability of your language. The simplest option for implementation is to use a single, global hash map to implement a single, global namespace -- but note that this choice has well-known usability problems...

How do interpreters written in C and C++ bind identifiers to C(++) functions

Actually scripting languages do something like what you mentioned.

They wrap functions and they register that functions to the interpreter engine.

Lua sample:

static int io_read (lua_State *L) {
return g_read(L, getiofile(L, IO_INPUT), 1);
}

static int f_read (lua_State *L) {
return g_read(L, tofile(L), 2);
}
...
static const luaL_Reg flib[] = {
{"close", io_close},
{"flush", f_flush},
{"lines", f_lines},
{"read", f_read},
{"seek", f_seek},
{"setvbuf", f_setvbuf},
{"write", f_write},
{"__gc", io_gc},
{"__tostring", io_tostring},
{NULL, NULL}
};
...
luaL_register(L, NULL, flib); /* file methods */

Need some guidance with interpreter creation

Suggestion: Keep it small. Meaning don't try to do everything that mature interpreters do. Creating a full interpreter is a lot of work. Instead focus on a few small topic that interest you. It looks like you are interested in memory management, so play around with stack, heap, and symbol tables.

References:

  • Writing an Interpreter with Lex, Yacc, and Memphis
  • Writing Compilers and Interpreters book
  • there is this question on Stack Overflow
  • How to Write an Interpreter in One Day
  • Writing an Interpreter


Related Topics



Leave a reply



Submit