What Would an Ast (Abstract Syntax Tree) for an Object-Oriented Programming Language Look Like

What would an AST (abstract syntax tree) for an object-oriented programming language look like?

AST is an abstraction of the CST (concrete syntax tree, or, parse tree). The concrete syntax tree is the tree resulting from the productions (in the grammar) used to parse the file. So your AST is basically derived from your grammar definition, but has for transformed

                        Exp                    
/ | \
/ | \ *
Ident BinOp Ident into / \
/ | \ "x" "y"
/ | \
"x" * "y"

All in all I think the example in your post looks fine. I would probably wrap the variable declarations in a varDeclList and the function declaration in a methDeclList, and the return statement in a stmtList. (See below.)

One more or less "real" representation of an AST is described by Apple in his book "Modern Compiler Implementation in Java". (Resources can be found here.)

Using those classes, your program would be represented as follows:

Program
ClassDeclList
ClassDecl
Identifier
id: Person
VarDeclList
VarDecl
type: String
id: name
VarDecl
type: int
id: age
MethDeclList
MethodDecl
modifiers: public
returnType: String
id: toString
Formals
(empty)
StmtList
returnStmt
Identifier
id: name

Clang : What does AST (abstract syntax tree) look like?

There is a small confusion between the various options available:

  • -ast-print will pretty-print the current AST, that is, it will render the code it understood as closely as possible to what it parsed (but making some things explicit, like the apparition of the this)
  • -ast-dump will generate a lisp-like representation of the current AST

The pretty printer can be useful to check that the AST is lossless (ie, preserved the const-ness of such expression, etc...) but is not really about development.

If you want to hack on the compiler, you need -ast-dump, which will generate an output that maps directly the in-memory representation of the code that was parsed.

Is the AST, abstract syntax tree, defined by the language or by the frontend?

Is the AST, abstract syntax tree, defined by the language

Not fully. Each definition in the C++ language standard comes with a short syntax notation and there is a informative annex with grammar summary. But the annex notes https://eel.is/c++draft/gram :

This summary of C++ grammar is intended to be an aid to comprehension.
It is not an exact statement of the language. In particular, the
grammar described here accepts a superset of valid C++ constructs. [...]

There is no VarDecl in that grammar from standard, variable declaration just one interpretation of simple-declaration.

or by the frontend?

Internals of the compiler, if it has a frontend, or it hasn't, it has 3 or 1000 stages, are part of the compiler implementation. From the point of the language, the compiler can be implemented in any way it wants, as long as it translates valid programs correctly. Let's say generally, language specifies what should happen when, not how.

So to answer the question, AST (if used at all in any form) is defined by the compiler.

Who decided that was to be called VarDecl?

I most probably suspect Chris Lattner at https://github.com/llvm/llvm-project/commit/a11999d83a8ed1a2661feb858f0af786f2b829ad .

that came from the mind of the inventor of the language and the various frontends just creates classes named after a document written by him/her or every frontend potentially creates its AST of a given source code and so Clang's and GCC's are different?

Surely they are influenced by what is in the standard, but every compiler has its own internals. Well, in short, Clang VarDecl and GCC VAR_DECL are different, and they are also different conceptually - let's say GCC uses switch(...) case VAR_DECL: and Clang uses classes clang::VarDecl.

Use of the term Abstract Syntax Tree

It's not called an AST because of the inheritance tree. It's called an AST because it's a tree structure and it represents the syntax of JSON.

To see how it is a tree structure, consider the following example:

{"a": 42, "b": {"x": "y"}}

This will be parsed to the following object:

JsObject(Map(
"a" -> JsInt(42),
"b" -> JsObject(Map(
"x" -> JsString("y")
))
))

Which represents the following tree:

         Object
"a" / \ "b"
Int Object
| | "x"
42 String
|
"y"

This is an abstract syntax tree of the above JSON string.

I know that the authors are not using the term in the context of the syntax analysis of a compiler

Not of a compiler, but they are using it in the context of analyzing the abstract syntax of JSON.



Related Topics



Leave a reply



Submit